Infer values from child notebook in magic line

ericvergnaud commented 1 week ago

Changes

When linting notebooks, use values from child notebooks loaded via %run magic line to improve value inference

Linked issues

Resolves #1201 Progresses #1901

Functionality

None

Tests

[x] added unit tests

github-actions[bot] commented 6 days ago

✅ 26/26 passed, 1 skipped, 10m54s total

_{Running from acceptance #4641}

ericvergnaud commented 5 days ago

Looking at the bigger picture here, I have an observation and some questions.

First the observation: at the moment we seem to treat %-magic differently if it's a complete cell versus a command within a cell. I understand that this is mainly historic but think it's starting to cause issues. The situation is really:

%-magic is (by default) a command within a cell…

…except for on the first line of a cell, where instead of being a command it might be (and normally is) a marker for the cell language to override the notebook default.

The bigger questions I have are more specific to this PR:

We have tests covering values flowing out of a child notebook, but what about values flowing in? Values defined before the %run are visible within the child file.

The focus of this PR is inference, but what happens if there are linting findings within a notebook that we %run? Are these reported? (I think so?) Should they be reported by the workflow when the child notebook is probably also scanned independently?My feeling on this is that reporting the findings is fine: they may be different due to values flowing in. However when we're scanning lots of notebooks we might need to deduplicate because now the same file is covered multiple times due to each point of inclusion. (Is this deduplication already present?)

Re magic locations I'm not sure we are far from where we should be. If a cell starts with a magic line then we treat it as a cell language. Re bigger questions, I think they deserve dedicated PRs, see #2155 and #2156. We might be missing a ticket for values flowing in via parameters. We're missing tests for values flowing in via cells, but it should already work.

asnare commented 5 days ago

[…] Re bigger questions, I think they deserve dedicated PRs, see #2155 and #2156. We might be missing a ticket for values flowing in via parameters. We're missing tests for values flowing in via cells, but it should already work.

I also think that values flowing into the child notebook should work, but would like to see the tests covering it as part of this PR if it works. Just so we know. :) (If it doesn't work, I'm fine with a new PR to deal with it.)

Those PRs are good for covering the linting of child code, although there's still the higher-level issue about how to deal with the noise of duplicate findings that it will introduce at the workflow level.

ericvergnaud commented 5 days ago

[…] Re bigger questions, I think they deserve dedicated PRs, see #2155 and #2156. We might be missing a ticket for values flowing in via parameters. We're missing tests for values flowing in via cells, but it should already work.

I also think that values flowing into the child notebook should work, but would like to see the tests covering it as part of this PR if it works. Just so we know. :) (If it doesn't work, I'm fine with a new PR to deal with it.)

Those PRs are good for covering the linting of child code, although there's still the higher-level issue about how to deal with the noise of duplicate findings that it will introduce at the workflow level.

Actually it's not testable easily. At this point we're only linting parent notebooks using child notebooks as context, but not child notebooks with parent notebooks as context. So may I suggest we deal with this downwards inference topic in a separate PR - writing the test involves more work than it seems at first glance.

The PRs for #2155 and #2156 will need to cover this edge scenario

databrickslabs / ucx