rwforest opened 7 months ago
@rwforest The request is not clear; could you elaborate?
@nfx When doing table replacement, mapping source tables to target tables is only the first pass. How do we guarantee that everything will run successfully? That's the goal of the migration. For example, given 1000 notebooks, I need to be able to understand which chains of commands are the most critical. With dependencyGraph, we can check the incoming and outgoing edges: if we know a notebook is an orphan, then we don't attempt to fix it. Right now, even if you fix all the cells, that is only the beginning; more errors will surface eventually.
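One way to make "most critical" concrete, sketched here under assumptions: treat notebooks and tables as nodes in a directed graph, flag nodes with no incoming and no outgoing edges as orphans, and rank the rest by how many assets transitively depend on them. The edge list, asset names, and helper functions are hypothetical; this is not UCX's dependencyGraph API.

```python
from collections import defaultdict, deque

# Hypothetical (upstream, downstream) edges: "downstream depends on upstream".
# Asset names are illustrative only, not produced by UCX.
EDGES = [
    ("hive_metastore.sales.orders", "nb_clean_orders"),
    ("nb_clean_orders", "hive_metastore.sales.orders_clean"),
    ("hive_metastore.sales.orders_clean", "nb_daily_report"),
    ("hive_metastore.sales.orders_clean", "nb_forecast"),
]
EXTRA_NODES = {"nb_scratchpad"}  # a notebook with no known reads or writes


def build_graph(edges, extra_nodes=()):
    """Return the node set plus outgoing and incoming adjacency maps."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    nodes = set(extra_nodes)
    for up, down in edges:
        out_edges[up].add(down)
        in_edges[down].add(up)
        nodes.update((up, down))
    return nodes, out_edges, in_edges


def downstream_count(node, out_edges):
    """Count distinct assets reachable from `node`, i.e. its transitive dependents."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in out_edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


nodes, out_edges, in_edges = build_graph(EDGES, EXTRA_NODES)
# Orphans have neither incoming nor outgoing edges, so they can be skipped.
orphans = sorted(n for n in nodes if not in_edges[n] and not out_edges[n])
# Highest downstream count first: fixing these unblocks the most assets.
criticality = sorted(
    ((downstream_count(n, out_edges), n) for n in nodes if n not in orphans),
    reverse=True,
)
print(orphans)
print(criticality[0])
```

Ranking by transitive dependents is only one possible definition of criticality; run frequency or job schedules could be weighted in as well.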
@rwforest Could you clarify the following:
Generally speaking, I would like the capability to plan code migrations using UCX based on a dependency graph of the (to-be-migrated) tables linted from the code.
@JCZuurmond There's a feature in Databricks that never made it to public release; I think it was for building a runtime dataset dependency graph. I was told by @FastLee that it depends on the DBR version. I believe the linted code gives static lineage, but I have some notebooks that are heavily parameterized.
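To illustrate why static lineage falls short for parameterized notebooks, the sketch below uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) to find `spark.table(...)` calls. It only resolves table names passed as string literals; a name read from a widget at run time stays unresolved, which is the gap runtime lineage would have to fill. The notebook cell and the `table_references` helper are hypothetical, and this is a simplified stand-in, not how UCX's linter works.

```python
import ast

# Hypothetical notebook cell: one literal table name, one supplied via a widget.
CELL = '''
orders = spark.table("hive_metastore.sales.orders")
target = dbutils.widgets.get("target_table")
spark.table(target).show()
'''


def table_references(source):
    """Split spark.table(...) calls into statically resolved names and unresolved arguments."""
    resolved, unresolved = [], []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "table"
            and node.args
        ):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                resolved.append(arg.value)
            else:
                # The name is only known at run time (e.g. from a widget),
                # so static parsing cannot tell which table is read.
                unresolved.append(ast.unparse(arg))
    return resolved, unresolved


resolved, unresolved = table_references(CELL)
print(resolved, unresolved)
```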
And I agree on the planning part. Is there a roadmap for a planning UI? I can't imagine how someone would plan a code migration using a CSV.
A planning UI is not on the roadmap. We are considering including this issue in our upcoming planning, but no decision has been made yet.
Is there an existing issue for this?
Problem statement
Table mapping does not solve everything; chances are there will still be errors after migration. Since HMS lineage is available, UCX should aim to merge it with dependencyGraph.
Proposed Solution
Merge HMS lineage with dependencyGraph. Since lineage capture depends on the DBR version, start with the highest runtime and then backfill anything not captured there using other means, such as static lineage parsing or a Spark listener.
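A minimal sketch of that precedence, assuming each lineage source can be reduced to (table, notebook) edges: runtime lineage is consumed from the highest DBR version down, then static parsing and a Spark listener only add edges not already seen. Each edge keeps the name of the source that first reported it. The source names, DBR versions, data, and `merge_lineage` helper are all hypothetical.

```python
# Hypothetical runtime lineage per DBR version: {(major, minor): [(table, notebook), ...]}.
RUNTIME_LINEAGE = {
    (13, 3): [("hive_metastore.sales.orders", "nb_clean_orders")],
    (15, 4): [("hive_metastore.sales.orders_clean", "nb_forecast")],
}
# Hypothetical fallback sources used to backfill what runtime lineage missed.
STATIC_PARSE = [
    ("hive_metastore.sales.orders_clean", "nb_forecast"),
    ("hive_metastore.crm.customers", "nb_crm"),
]
SPARK_LISTENER = [
    ("hive_metastore.crm.accounts", "nb_crm"),
    ("hive_metastore.ops.jobs", "nb_ops"),
]


def merge_lineage(runtime, fallbacks):
    """Union all edges, recording the highest-priority source that reported each one."""
    # Highest DBR version first, then the fallbacks in the order given.
    ordered = [
        (f"dbr_{major}.{minor}", edges)
        for (major, minor), edges in sorted(runtime.items(), reverse=True)
    ]
    ordered += fallbacks
    merged = {}
    for origin, edges in ordered:
        for edge in edges:
            merged.setdefault(edge, origin)  # earlier (higher-priority) origin wins
    return merged


merged = merge_lineage(
    RUNTIME_LINEAGE, [("static_parse", STATIC_PARSE), ("spark_listener", SPARK_LISTENER)]
)
for (table, notebook), origin in merged.items():
    print(f"{table} -> {notebook} ({origin})")
```

Keeping the origin per edge makes it possible to show which parts of the graph rest on weaker, static evidence.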
Scope:
Optionally, we can create multiple copies of the same graph, each starting from a single table, to show the full migration scope.
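The per-table scope, together with the ordering and owner filter requested under Outcome, could be sketched as follows, assuming hypothetical (upstream, downstream) edges and asset metadata: take the subgraph reachable from one starting table, then emit its assets in dependency order with Python's `graphlib.TopologicalSorter` (Python 3.9+). Calling it once per table gives one scoped copy per starting point. Links and the Failures column are left out because no failure data is assumed here; `scope_from` and `migration_rows` are illustrative names only.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical (upstream, downstream) edges and {asset: (type, owner)} metadata.
EDGES = [
    ("hive_metastore.sales.orders", "nb_clean_orders"),
    ("nb_clean_orders", "hive_metastore.sales.orders_clean"),
    ("hive_metastore.sales.orders_clean", "nb_forecast"),
    ("hive_metastore.crm.customers", "nb_crm"),
]
ASSETS = {
    "hive_metastore.sales.orders": ("table", "alice"),
    "nb_clean_orders": ("notebook", "bob"),
    "hive_metastore.sales.orders_clean": ("table", "alice"),
    "nb_forecast": ("notebook", "carol"),
    "hive_metastore.crm.customers": ("table", "dave"),
    "nb_crm": ("notebook", "dave"),
}


def scope_from(table, edges):
    """Return the nodes and edges reachable downstream from a single starting table."""
    children = defaultdict(set)
    for up, down in edges:
        children[up].add(down)
    reached, stack = {table}, [table]
    while stack:
        for nxt in children[stack.pop()]:
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached, [(u, d) for u, d in edges if u in reached and d in reached]


def migration_rows(table, edges, owner=None):
    """Rows of (asset, type, owner) in dependency order, optionally filtered by owner."""
    nodes, scoped = scope_from(table, edges)
    sorter = TopologicalSorter({n: set() for n in nodes})
    for up, down in scoped:
        sorter.add(down, up)  # `down` can only be migrated after `up`
    rows = [(asset, *ASSETS[asset]) for asset in sorter.static_order()]
    return [r for r in rows if owner is None or r[2] == owner]


for row in migration_rows("hive_metastore.sales.orders", EDGES):
    print(row)
```

`static_order()` raises `graphlib.CycleError` on circular dependencies, which would surface such cycles for manual review.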
Outcome:
A table with Asset (link), Asset type, Owner, and Failures columns, with assets sorted in their required order and filterable by owner.
Additional Context
No response