Azure-Samples / modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
MIT License

Create a proposal for workspace organization and common utilities #846

Closed: sreedhar-guda closed this issue 2 days ago

sreedhar-guda commented 6 days ago

What: Create a shared workspace that hosts common utilities, configurations, and a data lakehouse, and that serves as the source for all domain/product-specific workspaces. The default lakehouse associated with all notebooks will be the lakehouse from this workspace.

Why?

Benefits:

Can we do it in this sprint?

How: Please refer to the proposed generic setup diagram below.

[Image: proposed generic setup diagram]

Applied to the current Parking sensors sample, we have 3 options:

[Image: the 3 options applied to the Parking sensors sample]

Is this a theory or has it been tested?

DoD:

sreedhar-guda commented 2 days ago

Feedback from Lace:

My initial view is that it fits better in a Data Mesh sample. Enterprise workspace organization isn't really in scope for the Fabric E2E sample, where the focus is really CI/CD, testing, etc. Having multiple workspaces per environment (Dev, Test, Prod), say 3 (two data products, one common), would mean the user is deploying 3x3 (9) workspaces in the sample, which I think would just overcomplicate an already complicated sample. Not to mention the potential ephemeral workspaces we would spin up as part of the CI / PR validation process, plus branched-out workspaces as part of dev. It's just a lot of complexity for the users of the sample.

That said, I think this would fit very well in a Data Mesh reference sample that does not have CI/CD demonstrated as part of it. Workspace organization also has a ton more considerations, like the best capacity allocation (since workspaces are the smallest unit a capacity can be tied to, etc.). It's a bit of a slippery slope if we start adding it to the scope of the Fabric E2E sample.
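To make the 3x3 count in the feedback concrete, here is a minimal sketch that enumerates one workspace per (environment, workspace) combination. The environment and workspace names are hypothetical, chosen only for illustration; the issue does not specify a naming scheme, and ephemeral CI / PR or branched dev workspaces are not included.

```python
from itertools import product

# Hypothetical names for illustration only; not taken from the sample repo.
ENVIRONMENTS = ["dev", "test", "prod"]
WORKSPACES = ["common", "data-product-1", "data-product-2"]


def list_workspaces(environments, workspaces):
    """Return one '<workspace>-<environment>' name per combination."""
    return [f"{ws}-{env}" for env, ws in product(environments, workspaces)]


names = list_workspaces(ENVIRONMENTS, WORKSPACES)
for name in names:
    print(name)
print(f"total workspaces to deploy: {len(names)}")
```

With three environments and three workspaces this lists nine names; every additional data product adds one more workspace per environment.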

Decision documented in #840 - to use a single Lakehouse (Option 3).