databricks-demos / dbdemos

Demos to implement your Databricks Lakehouse

uc04-system-tables tries running against the wrong catalog. #91

Closed nova-jj closed 6 months ago

nova-jj commented 6 months ago

[screenshot]

Install worked fine as a user against a cluster in Shared Mode.

First, running the 02 notebook clobbered our target schema's permissions, setting ownership to `account users`.

I reset that and made sure the service account meant to run the daily forecaster had the necessary permissions, but then last night it ran against the wrong catalog.

I've had nothing but difficulty moving this demo into a production-ready state, and it appears buggy. Your mileage may vary, but the built-in assumption of the `main` catalog makes these demos problematic to operationalize.

Might I suggest making catalog and schema required inputs, and having processes fail when they can't access them for whatever reason, instead of scaffolding new objects in the wrong catalog/database?
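Something like this fail-fast check is what I mean. This is just an illustrative sketch, not actual dbdemos code; the function and exception names are hypothetical:

```python
# Hypothetical sketch: fail fast when catalog/schema parameters are missing,
# instead of silently falling back to a default catalog such as `main`.

class CatalogConfigError(ValueError):
    """Raised when a required catalog/schema parameter is absent."""

def resolve_target(params: dict) -> tuple[str, str]:
    """Return (catalog, schema), refusing to fall back to any default."""
    missing = [k for k in ("catalog", "schema") if not params.get(k)]
    if missing:
        raise CatalogConfigError(
            f"Required parameter(s) not set: {', '.join(missing)}; "
            "refusing to scaffold objects in a default catalog."
        )
    return params["catalog"], params["schema"]
```

With a guard like this, a misconfigured job dies loudly at setup instead of writing into the wrong catalog overnight.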

Ultimately, it's unclear to me why this is targeting `main` and not the catalog I'm specifying. I've even changed the values in the _resources/00-setup notebook, to no avail: [screenshot]

nova-jj commented 6 months ago

Moving the 02-forecaster notebook to a Job Cluster appears to have been the issue.

I noticed the following on my scheduled job-cluster runs: [screenshot]

Why these parameters don't respect the values inside the notebook isn't clear to me, but after modifying the Workflow job to set Job Parameters that override what's in the notebooks, I think I'm back on track.

This is definitely not behaving as advertised, though, and I would encourage a review of the current installation approach to safeguard a few things.

Specifically:

nova-jj commented 6 months ago

TL;DR resolution: hard-coding Job Parameter values on the Job Cluster was required to overcome this issue.
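For anyone hitting the same thing: task/job parameters take precedence over the defaults a notebook declares, which is consistent with what I saw. A rough sketch of that precedence in plain Python (names hypothetical, not a Databricks API):

```python
# Illustrative sketch of parameter precedence: values supplied at the job/task
# level win over the defaults declared inside the notebook, which is why
# editing only the notebook's values had no effect on scheduled runs.

def effective_params(notebook_defaults: dict, job_params: dict) -> dict:
    """Merge parameters, letting job-level values override notebook defaults."""
    merged = dict(notebook_defaults)
    merged.update({k: v for k, v in job_params.items() if v is not None})
    return merged
```

So if the scheduled job carries `catalog=main`, the notebook's own `catalog=prod` default never applies; fixing the job's parameters is what actually changes the target.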

QuentinAmbard commented 6 months ago

Hey, sorry about that; it wasn't intended to work like that. The `account users` ownership should apply only to the dbdemos catalog. I'm releasing a fix so that it doesn't happen again for this demo.

nova-jj commented 6 months ago

Curious why permissions are touched at all. If pointing at a pre-existing schema, why would ownership need to change?

Unity Catalog presents a unique challenge for data governance: it lacks the ability to express "deny" rules, which adds complexity to traditional DBA operations. It would behoove dbdemos never to touch permissions at all, and instead force users to work through granting adequate access as required, but that's my two cents.
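Concretely, the policy I'm suggesting could look like this guard: only plan grant/ownership statements for objects the setup itself created, and leave pre-existing schemas untouched. This is an illustrative sketch, not dbdemos code, and the names and SQL are hypothetical:

```python
# Hypothetical sketch of a "never touch pre-existing permissions" policy:
# the setup only emits grant statements for schemas it created itself.

def plan_grants(schema_exists: bool, created_by_setup: bool,
                schema: str = "dbdemos.target") -> list[str]:
    """Return the permission statements the setup would run (illustrative)."""
    if schema_exists and not created_by_setup:
        # Pre-existing schema: leave ownership and grants exactly as found.
        return []
    return [f"GRANT ALL PRIVILEGES ON SCHEMA {schema} TO `account users`"]
```

That way, installing into a governed prod schema can never silently reassign ownership.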

QuentinAmbard commented 6 months ago

It's something we designed originally for internal usage: we're all super admins in some internal envs, so we forced ownership to `account users` so that everybody can use it. Not something you want to do in a real deployment, for sure.

nova-jj commented 6 months ago

Yeah, so complicating that was me installing as super admin into prod ... I've yet to find a clean way to "dogfood" our changes, but elevating to admin only when required is on our roadmap.

I was unable to promote this demo through our standard dev/stg/prod promotion process either, because the Databricks Git integration doesn't allow checking in the SQL queries and dashboard it creates, which left installing live in prod. "We'll do it live" ended with clobbered permissions and cross-catalog writes due to the demo's unstated assumptions.

I get it, they're demos, but when our Account Reps are recommending them for visibility, these landmines make for longer-than-anticipated implementations and troubleshooting.

All the best in the end - we're up and running and there's definitely value in this solution; just some hiccups. ;)