Nike-Inc / brickflow

Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
https://engineering.nike.com/brickflow/
Apache License 2.0
183 stars, 36 forks

[FEATURE] Shared mode/User Isolation option for job clusters #96

Closed: aseemanand closed this issue 5 months ago

aseemanand commented 6 months ago

There is currently a permission issue when Brickflow runs on job clusters configured with shared access mode (user isolation). Per Databricks, local_disk0/.ephemeral_nfs is a directory used internally by the system, and users typically don't have direct access to it in shared access mode.

Important note:

Spark cannot directly interact with workspace files on compute configured with shared access mode. Because this cluster mode is designed to be secure, the action the import is trying to perform is not allowed.

Reference: https://docs.databricks.com/en/files/workspace-interact.html#read-data-workspace-files


Describe the solution you'd like

Job clusters working successfully in shared access mode (user isolation) with Brickflow.
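For illustration, the requested setting corresponds to the data_security_mode field of a cluster definition in the Databricks Clusters/Jobs API, where USER_ISOLATION selects shared access mode and SINGLE_USER selects single user mode (which is tied to one named user via single_user_name). The minimal sketch below builds such a job-cluster entry as a plain dict. It does not use Brickflow's own API, and the helper name, cluster key, runtime version, and node type are hypothetical placeholders.

```python
import json
from typing import Optional


def job_cluster_spec(shared: bool, single_user_name: Optional[str] = None) -> dict:
    """Build a hypothetical Jobs API job_clusters entry.

    Field names follow the Databricks Clusters API; all values are placeholders.
    """
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",          # placeholder (AWS) node type
        "num_workers": 2,
    }
    if shared:
        # Shared access mode: several users share the cluster, isolated from each other.
        new_cluster["data_security_mode"] = "USER_ISOLATION"
    else:
        # Single user mode: the cluster is bound to exactly one named user.
        if not single_user_name:
            raise ValueError("SINGLE_USER mode needs a single_user_name")
        new_cluster["data_security_mode"] = "SINGLE_USER"
        new_cluster["single_user_name"] = single_user_name
    # "brickflow_cluster" is a hypothetical job_cluster_key.
    return {"job_cluster_key": "brickflow_cluster", "new_cluster": new_cluster}


if __name__ == "__main__":
    print(json.dumps(job_cluster_spec(shared=True), indent=2))
```

Passing shared=False without a single_user_name raises ValueError in this sketch, reflecting that a single user cluster is bound to one named user.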

Describe alternatives you've considered

We granted SELECT on all referenced tables and views in Single User mode to work around the Databricks limitation. This incurs tech debt, because any new table or view in the upstream source can break our data pipeline.
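The maintenance cost of this workaround can be seen in a small sketch: every table or view the pipeline reads needs its own SELECT grant, so a newly added upstream object can break the pipeline until it is granted too. The hypothetical helper below diffs the referenced objects against those already granted and emits the missing statements in Databricks SQL GRANT SELECT ON TABLE ... TO ... form. The function name, table names, and principal are illustrative, and using ON TABLE for views as well is an assumption worth checking against the Databricks documentation for your catalog setup.

```python
from typing import Iterable, List, Set


def missing_select_grants(referenced: Iterable[str], granted: Set[str], principal: str) -> List[str]:
    """Return GRANT statements for referenced tables/views not yet granted to principal."""
    statements = []
    # Deduplicate and sort so the output is deterministic.
    for name in sorted(set(referenced)):
        if name not in granted:
            # Assumption: ON TABLE is also accepted for views; verify in the docs.
            statements.append(f"GRANT SELECT ON TABLE {name} TO `{principal}`")
    return statements


if __name__ == "__main__":
    # Hypothetical three-level names and principal, for illustration only.
    referenced = ["main.sales.orders", "main.sales.customers"]
    already_granted = {"main.sales.orders"}
    for stmt in missing_select_grants(referenced, already_granted, "etl-service-principal"):
        print(stmt)
```

Each time the upstream source adds an object, it shows up in this diff and needs a new grant, which is exactly the recurring tech debt described above.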

Additional context

None

stikkireddy commented 6 months ago

Related to https://github.com/Nike-Inc/brickflow/issues/31. Will take a look at this.

aseemanand commented 5 months ago

This is resolved. Thanks @stikkireddy @asingamaneni

https://github.com/Nike-Inc/brickflow/pull/97#issue-2130487245