databricks-industry-solutions / security-analysis-tool

Security Analysis Tool (SAT) analyzes a customer's Databricks account and workspace security configurations and provides recommendations that help them follow Databricks' security best practices. When a customer runs SAT, it compares their workspace configurations against a set of security best practices and delivers a report.

Check that DBFS doesn't contain anything which is non-standard #96

Open TheRealJimShady opened 2 months ago

TheRealJimShady commented 2 months ago

Unless I'm mistaken, it would be possible for a privileged user to expand the accessibility of a table managed under UC by writing it to an arbitrary location in DBFS. This could be achieved in the following way: `spark.table("mycatalog.myschema.mydata").write.format("delta").save("dbfs:/users/jonsmith/mydata")`. Would it be possible to expand the DBFS checks to include a comparison between the structure as it is discovered with;

ramdaskmdb commented 2 months ago

We could scan the entire dbfs:/ folder structure and store the number of objects detected per run. This would work well for a cleanish DBFS folder structure. However, if DBFS is used extensively during experimentation for storing objects, checkpoints, uploaded files, Hive metastore managed tables, etc., the scanner could take a very long time depending on the number of objects. We could probably limit the scan to n objects: if more than n objects are found, cap it and don't scan further. An increase in the number of objects between runs could be flagged.
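For illustration, a capped recursive count could look something like the sketch below. This is not SAT code; the lister is injected so it can be tested off-platform, but on Databricks it would wrap something like `dbutils.fs.ls`. Names and the cap value are illustrative.

```python
def count_objects(path, list_fn, cap=10_000):
    """Recursively count objects under `path`, stopping once `cap` is hit.

    Returns (count, capped), where `capped` is True if the scan stopped early.
    `list_fn(path)` must yield (child_path, is_dir) tuples; on Databricks this
    would be a thin wrapper around dbutils.fs.ls.
    """
    count = 0
    stack = [path]
    while stack:
        current = stack.pop()
        for child, is_dir in list_fn(current):
            count += 1
            if count >= cap:
                return count, True  # cap reached: stop scanning further
            if is_dir:
                stack.append(child)
    return count, False
```

Persisting the returned count per run would then let an increase between runs be flagged, per the comment above.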

TheRealJimShady commented 2 months ago

Thanks for your response. I can see two approaches to implementing this:

  1. An exhaustive traversal of DBFS Root storage to build a complete map of the discovered structure which can be compared with that of the approved DBFS structure.
  2. A fail-fast approach which recursively takes each path in DBFS and looks for an equivalent in the approved structure, if it doesn't exist the check exits and reports that there are extraneous paths in DBFS.

It would be possible to implement these as different modes which the user could choose from. Thoughts?
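The fail-fast mode (approach 2) could be sketched roughly as below. This is a hypothetical illustration, not SAT code: it assumes the "approved structure" is represented as a set of allowed path prefixes, and the discovered paths come from a traversal like the one discussed above.

```python
def find_extraneous_path(discovered_paths, approved_prefixes):
    """Return the first discovered path with no equivalent in the approved
    structure, or None if every path falls under an approved prefix.

    Stops at the first mismatch (fail fast) rather than building a full map.
    """
    for path in discovered_paths:
        if not any(path.startswith(prefix) for prefix in approved_prefixes):
            return path  # extraneous path found: the check can exit here
    return None
```

The exhaustive mode (approach 1) would instead collect every mismatch before reporting, trading runtime for a complete picture.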

ramdaskmdb commented 2 months ago

By default, dbfs:/ on a new workspace will only have the dbfs:/tmp folder. As long as DBFS is relatively clean and small, the traversal may be fast. Even if state is maintained between runs, if the tree is large it may take too long to run this each time. Let me run a few tests to see how long it may take.


madcole commented 2 months ago

Hi @ramdaskmdb, any updates on this? It's still a blocker for the team.