cagov / data-infrastructure

CalData infrastructure
https://cagov.github.io/data-infrastructure
MIT License

Improve Snowflake Security Step 1: Security Improvements #375

Open jkarpen opened 2 months ago

jkarpen commented 2 months ago
  1. Read email sent from Gabe Mullen, Snowflake rep, providing context for security changes: Snowflake Security Best Practices Email
  2. Review guidance from Snowflake on how to improve security.
  3. Review Trust Center documentation:
  4. The team met with Gabe on 8/28. Most of the call was about data caching best practices, but this topic came up towards the end of the call and Gabe provided some guidance. You may want to watch the whole call, as data caching will likely be relevant to you in the future.
     a. Zoom recording (Passcode: r5A?82Vm)
     b. Meeting notes: Notes - Snowflake Data Caching Best Practices (https://docs.google.com/document/d/12klynFZhlrYQuGVtd5TeMeHazd26KDFOw1jhhoGFqPQ/edit?pli=1)
     c. If needed, we can schedule a follow-up call with Gabe to answer any specific questions not answered in the documentation or Zoom recording.
  5. Recommend necessary changes based on this guidance.
  6. Implement changes (can wait until Ian returns)
ram-kishore-odi commented 2 weeks ago

Reviewed the docs.

The initial recommendations for Snowflake security are:

  1. Create break-glass accounts following best practices.
  2. Enforce MFA on all accounts.
  3. Resolve all high/critical issues reported by Trust Center first, then proceed to the others (new stories will be created).
  4. Enforce RBAC best practices and hierarchy. New stories will need to be added here as well. A few references with best-practice recommendations for this topic are listed below (in addition to the official Snowflake documentation).

RBAC Best Practices

  1. Essential Guide to Snowflake System Defined Roles: https://articles.analytics.today/snowflake-role-based-access-best-practices-design-guide
  2. Snowflake Role-Based Access: Best Practices Design Guide: https://articles.analytics.today/snowflake-role-based-access-best-practices-design-guide

Snowflake official docs

  1. Snowflake documentation: https://docs.snowflake.com/en/user-guide/security-access-control-overview.html
  2. Multi-factor Authentication (MFA): https://docs.snowflake.com/en/user-guide/security-mfa

If you have any comments, please let me know.

ian-r-rose commented 2 weeks ago

Can you say more about what you think we should do on RBAC? We do already have an RBAC setup that I believe conforms to the recommendations in Snowflake's docs.

ram-kishore-odi commented 1 week ago

Hello @ian-r-rose, my intention was to do a deep dive on the existing RBAC setup using the best practices defined in this article, https://articles.analytics.today/snowflake-role-based-access-best-practices-design-guide, and see whether we can take advantage of the items below.

I'm particularly interested in exploring these areas for potential improvement:

  1. Managed Access Schemas: Using managed access schemas to prevent table owners from granting access to unauthorized roles.
  2. Role Hierarchy Optimization: Streamlining our role structure for clearer access control and easier management.
  3. Simplified RBAC management and Okta user provisioning: Reducing manual effort and risk. For example, working with Kevin to overcome automated provisioning/de-provisioning issues. The goal is to have a well-defined process for role assignment and provisioning for new users.
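
To make item 1 concrete, here is a minimal sketch of what converting existing schemas to managed access could look like. It only builds the Snowflake `ALTER SCHEMA ... ENABLE MANAGED ACCESS` statements as strings and does not connect to Snowflake. The database and schema names are hypothetical placeholders, not names from our account.

```python
def managed_access_statements(database: str, schemas: list[str]) -> list[str]:
    """Build one ALTER SCHEMA statement per schema to enable managed access.

    Nothing is executed here; the strings are meant to be reviewed first.
    """
    return [
        f"ALTER SCHEMA {database}.{schema} ENABLE MANAGED ACCESS;"
        for schema in schemas
    ]


# Hypothetical database and schema names, for illustration only.
for statement in managed_access_statements("ANALYTICS_DEV", ["STAGING", "MARTS"]):
    print(statement)
```

Generating the statements rather than running them keeps the change reviewable before anything is applied to the account.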

These are some of my thoughts. Please let me know your feedback.

ian-r-rose commented 2 days ago

I'd be very interested in taking a closer look at managed access schemas and whether we can use that to improve the security of our default setup. One wrinkle to this is that many of our schemas are created by dbt as part of the transformation step. As far as I know, dbt cannot create managed access schemas (cf https://github.com/dbt-labs/dbt-snowflake/issues/22) without overriding the schema creation macros. This may be worthwhile, but I think I'd like to better understand the tradeoffs:

  1. Right now with our default RBAC setup, the owner of most objects is {database}_{env}_READWRITECONTROL. This goes for both schemas and for tables/views in schemas. If there is no difference between the owner of the schema and a table in the schema, is there any benefit to making the schema managed access?
  2. Can managed access schemas help simplify the way we do future grants? If the schema owner already gets some variety of permissions on objects in it, maybe they can be used to remove some of our explicit future grants. Or maybe not, I don't think I fully understand the permissions model here.
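
As a starting point for weighing these tradeoffs, the sketch below builds the two kinds of statements involved: the `CREATE SCHEMA ... WITH MANAGED ACCESS` statement an overridden dbt schema-creation macro would need to emit, and the explicit future grants that would sit alongside it. In a managed access schema, grants on objects (including future grants) are made by the schema owner or a role with `MANAGE GRANTS` rather than by each object owner. The database, schema, and reader role names are hypothetical, and this only produces strings; it does not answer whether any of the existing future grants can be dropped.

```python
def create_schema_statement(database: str, schema: str, managed: bool = True) -> str:
    """Build a CREATE SCHEMA statement, optionally with managed access."""
    suffix = " WITH MANAGED ACCESS" if managed else ""
    return f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}{suffix};"


def future_grant_statements(
    database: str,
    schema: str,
    role: str,
    object_types: tuple[str, ...] = ("TABLES", "VIEWS"),
) -> list[str]:
    """Build SELECT future grants on a schema for a single role."""
    target = f"{database}.{schema}"
    return [
        f"GRANT SELECT ON FUTURE {object_type} IN SCHEMA {target} TO ROLE {role};"
        for object_type in object_types
    ]


# Hypothetical names, for illustration only.
print(create_schema_statement("ANALYTICS_DEV", "MARTS"))
for statement in future_grant_statements("ANALYTICS_DEV", "MARTS", "ANALYTICS_DEV_READ"):
    print(statement)
```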