Squarespace / pgbedrock

Manage a Postgres cluster's roles, role memberships, schema ownership, and privileges
https://pgbedrock.readthedocs.io/en/latest/

Support Google Cloud SQL #12

Open tayloramurphy opened 6 years ago

tayloramurphy commented 6 years ago

Similar to #9 I'd be very interested in using this to manage users on our Cloud SQL instance. Unfortunately, per the docs "any features that require SUPERUSER privileges" are not supported.

I'm working on testing this using the default postgres user, so I don't know exactly what any errors might be. But, this would still be very useful as a validation check against existing privileges.

Is there a route to rework the tool so that it could a) validate roles against a specified YAML spec, b) write out the SQL that would have to be run to make any adjustments, and c) do all of this while using a non-superuser role?

Basically, I'd see this as part of our test suite that would create an error if something was off, but wouldn't take action on it.

tayloramurphy commented 6 years ago

As expected, the errors were as follows:

Failed to execute query "SELECT * FROM pg_authid WHERE rolname != 'pg_signal_backend';":
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/pgbedrock/common.py", line 56, in run_query
    cursor.execute(query)

  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/psycopg2/extras.py", line 144, in execute
    return super(DictCursor, self).execute(query, vars)
zcmarine commented 6 years ago

Hey Taylor,

Man, not being able to use any SUPERUSER functionality is a huge blow here. pgbedrock relies on being able to become other users in order to grant default privileges, alter object ownership, and modify roles. I'm curious how those things are done in the Cloud SQL world without a superuser. Taking the last piece as an example (modifying roles), how would you currently change a role's ability to login, its connection limit, etc.?

It's also possible that I'm misinterpreting that statement in their docs: is there the concept of a superuser at all? What happens if you do CREATE ROLE mysuperuser SUPERUSER; as the postgres role (which is by default a superuser)? If that works, can you log in as that role and do some things that probably aren't available to roles by default? E.g.:

CREATE ROLE dummy_role;  -- Try creating a role
SELECT * FROM pg_shadow;  -- Look at a table that has restricted access

Even if that above fails, can you log in as the postgres role and execute those statements?

Lastly, was there anything after that colon in the first line of the traceback? Or was there more to the traceback?

Thanks! Zach

tayloramurphy commented 6 years ago

Thanks @zcmarine for the thoughtful response.

Here's what I see as the default for CloudSQL roles:

rolname rolsuper rolinherit rolcreaterole rolcreatedb rolcanlogin rolreplication rolbypassrls
cloudsqladmin x x x x x x x
cloudsqlsuperuser x x x x
cloudsqlagent x x x x
cloudsqlreplica x x x
postgres x x x x

So there is a superuser, but it's managed by Cloud SQL. As far as I can tell, I have no ability to log in as the superuser. Trying to create a superuser role from the postgres user gives the error "must be superuser to create superusers".

I'm able to create roles with the postgres user and am unable to select from the pg_shadow table.

Using the verbose option on pgbedrock generate this was the full stack trace:

DEBUG:context.py:__getattribute__:285 - Generating database context "get_all_role_attributes"
DEBUG:common.py:run_query:54 - Executing query: SELECT * FROM pg_authid WHERE rolname != 'pg_signal_backend';
Failed to execute query "SELECT * FROM pg_authid WHERE rolname != 'pg_signal_backend';":
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/pgbedrock/common.py", line 56, in run_query
    cursor.execute(query)

  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/psycopg2/extras.py", line 144, in execute
    return super(DictCursor, self).execute(query, vars)

Even with this not working, I'm planning on using the YAML format to document how the roles in Cloud SQL are organized so that we can version control it and make it easier to reason about. The "nice to have" feature for me would be to use the YAML to validate that the permissions line up as expected and just return information about where there is a mismatch. Checking what permissions are granted to users should be doable by the postgres user, right?

zcmarine commented 6 years ago

Hey, sorry for the slow response.

Man, that's really interesting. I can see the security benefit in not allowing superusers to exist / connect, though it definitely adds a big (probably insurmountable) wrinkle here. The one thing I do wonder about, though, is how that cloudsqladmin role can connect / be used. Can you SSH directly onto that box and then either psql -U cloudsqladmin or sudo su cloudsqladmin and then psql into your Postgres? I ask because it's possible to create the security restrictions you're seeing but still allow that type of access. If so, there may be a way to make this work. I mean, all of those cloudsql* roles have rolcanlogin as True, so it seems like there must be a way to connect as them.

With that said, it's possible that the postgres role can do most of what you want, particularly if you're focused mostly on permission management. Have you tried running pgbedrock with the other submodules off and seeing if it will run? I.e. manually create a YAML file with the various roles in it and then do

pgbedrock configure \
    --no-attributes \
    --no-memberships \
    --no-ownerships \
    --privileges \
    spec.yml

It's possible that the postgres role has enough access to do everything you need there. Unfortunately, you may be forced to generate the permissions deductively, i.e. look at what pgbedrock would have removed and then add that to your YAML spec, which is kind of painstaking. However, if this runs at all and that would meet your needs, it would be a small lift to modify pgbedrock generate to only build out the spec for certain submodules.

One last line of inquiry: it's possible that pgbedrock generate is failing in part because of the tables that it can access / that exist. This query is copied from the Redshift issue (#9) and reports which of the tables that pgbedrock relies on do not exist:

SELECT
    desired_tables.table_name AS missing_tables
FROM (
    VALUES
        ('pg_auth_members'),
        ('pg_authid'),
        ('pg_class'),
        ('pg_default_acl'),
        ('pg_depend'),
        ('pg_namespace'),
        ('pg_roles')
    ) AS desired_tables (table_name)
    LEFT JOIN information_schema.tables info_tables
        ON desired_tables.table_name = info_tables.table_name
        AND info_tables.table_schema = 'pg_catalog'
WHERE
    info_tables.table_name IS NULL
;

If anything comes back then that indicates something we'd need to modify in pgbedrock. Additionally, you might run a SELECT * FROM foo LIMIT 1 on each of those as the postgres role just to verify if you have permission to read from them.
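If the per-table spot-check gets tedious, the probe statements can be generated in one pass; a minimal sketch (the table list mirrors the query above, and the helper name is my own, not part of pgbedrock):

```python
# Generate one lightweight probe statement per catalog relation that
# pgbedrock relies on; run each as the postgres role to check read access.
CATALOG_TABLES = [
    "pg_auth_members",
    "pg_authid",
    "pg_class",
    "pg_default_acl",
    "pg_depend",
    "pg_namespace",
    "pg_roles",
]

def probe_statements(tables):
    """Return a SELECT ... LIMIT 1 probe for each catalog table."""
    return ["SELECT * FROM pg_catalog.{} LIMIT 1;".format(t) for t in tables]

for stmt in probe_statements(CATALOG_TABLES):
    print(stmt)
```

Any statement that errors with a permission failure (or an undefined-table error) points at a catalog pgbedrock can't use in that environment.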

Hopefully at least something in the above is useful; I'm kind of throwing out a bunch of ideas and hoping we can get something workable from at least one of them.

Zach

tayloramurphy commented 6 years ago

@zcmarine Thanks for the response and apologies for taking so long to try this out. Here's what I'm seeing.

Can you SSH directly onto that box and then either psql -U cloudsqladmin or and sudo su cloudsqladmin and then psql into your Postgres?

It asks for a password and bombs out. According to the docs "This user account is used by automated processes that need to access the data in your instance (for example, backing up your instance or performing an import or export)." Similarly, sudoing returns No passwd entry for user 'cloudsqladmin'

Manually create a YAML file with the various roles in it

So I combined the analytics and tmurphy roles defined in our repo here into a single spec.yml and ran the following with these results:

$ pgbedrock configure -U postgres -w "MYPASSWORD" -d dw_production --verbose --no-attributes --no-memberships --no-ownerships --privileges spec.yml
DEBUG:privileges.py:analyze_privileges:35 - Starting analyze_privileges()
Traceback (most recent call last):
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/bin/pgbedrock", line 11, in <module>
    load_entry_point('pgbedrock==0.1.2', 'console_scripts', 'pgbedrock')()
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/pgbedrock/cli.py", line 45, in configure
    memberships, ownerships, privileges, live, verbose)
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/pgbedrock/core_configure.py", line 238, in configure
    module_sql = analyze_privileges(spec, cursor, verbose)
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/pgbedrock/privileges.py", line 44, in analyze_privileges
    schema_writers = determine_schema_writers(spec)
  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/pgbedrock/privileges.py", line 178, in determine_schema_writers
    writers[schema].add(role)
KeyError: 'zuora'

Repeating it causes it to KeyError on different table names. Maybe I have something wrong in the spec definition? Playing around with it, the only other result I could get was this error: Failed to execute query "SELECT * FROM pg_authid WHERE rolname != 'pg_signal_backend';"

Running the big SQL query from your previous comment, pg_authid is the only one that comes back. I have confirmed that the postgres user on Cloud SQL can read data from all of those tables except that one. Based on my reading of the Postgres docs, it seems like you should be able to get all of the same information from pg_roles. From the docs:

pg_roles is a publicly readable view on pg_authid that blanks out the password field.

So! Playing around with this as I was writing the comment... I did a global find-and-replace in the project for pg_authid to pg_roles, did the pip3 install, and got it to sort of work!

When I used a simplified spec.yml (see details) it actually tried to execute a bunch of commands! =>

Analyzing privileges: 
Failed to execute query "REVOKE SELECT ON TABLE analytics."f_churn_history" FROM "analytics";
REVOKE SELECT ON TABLE analytics."subscription" FROM "analytics";

And on and on. The final error on that being:

  File "/Users/tmurphy/Projects/GitLab/dev/pgbedrock/pgbedrock/common.py", line 56, in run_query
    cursor.execute(query)

  File "/Users/tmurphy/.local/share/virtualenvs/pgbedrock-x_-HagMe/lib/python3.6/site-packages/psycopg2/extras.py", line 144, in execute
    return super(DictCursor, self).execute(query, vars)

Here's the spec that spits out revoke statements and the error in execute:

analytics:
    can_login: no
    is_superuser: no

tmurphy:
    can_login: yes
    is_superuser: no
    member_of:
        - analytics

But when my spec looks like this:

analytics:
    can_login: no
    is_superuser: no
    privileges:
        schemas:
            write:
                - analytics

tmurphy:
    can_login: yes
    is_superuser: no
    member_of:
        - analytics

It gives me the KeyError.

So it does seem like pg_roles gives enough information to do some validation. @zcmarine what are your thoughts on all this? 🙂

zcmarine commented 6 years ago

Hey Taylor,

Regarding the KeyError you were seeing, that makes sense to me. The short answer is that pgbedrock expects the owner of each schema to show up somewhere in the spec file, and since you only included a subset of the role definitions that isn't true. The longer explanation is that determine_schema_writers() calls determine_schema_owners(), which walks through the spec and creates a mapping from schema to schema owner based on the schemas in each role's owns: section (as well as whether it has has_personal_schema set to True). Since zuora isn't listed with an owner in this partial spec, a KeyError happens: pgbedrock assumes the owner has already been added, and that isn't true.

Note that this is also the explanation for the KeyError you mention at the end of your comment above: analytics can write to the analytics schema, so pgbedrock tries to add that to the writers list, assuming the owner for that schema has already been computed; it hasn't, so things fail.
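For illustration, here is a heavily simplified sketch of that failure mode (the function body is my own approximation of the logic, not pgbedrock's actual code):

```python
# Simplified sketch of why a partial spec raises KeyError: the writers
# mapping is only seeded for schemas whose owner appears in the spec.
def determine_schema_writers(spec):
    # Seed one entry per owned schema (stand-in for determine_schema_owners())
    writers = {schema: set()
               for role, cfg in spec.items()
               for schema in cfg.get("owns", {}).get("schemas", [])}
    # Add every role with write access; schemas with no listed owner blow up here
    for role, cfg in spec.items():
        for schema in cfg.get("privileges", {}).get("schemas", {}).get("write", []):
            writers[schema].add(role)  # KeyError if no owner was listed
    return writers

partial_spec = {"analytics": {"privileges": {"schemas": {"write": ["analytics"]}}}}
try:
    determine_schema_writers(partial_spec)
except KeyError as exc:
    print("KeyError:", exc)  # no role in the spec owns the analytics schema
```

Adding a role with owns: schemas: [analytics] to the spec makes the lookup succeed.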

The fact that pg_authid is the only missing table in Google Cloud SQL is encouraging. If you replace pg_authid with pg_roles throughout the codebase I would try running pgbedrock generate and see if you can get a tentative spec created. If so, you could then try pgbedrock configure in check mode and see if that works; note that I wouldn't run pgbedrock configure in live mode (--live) as it would try to replace all passwords with '**', which is the masking that pg_roles has for the rolpassword field.

The bigger question if the above works is what we want to do about this in the pgbedrock codebase. We want to continue to manage password configuration, and pg_authid is necessary for that. What would seem reasonable to me is:

So in my mind Step 1 is verifying that you can get pgbedrock generate and/or pgbedrock configure to work as mentioned above. If so, then if anyone from your team / others who want Google Cloud SQL support would like to take a shot at the above I'd be happy to sync up and talk this through in more detail.

tayloramurphy commented 6 years ago

We'll be getting some resources allocated to this later this month, as getting this working with Cloud SQL is becoming a higher priority. PRs will be made here for sure, but we'll be tracking progress in https://gitlab.com/meltano/meltano/issues/195 on the GitLab side.

zcmarine commented 6 years ago

Hey Taylor,

That's awesome to hear! If it'd be useful to do a Google Hangout or slack conversation once you kick things off, please let me know. I'm happy to help how I can!

I went back through the above set of messages and have taken a shot at consolidating what we know:

Notes

Known Open Questions

  1. Can the postgres role run all necessary change / access statements? i.e.:
    • ALTER ROLE test_role WITH CONNECTION LIMIT 2; -- role change
    • GRANT some_other_role TO test_role; -- membership change
    • GRANT SELECT ON TABLE some_table TO test_role; -- privilege change
    • SET ROLE some_other_role; -- role impersonation
  2. If SUPERUSER isn't supported does that mean we can't ALTER ROLE test_role WITH SUPERUSER?
  3. How can we identify that we're within Google Cloud SQL? With Redshift we can see that via SELECT version();. What does Google Cloud SQL return for that query?
  4. Is there a way to test Google Cloud SQL locally (i.e. via a docker image) or do we need to hit a real Google Cloud SQL instance?

To Do

  1. Add a way for pgbedrock to know it is working with Google Cloud SQL (from Question 3 above)
  2. Create a way for pgbedrock generate to only run certain submodules, e.g. pgbedrock generate --no-ownerships. Note that it's not clear if we actually need this; it'd probably be useful regardless, but we may be able to scope it out of this issue depending on your needs.
  3. Choose the queries to run based on the Postgres implementation we're working with
    • For example, in Google Cloud SQL get all info from pg_roles instead of pg_authid
  4. Add logic within attributes.py to not do password management for Google Cloud SQL. Depending on Question 2 above we may also need to not do superuser management too.
  5. Figure out how to add testing. If we have to hit a real Google Cloud SQL instance (from Question 4 above) then we may want to just verify that the raw queries run against that and that we can do all of the alterations (from Question 1) that we need to do; the rest of the machinery is pretty implementation-agnostic.
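For To Do item 3, one sketch of what a per-implementation query lookup might look like (the names and structure here are hypothetical, not pgbedrock's internals; the two queries are the ones discussed in this thread):

```python
# Hypothetical per-implementation query selection: fall back to pg_roles
# (which masks rolpassword) when running against Google Cloud SQL.
QUERIES = {
    "postgres": {
        "all_role_attributes":
            "SELECT * FROM pg_authid WHERE rolname != 'pg_signal_backend';",
    },
    "google_cloud_sql": {
        "all_role_attributes":
            "SELECT * FROM pg_roles WHERE rolname != 'pg_signal_backend';",
    },
}

def get_query(implementation, name):
    # Unknown implementations and missing overrides fall back to stock Postgres
    impl = QUERIES.get(implementation, QUERIES["postgres"])
    return impl.get(name, QUERIES["postgres"][name])

print(get_query("google_cloud_sql", "all_role_attributes"))
```

The same lookup could later grow a "redshift" entry, keeping the submodule code implementation-agnostic.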

I'm going to try working on item 3 from the To Dos in the next couple weeks and will keep you updated of any progress I make there.

zamai commented 6 years ago

Hi, @zcmarine! I'm a developer from the GitLab side who would like to help with this issue. First of all, thanks a lot for the summary of the issue; I'm going point by point through it to get the answers we need for implementing support for Google Cloud SQL.

Can the postgres role run all necessary change / access statements? i.e.: ALTER ROLE test_role WITH CONNECTION LIMIT 2; -- role change

Works

GRANT postgres TO test_role; -- membership change

Works

GRANT SELECT ON TABLE some_table TO test_role; -- privilege change

Works

SET ROLE test_role; -- role impersonation

Doesn't work:

ERROR: permission denied to set role "test_role"
ERROR: permission denied to set role "cloudsqladmin"

Looking into the permissions of the postgres user, I see that it only has createdb and createrole.

  2. If SUPERUSER isn't supported, does that mean we can't ALTER ROLE test_role WITH SUPERUSER?

We can't; I tried and got:

ERROR: must be superuser to alter superusers

  3. Running SELECT version(); returns:

PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit

It does not identify itself as a Google Cloud-specific version. My idea on this: SELECT rolname FROM pg_roles WHERE rolname = 'cloudsqladmin' AND rolsuper = TRUE; if there is a result, this looks very much like a Google Cloud SQL managed instance, but it is far from a robust solution.
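That cloudsqladmin heuristic could live in a small helper; a sketch, assuming the caller executes the query and passes in the resulting rows (the function name is mine, not pgbedrock's):

```python
# Heuristic from this thread: a superuser role named cloudsqladmin
# strongly suggests a Google Cloud SQL managed instance.
CLOUD_SQL_CHECK = (
    "SELECT rolname FROM pg_roles "
    "WHERE rolname = 'cloudsqladmin' AND rolsuper IS TRUE;"
)

def looks_like_cloud_sql(rows):
    """rows: the fetched result of executing CLOUD_SQL_CHECK (list of tuples)."""
    return len(rows) > 0

print(looks_like_cloud_sql([("cloudsqladmin",)]))  # True on Cloud SQL
print(looks_like_cloud_sql([]))                    # False on stock Postgres
```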

  4. Is there a way to test Google Cloud SQL locally (i.e. via a docker image) or do we need to hit a real Google Cloud SQL instance?

As far as I can tell, no: the image must be pure Postgres with the custom settings from Google Cloud SQL. What specifically do we want to test? Maybe we can reach out to GCP support and ask directly whether anything we want to change behaves differently from pure Postgres (to be sure).

Now I'll be looking into the TODOs and trying to poke around and understand the codebase. P.S. I have a test DB running if you want access for dev purposes; just drop me an email (azamai@gitlab.com).

zcmarine commented 6 years ago

Hey Alex, thanks for the updates! If we can't find anything better, your idea about looking for a cloudsqladmin role doesn't seem bad (though like you said, it's definitely not the most robust).

The inability to impersonate other roles is going to add a real wrinkle here since default privileges are based on which user does the grant. I'm assuming that Google Cloud SQL has default privileges since Taylor said that the pg_default_acl table exists (based on the query I had him run above). In that case, I'm curious if you guys have any ideas on how to become a user and execute something as them. Otherwise I'm not sure how to manage default privileges in Google Cloud SQL, though it looks like most of the other modules should be doable (other than granting SUPERUSER, which seems like it isn't a thing in that ecosystem).

zamai commented 6 years ago

Hey Zach,

Yep, let's stick with the cloudsqladmin identification; it should work until we find a better way.

As I understand it, Taylor's requirements are:

  1. Make pgbedrock generate work with Google Cloud SQL
  2. Validate the existing users against the specified YAML spec
  3. Write out the SQL that would have to be run to make any adjustments (we can then run that SQL manually).

Since the SET ROLE query is not allowed for the postgres user, maybe we can avoid impersonating users and just print out the queries that need to be run in order to change their permissions? (as per the 2nd requirement)
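A minimal sketch of that print-instead-of-execute idea (class and method names are hypothetical, not pgbedrock's internals):

```python
# Collect the SQL a run *would* execute instead of executing it, so a
# non-superuser can review the statements and apply them manually.
class SqlRecorder:
    def __init__(self):
        self.statements = []

    def execute(self, query):
        # In check mode we only record; a live cursor would hit the database
        self.statements.append(query)

recorder = SqlRecorder()
recorder.execute('GRANT SELECT ON TABLE some_table TO "test_role";')
recorder.execute('ALTER ROLE "test_role" WITH CONNECTION LIMIT 2;')
print("\n".join(recorder.statements))
```

Anything that truly requires impersonation (like default privileges) would still need a human with the right role to run the printed SQL.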

I'm currently working on a PR that shows initial ideas of how the identification of Google Cloud SQL would work, to get your initial feedback on the approach.

lsowen commented 5 years ago

A similar problem exists for running on AWS RDS. It might be useful (if not too difficult) to generalize the handling for "semi-limited" environments like Google Cloud SQL and AWS RDS?