DASSL / ClassDB

An open-source system to let students experiment with relational data
https://dassl.github.io/ClassDB/

Remove duplication of ClassDB.foldPgId() in addCatalogMgmt.sql #151

Closed: wildtayne closed this 6 years ago

wildtayne commented 6 years ago

This PR contains commits that fix #110. The catalog management functions are now owned by ClassDB, so they can execute all helper functions in the ClassDB schema, including ClassDB.foldPgId(). Additionally, checks have been added to ensure users cannot access objects in schemas they do not own with these functions, and Public.describe(VARCHAR) now gets the user's schema using ClassDB.getSchemaName().

I would particularly like to know if ClassDB.IDNameDomain should be used over VARCHAR in these functions, and if I am missing any casts when using functions that use ClassDB.IDNameDomain.

Fixes #110

smurthys commented 6 years ago

Thanks @srrollo for making the changes. I have the following observations.

General:

Function listTables:

Function describe(tableName):

Question: Could the SET LOCAL statement on L28 be moved up to the first executable line in this script? If yes, I think we should make this change (in all scripts as we touch them) so we don't inadvertently add some code above that causes notices we want suppressed.

Nice to have: It would be nice for the 1-arg describe function to handle schema-qualified table names, for example, pvfc.customer_t. We have discussed this possibility in the past, but we should not implement it in M2. Instead, maybe we should add an issue so it gets attention in a later release. (It would make a perfect issue for new contributors to cut their teeth on.)

smurthys commented 6 years ago

On second thought, the default value of parameter schemaName in function listTables should be CURRENT_SCHEMA. It should also coalesce to CURRENT_SCHEMA.

Likewise, the 1-arg describe function should look up the table in CURRENT_SCHEMA.
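The suggestion above can be sketched as follows. This is only an illustration under stated assumptions: the function name `public.listTablesSketch`, its return columns, and its body are hypothetical and are not the actual ClassDB implementation. It omits identifier folding and shows only the `CURRENT_SCHEMA` default combined with `COALESCE`.

```sql
-- Hypothetical sketch, not the ClassDB source.
-- The parameter defaults to the caller's current schema, and an explicit
-- NULL argument is coalesced to the same value.
CREATE OR REPLACE FUNCTION public.listTablesSketch(
   schemaName VARCHAR(63) DEFAULT CURRENT_SCHEMA::VARCHAR(63))
RETURNS TABLE ("Schema" VARCHAR(63), "Table" VARCHAR(63), "Type" VARCHAR(63)) AS
$$
   SELECT table_schema::VARCHAR(63),
          table_name::VARCHAR(63),
          table_type::VARCHAR(63)
   FROM INFORMATION_SCHEMA.TABLES
   WHERE table_schema = COALESCE(schemaName, CURRENT_SCHEMA::VARCHAR(63));
$$ LANGUAGE sql STABLE;

-- Both calls list tables in the caller's current schema
SELECT * FROM public.listTablesSketch();
SELECT * FROM public.listTablesSketch(NULL);
```

The explicit casts of `CURRENT_SCHEMA` (type `NAME`) to `VARCHAR(63)` keep both `COALESCE` arguments the same type.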

Love to hear your thoughts.

wildtayne commented 6 years ago

Thanks @smurthys for the review. The new commits up to 5c57674 address most of the points raised. I have some more thoughts below:

ALTER FUNCTION public.permissionTest() OWNER TO ClassDB;


Returns the error `ERROR:  permission denied for schema classdb` when executed as a student. This is also why the schema access check is required. I do have some other ideas on how to solve this problem if this solution has significant issues: either creating a proxy function `public.foldPgID()` for students, or granting `USAGE` on `ClassDB` to students and giving them `EXECUTE` on only a few functions. I like the proxy function solution, but we may have to write a few, since it seems we will want at least `ClassDB.getSchemaName()` as well.

- I think the best default value for the parameter would be `ClassDB.getSchemaName(CURRENT_USER::ClassDB.IDNameDomain)`, and the parameter should `COALESCE` to that same expression as well. This would avoid the situation where most students can just call `public.listTables()` because their schema names match their user names, while one student has to remember to pass their non-matching schema name. I imagine that could be pretty confusing to the students.

- `SET LOCAL` can only be [used in a transaction](https://www.postgresql.org/docs/9.6/static/sql-set.html), so it has to appear below `START TRANSACTION`. I recall there being issues with using `SET SESSION` for the same purpose, but I don't remember the details. Maybe @afig can weigh in on this.
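As a minimal sketch of the placement constraint in the last point: `SET LOCAL` lasts only until the end of the current transaction, and PostgreSQL documents that issuing it outside a transaction block emits a warning and otherwise has no effect. So the earliest useful position is the first statement after `START TRANSACTION`. The setting shown (`client_min_messages`) and the function name are assumptions for illustration. The thread does not say which setting the script changes.

```sql
START TRANSACTION;

-- First executable statement inside the transaction.
-- Assumed setting: suppress NOTICE-level messages until COMMIT or ROLLBACK.
SET LOCAL client_min_messages TO WARNING;

-- Hypothetical statement that would otherwise raise a NOTICE
-- ("function ... does not exist, skipping")
DROP FUNCTION IF EXISTS public.noSuchFunction();

COMMIT;
-- After COMMIT, client_min_messages reverts to its session-level value.
```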

smurthys commented 6 years ago

Thanks @srrollo for the analysis.

I feel the first thing to check is what happens if a student user directly queries info schema to list and describe tables. Any solution we provide will depend on that behavior because we simply want to provide an abstraction to those direct queries.

wildtayne commented 6 years ago

It appears that the INFORMATION_SCHEMA views filter their rows by privilege, so users cannot see metadata about objects they do not have permission to access. For example, I have several student accounts on my test installation. If I query INFORMATION_SCHEMA.SCHEMATA as student01, only the schemas student01, public, INFORMATION_SCHEMA, and pg_catalog are shown. And if I query INFORMATION_SCHEMA.TABLES, only tables from those schemas are shown. However, if I wrap the query in a function owned by ClassDB with SECURITY DEFINER, the student can use the function to access metadata about objects in other schemas, such as student02.
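The behavior described above can be reproduced with a sketch like the following. The function names are hypothetical and exist only for this demonstration. The difference comes from whose privileges the INFORMATION_SCHEMA views are evaluated against: the caller's for `SECURITY INVOKER` (the default), or the function owner's for `SECURITY DEFINER`.

```sql
-- Hypothetical demonstration functions, not part of ClassDB.

-- Runs with the caller's privileges: a student sees only the rows that a
-- direct query on INFORMATION_SCHEMA.TABLES would show them.
CREATE OR REPLACE FUNCTION public.listVisibleTablesInvoker()
RETURNS TABLE (schemaName VARCHAR, tableName VARCHAR) AS
$$
   SELECT table_schema::VARCHAR, table_name::VARCHAR
   FROM INFORMATION_SCHEMA.TABLES;
$$ LANGUAGE sql STABLE SECURITY INVOKER;

-- Runs with the owner's privileges: once owned by ClassDB, any caller sees
-- whatever ClassDB can see, including other students' schemas.
CREATE OR REPLACE FUNCTION public.listVisibleTablesDefiner()
RETURNS TABLE (schemaName VARCHAR, tableName VARCHAR) AS
$$
   SELECT table_schema::VARCHAR, table_name::VARCHAR
   FROM INFORMATION_SCHEMA.TABLES;
$$ LANGUAGE sql STABLE SECURITY DEFINER;

ALTER FUNCTION public.listVisibleTablesDefiner() OWNER TO ClassDB;
```

Running both functions as student01 and comparing the results should show the extra schemas only in the `SECURITY DEFINER` variant.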

smurthys commented 6 years ago

Thanks @srrollo for confirming what I was suspecting when I wrote earlier "let the DBMS accept/reject the query on INFORMATION_SCHEMA".

I believe the following:

For the next release, we should think about both the utility and the placement of these functions.

wildtayne commented 6 years ago

Thinking about it more, I think the plan by @smurthys is a better solution. Replacing built-in security checks with custom ones is never a good idea.

wildtayne commented 6 years ago

I've addressed most of the issues discussed in this thread and our discussion.

One final issue I have is that students also won't be able to access ClassDB.IDNameDomain since the functions are now SECURITY INVOKER. Should I switch the functions back to using VARCHAR(63), or duplicate ClassDB.IDNameDomain? Perhaps it makes sense to have ClassDB.IDNameDomain in public anyway.

smurthys commented 6 years ago

Ah, yes. We have to use VARCHAR(63) here.

wildtayne commented 6 years ago

OK, I've reverted to VARCHAR(63).

smurthys commented 6 years ago

I have the following observations on commit e9804d2:

wildtayne commented 6 years ago

For some reason, the function call on L101 will not coerce NAME to VARCHAR. I get the following error at compile time: `function public.describe(name, character varying) does not exist`. It seems Postgres tries to resolve the call using NAME instead of coercing to VARCHAR. I cast to VARCHAR(63) since public.describe() takes VARCHAR(63), but a plain VARCHAR cast works as well.

I think this is related to the issue with coercing NAME to ClassDB.IDNameDomain.
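A likely explanation, offered here as an assumption and not something the thread confirms: PostgreSQL considers only implicit casts when matching function arguments, and the built-in cast from NAME to VARCHAR is an assignment cast (NAME to TEXT is implicit). The sketch below uses a hypothetical stand-in function, `public.describeSketch`, to show the failing and working call shapes.

```sql
-- Hypothetical stand-in for the 2-arg describe function
CREATE OR REPLACE FUNCTION public.describeSketch(
   schemaName VARCHAR(63), tableName VARCHAR(63))
RETURNS TABLE ("Column" VARCHAR, "Type" VARCHAR) AS
$$
   SELECT column_name::VARCHAR, data_type::VARCHAR
   FROM INFORMATION_SCHEMA.COLUMNS
   WHERE table_schema = schemaName AND table_name = tableName;
$$ LANGUAGE sql STABLE;

-- Fails to resolve: CURRENT_SCHEMA has type NAME, and no describeSketch
-- candidate accepts NAME through implicit casts alone.
-- SELECT * FROM public.describeSketch(CURRENT_SCHEMA, 'customer_t');

-- Works: the explicit cast supplies the VARCHAR argument the function expects.
SELECT * FROM public.describeSketch(CURRENT_SCHEMA::VARCHAR(63), 'customer_t');

-- The cast context can be inspected in the catalog:
-- 'i' = implicit, 'a' = assignment, 'e' = explicit only
SELECT castsource::regtype, casttarget::regtype, castcontext
FROM pg_catalog.pg_cast
WHERE castsource = 'name'::regtype;
```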

wildtayne commented 6 years ago

I've just pushed the commits that address the latest points brought up by @smurthys.

afig commented 6 years ago

Overall looks great, student users can access these functions correctly without having to specify a schema in my testing. One possible overall improvement is better informational output, but that's clearly for another PR and milestone.

Just a couple of minor observations (most of the issues behind these notes were pre-existing):

wildtayne commented 6 years ago

Thanks @afig for the feedback. I've fixed the issues pointed out. For now, I feel better leaving your attribution as-is, since you did write foldPgID(), and you do have some miscellaneous contributions to this file in the commit history.

smurthys commented 6 years ago

The changes look good. I like how folding and coalescing are combined on L66.

BTW, it seems the folding on L102 is not required because that value gets folded in the 2-arg describe function.

Related to this PR only because of the duplicated foldPgID: I wonder if foldPgID should TRIM its parameter prior to SELECT so the callers don't have to:

I ask this question because the result of foldPgID is almost always used in an equality predicate, and clearly Postgres trims non-quoted identifiers.
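One way the idea could look is sketched below. This is a hypothetical illustration: the name `public.foldPgIDSketch`, the parameter size, and the folding rules (lowercase unless the identifier is wrapped in double quotes) are assumptions, and the actual ClassDB.foldPgID() body may differ. The sketch does not handle escaped quotes inside a quoted identifier.

```sql
-- Hypothetical sketch, not the ClassDB source.
-- Trims surrounding whitespace first, then folds the identifier: quoted
-- identifiers keep their case (quotes removed), all others are lowercased.
CREATE OR REPLACE FUNCTION public.foldPgIDSketch(identifier VARCHAR(65))
RETURNS VARCHAR(63) AS
$$
   SELECT (CASE
              WHEN t ~ '^".+"$' THEN SUBSTRING(t FROM 2 FOR LENGTH(t) - 2)
              ELSE LOWER(t)
           END)::VARCHAR(63)
   FROM (SELECT TRIM(identifier) AS t) AS trimmed;
$$ LANGUAGE sql IMMUTABLE;

SELECT public.foldPgIDSketch('  Customer_T ');    -- customer_t
SELECT public.foldPgIDSketch(' "Customer_T" ');   -- Customer_T
```

With trimming done inside the function, callers can compare its result directly in an equality predicate without calling `TRIM` themselves.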

wildtayne commented 6 years ago

Thanks @smurthys for the comments. I've pushed a commit removing the unnecessary call to foldPgID(). I agree that TRIM can be used in foldPgID(); however, we may want to address it in another branch so we can double-check the solution and make sure foldPgID() is updated in both places.