Think through more granular permission and user management

bickelj commented 2 months ago

To the extent possible, I think we should let Keycloak be canonical for users, groups, and permissions. In the worst case, that means syncing data from Keycloak on occasion into PDC with some data redundancy. In the best case, it means PDC exclusively uses Keycloak interfaces to manage users. Keycloak has no concept of our data model (nor should it), so this means we will need to reference Keycloak objects from PDC.

bickelj commented 1 month ago

Looking through Keycloak authorization docs, Keycloak seems to have the ability to know about our application resources and to protect them. This is the reverse of what I said above regarding Keycloak not having a concept of our data model. My sense at the moment is that if we have to make updates in Keycloak for each and every record within one of our entities (example: adding a new organization or proposal causes a new resource created/tracked in Keycloak) I would hesitate to take that route right away. We don't want to merely protect /blah, we want to assign ownership of /blah/5 to one party and assign ownership of /blah/6 to another, etc.

One of the first things of note is that fine-grained authorization appears to be per Keycloak client. So that's the app (front-end) or the service (back-end) or the openapi docs client (back-end docs). I suppose that makes sense: it is resources within a given client (app) that are protected. On the other hand, if I'm logged into the front-end I still want to be able to access my back-end resources if I should be able to access those, and I want those permissions to be irrespective of if I logged into openapi docs or app or I am using curl. So if I can't model permissions once in a sensible way, e.g. for the service back-end resources, and have those permissions used across clients, using this particular means in Keycloak (per-client fine-grained authorization) seems like a non-starter to me.

slifty commented 1 month ago

@bickelj in the last phase we began to adopt a philosophy of: "keycloak is responsible for roles, but PDC is responsible for permissions"

Which is to say, keycloak would have things like "superuser" in it, but PDC would be responsible for knowing "this account is { associated with} with { Insert PDC Domain Entity Here }"

For instance, in that model Keycloak wouldn't be the place where a given account gets associated as having ownership permissions of a given organization.

I think this is the correct approach? You probably have a better sense than me. It seems the less we force KeyCloak to undertand the PDC the better we are in the long run (e.g. we rely on KeyCloak for authentication, we rely on PDC for authorization)

bickelj commented 1 month ago

@slifty That appears to be the most sustainable and balanced approach, yes. However, I'm considering if it's feasible for "PDC is responsible for permissions" to mean "PDC uses a standard permissions API exposed by Keycloak to manage permissions for PDC." At the moment it is still looking too complex.

My exploration of such Keycloak maximization still leads me in the direction of "Users and roles (or groups or both) in Keycloak, interpretation of who can access what in PDC by looking at the JWT and the PDC db." I think that's what you're saying above.

In other words, your expectation is still "permission for resource X in the PDC service database is an attribute or relation in PDC service database" and NOT "permission for resource X in the PDC service database is an attribute or relation in the Keycloak database to be queried via API."

Example of managing rest URLs/patterns from within Keycloak looks pretty hairy even when only talking about entire endpoints vs talking about individual resources within each endpoint which would be hairier still.

Summary: Keycloak can let you go nuts with permissions being managed in Keycloak for each and every URL. However, that entails a lot of data wrangling within Keycloak (even if automatically set via API). It would be difficult to automatically test. It would be difficult to see. It would be more complex. So we will want to let Keycloak have users and groups and roles while interpreting those (granting permission to objects in PDC) within PDC alone.

I say all that yet there is this little voice in the back of my head asking questions:

How much of Keycloak will we re-implement over time as more and more permissions/auditing capabilities need to be added?
Are there immediate auditing advantages to using the Keycloak system?
How much of that complexity is natural/essential to the permissions problem?
Is it really pretty simple within Keycloak once you find the right path forward?

bickelj commented 1 month ago

There are a number of stories regarding "my" data and "my" organization already present. Examples: https://github.com/orgs/PhilanthropyDataCommons/projects/4/views/6?sortedBy%5Bdirection%5D=asc&sortedBy%5BcolumnId%5D=58424322&sortedBy%5Bdirection%5D=asc&sortedBy%5BcolumnId%5D=Status&pane=issue&itemId=39313981 https://github.com/PhilanthropyDataCommons/meta/issues/14 https://github.com/PhilanthropyDataCommons/meta/issues/11

A significant one here: https://github.com/PhilanthropyDataCommons/meta/issues/48

bickelj commented 1 month ago

The two paths I still see are these:

Users, roles, and permissions all in Keycloak, managed by API calls from PDC.
Users and roles in Keycloak, ACLs or something like it on every permissionable row in PDC.

Both feel pretty ugly. I have a clearer idea of what the latter's initial implementation would look like in the PDC database, namely some jsonb column on every permissionable table in PDC, with some ACL-ish structure denoting which users or roles can do what in that JSON object. I have a vaguer idea of what the former's implementation would look like for using Keycloak, but it seems like a bunch of API calls to create permissions objects for every row created.

I tried to find some standard JSON schema for ACLs and there are some examples for S3 or other systems out there but I didn't find a clear standard. I like the "viewers, editors, owners" concept from https://cloud.google.com/storage/docs/gsutil/commands/acl, though. That makes sense to me.

PhilanthropyDataCommons / service

Think through more granular permission and user management #1093