cggh / panoptes

Eyes on your (genomic) data

Row level permissions on data tables #1248

Closed benjeffery closed 5 years ago

benjeffery commented 5 years ago

For a table where this is configured, when not logged in a user can only see (plot/download) rows that are marked as public. When logged in a user can see all rows that are marked as a group that they have access to, along with those that are public.

benjeffery commented 5 years ago

A strawman implementation:

Per table:

Per dataset:

When a query request comes in, the user's SSO groups are looked up in authMapping. The matching entries are reduced to a set of unique strings, allowedRows, unless any of the entries is the string 'all', in which case allowedRows becomes the string 'all'. The query returns all rows where authColumn is in allowedRows or authColumn is 'public', or all rows if allowedRows is 'all'.

leehart commented 5 years ago

I gather that having no authColumn, and hence no auth, means that all the data for that table will be publicly viewable (or otherwise not under any authorisation, although it might be behind authentication).

leehart commented 5 years ago

I also gather that authPublicValue is the mechanism by which an implementer can specify which rows in a particular table (I gather the directives are per-table directives) are not under authorisation (are public, or universally visible/accessible). In this way, we can override (or nullify) other auth restrictions. So this is per-row granularity. It would be interesting to know why this is more efficient or more practical than having separate tables with separate auth settings, rather than having mixed auth in the same table.

leehart commented 5 years ago

I gather that the scheme for the authMapping directive, on the dataset level, allows us to compose multiple SSO auth groups into one Panoptes auth group as a convenience / higher abstraction, instead of listing the SSO auth groups in the data, for example, which would be harder to modify, etc.

leehart commented 5 years ago

It's not clear to me yet: 1) what an authMapping of the default empty object means; 2) what else is in the object apart from the SSO groups as keys; 3) what the all keyword means. I gather it means all SSO groups, but the implications of that are not clear unless the meaning of the SSO groups is clear, e.g. i) are they mutually exclusive or do they overlap; ii) does it make sense to say this row of data can be seen by a user with all the SSO group permissions, or by a user with any of the SSO group permissions, and how might that contrast with the all keyword?

benjeffery commented 5 years ago

> I gather no authColumn and hence no auth mean that all the data for that table will be publicly viewable (or otherwise not under any authorisation, although it might be behind authentication).

Correct on both counts

benjeffery commented 5 years ago

> It would be interesting to know why this is more efficient or more practical than having separate tables with separate auth settings, rather than having mixed auth in the same table.

This is so that a user can download "all rows that I have access to" in one query.

leehart commented 5 years ago

It would be helpful to my understanding (perhaps ours) to unpack that last paragraph a bit. (I'm finding it difficult to read and follow.)

  1. When a query request comes in, the users' SSO groups are looked up in authMapping.

I read that as: We have a query to process, so we need to know which SSO groups the user belongs to, and we need to compare those with the auth config of the dataset in question, to work out which data to include in the query (more importantly: which data to exclude).

  2. This is reduced to a set of unique strings allowedRows, unless any of the entries is the string 'all', in which case allowedRows becomes the string 'all'.

... still unpacking

  3. The query returns all rows where authColumn is in allowedRows, or authColumn is 'public' or all rows if allowedRows is 'all'.

... still unpacking

benjeffery commented 5 years ago

> I gather that the scheme for the authMapping directive, on the dataset level, allows us to compose multiple SSO auth groups into one Panoptes auth group as a convenience / higher abstraction, instead of listing the SSO auth groups in the data, for example, which would be harder to modify, etc.

Exactly - my expectation here is that we'll end up using the study_id column as the authColumn, which will let us map SSO groups to study ids.

benjeffery commented 5 years ago

> It would be helpful to my understanding (perhaps ours) to unpack that last paragraph a bit. (I'm finding it difficult to read and follow.)
>
>   1. When a query request comes in, the users' SSO groups are looked up in authMapping.
>
> I read that as: We have a query to process, so we need to know which SSO groups the user belongs to, and we need to compare those with the auth config of the dataset in question, to work out which data to include in the query (more importantly: which data to exclude).
>
>   2. This is reduced to a set of unique strings allowedRows, unless any of the entries is the string 'all', in which case allowedRows becomes the string 'all'.
>
> ... still unpacking
>
>   3. The query returns all rows where authColumn is in allowedRows, or authColumn is 'public' or all rows if allowedRows is 'all'.
>
> ... still unpacking

Now that I have slept and read it I find it really unclear! Here's an example:

authMapping:
    SSO_badgers:  study1, study3, study4
    SSO_birds: study5, study1
    SSO_lions: all

So a user who is just a bird gets study5, study1 and public rows. A user who has both badger and bird gets studies 1,3,4,5, public. Lions get all.
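
For anyone who prefers to read this as code, here is a minimal Python sketch of the example above. It is an illustration only, not the Panoptes implementation: the function names `allowed_rows` and `visible`, the use of `study_id` as the authColumn, and `'public'` as the authPublicValue are all assumptions made for the sketch.

```python
ALL = "all"  # keyword from the thread: a group mapped to 'all' sees every row

# The mapping from the example above, written as Python lists.
AUTH_MAPPING = {
    "SSO_badgers": ["study1", "study3", "study4"],
    "SSO_birds": ["study5", "study1"],
    "SSO_lions": ["all"],
}


def allowed_rows(user_groups, auth_mapping):
    """Union the authColumn values of every SSO group the user belongs to.

    Returns the string 'all' as soon as any matching entry is 'all'.
    Groups that do not appear in the mapping contribute nothing.
    """
    allowed = set()
    for group in user_groups:
        for value in auth_mapping.get(group, []):
            if value == ALL:
                return ALL
            allowed.add(value)
    return allowed


def visible(rows, user_groups, auth_mapping,
            auth_column="study_id", public_value="public"):
    """Keep the rows this user may see; public rows are always kept."""
    allowed = allowed_rows(user_groups, auth_mapping)
    if allowed == ALL:
        return list(rows)
    return [
        row for row in rows
        if row[auth_column] == public_value or row[auth_column] in allowed
    ]


# Hypothetical table contents, one row per authColumn value.
ROWS = [{"study_id": s} for s in
        ["study1", "study2", "study3", "study4", "study5", "public"]]
```

With these rows, `visible(ROWS, ["SSO_birds"], AUTH_MAPPING)` keeps the study1, study5 and public rows, and a user with no SSO groups (not logged in) keeps only the public row.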

leehart commented 5 years ago

Thanks. It's still not clear to me what those values in the lists of strings in the authMapping object actually are, study1, study2, etc. Are they values in one / any authColumn in one / any table? Or what allowedRows is, and where it fits into this.

benjeffery commented 5 years ago

> Thanks. It's still not clear to me what those values in the lists of strings in the authMapping object actually are, study1, study2, etc. Are they values in one / any authColumn in one / any table?

Values in any authColumn in any table. Could have it per-table, but I can't think of a clear use case there.

> Or what allowedRows is, and where it fits into this.

Just a local variable that contains the unique strings to look for in authColumn. I agree that calling it something probably confused the explanation!

leehart commented 5 years ago

Thanks. I think I see it all now. So if you were more verbose, you might call it something like listOfAuthColumnValuesPresentOnRowsToReturnToThisUser (not suggesting that!). And to simplify the cases where that list is equivalent to all / any authColumn value (all / any rows), we would expect the list to contain one item, the keyword all. That is in contrast to an empty object, which might imply that no rows will be returned to the user. And {'all', 'study1'} might be an anomaly.

The last nuance I'm pondering is when an authColumn contains the value public. I suppose one way of looking at the value 'public' (actually the value of the config const authPublicValue, which might confusingly be set to all, just to test things!) is that it renders the allowedRows content effectively irrelevant for that particular row.

benjeffery commented 5 years ago

To be precise it is listOfAuthColumnValuesPresentOnRowsToReturnToThisSSOGroup

['all', 'study1'] would return rows where authColumn had the value 'all' or 'study1'

For the last point you can imagine that authPublicValue gets added to allowedRows for every query.
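
As a rough sketch of that idea (not the actual Panoptes query code), the filter could be built as a parameterised WHERE fragment with authPublicValue always included in the value list. The function name `auth_where_clause` is hypothetical, the `%s` placeholders assume a DB-API driver that uses the "format" paramstyle, and the column name is assumed to come from trusted table config rather than user input.

```python
def auth_where_clause(allowed_rows, auth_column=None, public_value="public"):
    """Return (sql_fragment, params) restricting a query to visible rows.

    allowed_rows is either the string 'all' or a set of authColumn values.
    """
    # No authColumn configured, or the user resolved to 'all': no restriction.
    if auth_column is None or allowed_rows == "all":
        return "", []
    # authPublicValue is treated as if it were in allowedRows for every query.
    values = sorted(set(allowed_rows) | {public_value})
    placeholders = ", ".join(["%s"] * len(values))
    # auth_column is interpolated directly, so it must come from config only.
    return f"WHERE {auth_column} IN ({placeholders})", values
```

A user with no matching SSO groups passes an empty set and ends up with a clause that matches only the public value, which lines up with the "not logged in" case in the original description.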

leehart commented 5 years ago

I thought you covered the anomaly of {'all', 'study1'} by simply equating it to {'all'} if/when there is one or more 'all'.

leehart commented 5 years ago

Your first clarification about returning to this SSO group has confused me a bit... The user might belong to more than one SSO group, and we only care about returning things to the user. Are you saying that there is a list of allowedRows per SSO group?

benjeffery commented 5 years ago

You add every entry in authMapping that has a key that matches any SSO group the user belongs to. The user can have many.

leehart commented 5 years ago

I see your point about adding authPublicValue to allowedRows. I'm not sure if you mean that's the actual mechanism, or if that's effectively (or equivalent to) what would happen, but I reckon I see the point at least.

leehart commented 5 years ago

I need to expand and clarify your last comment a bit to make sense of it and check my / our understanding.

  1. authMapping is an object that maps SSO groups to lists of authColumn values
  2. the user belongs to a set of SSO groups
  3. allowedRows is the list of authColumn values that are mapped to the set of SSO groups (by authMapping) that the user belongs to (according to LDAP)

benjeffery commented 5 years ago

Yep! Sounds good.