galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Dataset privacy usability issues when sharing histories with users #7479

Open hexylena opened 5 years ago

hexylena commented 5 years ago

We recently saw some confusion regarding the history sharing interface on UseGalaxy.eu. We saw it on 18.09, and I've reproduced it on the tip of the release_19.01 branch.

A couple of problems lead to confusing and unexpected sharing results. They are also not apparent from the history-level sharing view, since sharing is affected by dataset-level permissions.

These were noticed due to a bug report from @teresa-m and @erxleben

Problem 1

The user is presented with a short error message which they may ignore, thinking "Ok, already shared, so it'll just update sharing to include the other user"

image

Bug: However, the access permissions only include the last user (probably due to the error message in that screen, yeah?)

image

Problem 2

Steps to reproduce:

Bug: The history sharing interface shows that the history is correctly shared with both users, but this has confused some users, who may not be aware of dataset-level permissions. The user can see this and think "ok, it is shared with both", while it is actually only shared with the second person.

image

Correct Sharing

So this means the correct way to share data with an increasing list of users (while maintaining privacy) is to remove every user from the list, every time, and then re-add them all at once in a single sharing action?

sneumann commented 5 years ago

I can confirm that I am experiencing that confusion, and even with the "correct sharing" described above I can't manage it: I removed everyone, and then shared with a comma-separated list of users, user1,user2. Then I get:

Share 1 histories
The following datasets can be shared with user1, user2 with no changes
History
cool history name
Datasets
sample1.fastq.gz
sample2.fastq.gz
...
The following datasets can be shared with user1,user2 by updating their permissions
History
cool history name
Datasets
sample1.fastq.gz
sample2.fastq.gz
...

Observation: the two lists, "can be shared with no changes" and "can be shared by updating their permissions", are identical. Even when selecting:

How would you like to proceed?
  Make datasets public so anyone can access them
X Make datasets private to me and the user(s) with whom I am sharing
  Share anyway (don't change any permissions) 

I get only user1 in the access role of the datasets, and I can't change access permissions for the datasets manually; I get the error "At least 1 user must have every role associated with accessing datasets. Since you are associating more than 1 role, no private roles are allowed." I still see no way to share privately with more than one user. Yours, Steffen
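The error message reflects how access roles combine: as it states, a user needs every role attached to a dataset's access permission. A minimal sketch of that rule follows, as a toy model rather than Galaxy's actual code. The `can_access` helper and the role names are hypothetical, and treating an empty role set as public is an assumption.

```python
# Toy model of the rule quoted in the error message; not Galaxy's implementation.
# Hypothetical names: can_access and the role strings below.

def can_access(user_roles: set[str], dataset_access_roles: set[str]) -> bool:
    """A user may access a dataset only if they hold every access role on it.

    An empty access-role set is treated as "public" here (an assumption).
    """
    return dataset_access_roles <= user_roles


# Each user holds only their own private role.
owner = {"private:owner"}
user1 = {"private:user1"}
user2 = {"private:user2"}

# Attaching two private roles: nobody holds both, so nobody gets access.
two_private_roles = {"private:user1", "private:user2"}
locked_out = [can_access(u, two_private_roles) for u in (owner, user1, user2)]

# A single sharing role held by all three users grants access to each of them.
sharing_role = "sharing:owner+user1+user2"
with_sharing = [
    can_access(u | {sharing_role}, {sharing_role}) for u in (owner, user1, user2)
]
```

Under this model, sharing privately with several users needs one role that all of them hold, which is why associating several private roles with the same dataset is rejected.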

hexylena commented 5 years ago

I had a user report an issue today where they had shared a history via link, and then clicked the "make datasets available" checkbox, expecting that to take effect, not realising that it only takes effect when you click "share via link", meaning you have to share and then unshare. (Yes there's a workaround in the gear menu)

hexylena commented 5 years ago

Another issue with sharing (on 19.05)

and again with

In neither case were the history permissions updated, so any new datasets created would have incorrect permissions.

I'm starting to think we should just scrap permissions completely, drop the tables, and try again from zero; I might get fewer user bug reports that way. Start over with only history-level permissions and no dataset-level permissions. The current model is too granular for most people's usage and leads to subtle issues that are sometimes unpleasant to rectify for users.

sneumann commented 5 years ago

Hi, I love simplicity. What happens if I have one history A with raw data D plus some analysis, and I copy that into another history B with the same raw data D, but replace the analysis steps with something else? Now I share history A with Alice, and history B with Bob. With dataset-level permissions, raw data D would magically have to be shared with both Alice and Bob. Under the proposed model, that would be controlled by permissions at the history level.

Are there APIs to access a single dataset? If so, how would access to raw data D be controlled? Yours, Steffen

hexylena commented 5 years ago

Hi @sneumann please note this is just me complaining since I often receive reports of this being confusing and end up sometimes making low level changes for the user to solve it. This was not a thought-out implementation proposal.

I appreciate your example of a real world use case that leverages this feature somewhat! That's useful to know people actually intentionally do things like that. My feeling is that's uncommon, but it's good to know it exists.

If I thought out this proposal just a little bit more (NB: only a little bit, if I really propose this then I'll ensure I've considered every possibility):

My initial implementation idea (in order to not upset people who were actively using the APIs to do fancy things) was to "pretend" that checks were only being done at the history level, and just do a better job of syncing. It would mean adding more code, rather than ripping out large chunks, but politically probably more popular. Just ensuring that whenever a user shares a history with some users, that:

When sharing:

This would probably mean that when a user shares a history that was copied, we would have to do something complex to ensure that really all of the users, on every copy of that HDA, can access it.
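A rough sketch of what that syncing step might compute, using the history A / history B example from earlier in the thread. It assumes that copies of an HDA in different histories point at one underlying dataset; the function name and the plain dictionaries are hypothetical stand-ins, not Galaxy's data model.

```python
from collections import defaultdict

# Hypothetical sketch: derive who must be able to access each underlying
# dataset from history-level shares alone. Not Galaxy's implementation.

def users_needing_access(history_contents, history_shares):
    """Map each dataset to the union of users of every history holding a copy."""
    needed = defaultdict(set)
    for history, datasets in history_contents.items():
        for dataset in datasets:
            # A history with no shares contributes no extra users.
            needed[dataset] |= history_shares.get(history, set())
    return dict(needed)


# Raw data D appears in both histories; the analyses differ.
history_contents = {
    "A": ["D", "analysis_1"],
    "B": ["D", "analysis_2"],
}
history_shares = {"A": {"alice"}, "B": {"bob"}}

needed = users_needing_access(history_contents, history_shares)
```

Here `needed["D"]` contains both alice and bob, while each analysis dataset is needed by only one of them. That is the "complex" part: the permissions on D depend on every history that contains a copy of it, not just the one being shared.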

What I would like to do is just throw away dataset level permissions completely, but I guess this would be less popular.