MIT-LCP / physionet-build

The new PhysioNet platform.
https://physionet.org/
BSD 3-Clause "New" or "Revised" License
56 stars 20 forks source link

Clear rules for EventDatasets #2244

Open bemoody opened 4 months ago

bemoody commented 4 months ago

Some time ago, "EventDatasets" were added: the idea is that people who are participants in an event can temporarily have access to certain published projects, for the duration of the event.

Not speaking for anyone else, but I didn't feel like this system was rigorous enough to used for PhysioNet. So currently, EventDatasets don't do anything on PhysioNet: they don't allow people to access project files, nor GCP/AWS buckets, nor BigQuery. Currently, as far as I'm aware, EventDatasets only allow people to use HDN research environments.

Of course, this makes things quite confusing for developers and probably for event hosts/participants too! And we would like to have this capability for PhysioNet events eventually.

The sticking point, in my mind, is to have clarity in the user interface (the person who clicks the button needs to fully understand what clicking the button will do) as well as auditability (administrators need to be able to answer questions about who was given access to which datasets at what time.)

I think that there should first be a few additional rules imposed:

  1. Adding an EventDataset should require the events.add_eventdataset permission, not (or not only) user.view_all_events.

  2. In order to add an EventDataset, the event's host must be fully authorized to access that project (i.e., must have up-to-date training, DUA signature, etc.)

  3. Once an EventDataset has been added, the host should not be able to change the event's start or end dates.

  4. In order to add a participant to an event, the host or cohost who is approving the participant must be fully authorized to access the project(s).

  5. EventDataset permissions should not take effect before the event's start date. (Currently, only the end date has any effect.)

Second, there should be additional clarity in the UI:

  1. The form where EventDatasets are added should say something like:

By adding a dataset, all participants (approved by EVENT_HOST_NAME or their cohosts) will be allowed to access this dataset for the duration of the event (N days, from START_DATE to END_DATE).

There are currently NUMBER approved participants in this event.

Note that once a dataset is added to the event, the dates of the event cannot be changed.

  1. The form where participants are approved should say something like:

If a participant is approved to join this event, the participant will be allowed to access the following datasets for the duration of the event (START_DATE to END_DATE):

  • PROJECT TITLE (VERSION)
  • ...
Rutvikrj26 commented 4 months ago

@bemoody it would be helpful if you can add your thoughts on the following point for a bit more clarity:

The Event host is a platform admin (Which is a base assumption as the functionality to add an event dataset is inside the admin console). The implementation of project access checks for admin seems redundant. Yes, this point would make a ton of impact if a regular user can create an event (an interesting functionality to be had - but a separate feature in itself.)

This refers to the points:

2. In order to add an EventDataset, the event's host must be fully authorized to access that project (i.e., must have up-to-date training, DUA signature, etc.)

4. In order to add a participant to an event, the host or cohost who is approving the participant must be fully authorized to access the project(s).

Rutvikrj26 commented 4 months ago

@bemoody I've addressed the rest of the comments and updated the PR. Please take a look and let me know.

bemoody commented 4 months ago

Well, we've done our best to move away from a binary classification of administrators and non-administrators.

On PhysioNet we allow external users (course instructors, conference organizers) to have event-creation privileges on request. That's one of the things the event system was designed for, and it's useful so we can keep track of credentialing/training for participants as a group. But the event hosts generally don't have editorial or credentialing privileges. So they also don't have privileges to add event datasets.

Rutvikrj26 commented 4 months ago

Will it be possible to jump on a quick call to better understand the event creation workflow. I've only seen the workflow on HDN side and assumed(my bad) that it would be the same for Physionet. I'm available full day today and tomorrow - so any time that works for you should be fine.

bemoody commented 4 months ago

Sure - how about 11:00 tomorrow?

Rutvikrj26 commented 4 months ago

@bemoody is there a specific implementation of a query somewhere on the platform that you know of, where all the projects a user has access to are listed? I am currently implementing a "clunky" query using CredentialApplication, which I'm sure is missing some cases for sure. If you have an example that I can look at, it would be helpful for a cleaner implementation.

bemoody commented 4 months ago

I think that's what get_accessible_projects is for (in project/authorization/access.py).

It looks like nothing uses that function right now? Looks like the research environment is instead using PublishedProject.objects.accessible_by(user) (in project/managers/publishedproject.py.)

I'm not sure whether the two are equivalent. Weirdly, it looks like neither of them are honoring deprecated_files for non-open-access.

In any case, I think here we'd want not to include EventDatasets in the query. Maybe add an optional argument to get_accessible_projects to include/exclude events?

Rutvikrj26 commented 4 months ago

I've updated the implementation based the above suggestions and added a parameter include_event_datasets that has a default value True if not passed in but is passed in false from the form.