Sage-Bionetworks / dccmonitor

Allows for monitoring of data uploaded via the dccvalidator application. Functions for getting information on the uploaded files, metadata validation status, and more.
https://sage-bionetworks.github.io/dccmonitor

Annotations feature - remove annotations instead of add #60

Closed (kelshmo closed 4 years ago)

kelshmo commented 4 years ago

Could we consider a model where step 2 of the Annotations tab asks you to remove the annotation keys for sensitive data instead of adding every desired key? Only a few variables (1 or 2) cannot live as annotations, whereas there are many keys (more than 40) that we want surfaced as annotations.

The proposal would take the user from many clicks to far fewer.

Aryllen commented 4 years ago

My concern was that someone might miss removing an annotation that should not be there. It is possible to add a sensitive key by mistake when actively clicking through most of the list, but in my opinion it is much easier to skim past a sensitive key when passively reviewing a long list. It does make sense to reduce the amount of clicking, though. I will look into this. Thank you for the suggestion!

Aryllen commented 4 years ago

Discussed this with @kelshmo today.

Ideally, we would have a list of keys that are not acceptable as annotations and remove those columns from the joined dataset before populating the annotation key widget. PHI/PII would then not be surfaced, unless a new key was added to the metadata templates without also being added to the app's list of unacceptable keys. However, getting this list will take time, since input from Governance is needed.
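To make the ideal approach concrete, here is a minimal sketch of dropping unacceptable columns before the key widget is populated. It is written in Python purely for illustration and is not the app's code; the function names, the row-of-dicts representation of the joined dataset, and the example key names are all hypothetical.

```python
def drop_unacceptable_keys(rows, unacceptable_keys):
    """Return a copy of the joined dataset without the unacceptable columns.

    `rows` is a list of dicts (one dict per file); this representation is an
    assumption made for the sketch.
    """
    blocked = set(unacceptable_keys)
    return [
        {key: value for key, value in row.items() if key not in blocked}
        for row in rows
    ]


def annotation_key_choices(rows):
    """Collect the keys left in the dataset, in first-seen order."""
    seen = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


# Hypothetical example data and key names.
joined = [
    {"specimenID": "s1", "assay": "rnaSeq", "dateBirth": "1950-01-01"},
    {"specimenID": "s2", "assay": "rnaSeq", "dateBirth": "1948-06-30"},
]
cleaned = drop_unacceptable_keys(joined, ["dateBirth"])
choices = annotation_key_choices(cleaned)
```

Because the blocked columns are removed before the choices are computed, a sensitive key in the list can never appear in the widget. A sensitive key that is missing from the list would still appear, which is the gap noted above.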

In the short term, I will add a config parameter listing potentially unacceptable keys. These keys will not be removed from the list; instead, all other keys will be auto-selected. The user then only has to verify that no other keys need to be removed before the annotations csv is created.
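The short-term behavior could look roughly like the sketch below. It is in Python for illustration only and is not the app's implementation; the function name, the shape of the config value, and the example keys are assumptions.

```python
def preselect_annotation_keys(all_keys, potentially_unacceptable):
    """Build widget state: every key stays visible, flagged keys start unselected.

    `potentially_unacceptable` stands in for the proposed config parameter.
    """
    flagged = set(potentially_unacceptable)
    return {
        # All keys remain in the list so the user can still see and review them.
        "choices": list(all_keys),
        # Everything except the flagged keys is auto-selected.
        "selected": [key for key in all_keys if key not in flagged],
    }


# Hypothetical keys and config value.
state = preselect_annotation_keys(
    ["specimenID", "assay", "dateBirth", "species"],
    potentially_unacceptable=["dateBirth"],
)
```

Unlike removing the columns outright, this keeps flagged keys in view but unselected, so the user's remaining job is to deselect anything else that should not become an annotation.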