bihealth / sodar-server

SODAR: System for Omics Data Access and Retrieval
https://github.com/bihealth/sodar-server
MIT License

Display certain inaccessible sample data in search results #816

Open mikkonie opened 4 years ago

mikkonie commented 4 years ago

In GitLab by @dieter.beule on Feb 11, 2020, 13:39

This is really a feature request: We claim to do FAIR data management. Currently, if I search for a sample ID, I only get results for projects that I have access to. That is not really FAIR: I should see that someone has data even if I cannot access it (details need to be discussed and defined).

Even more complex: I might want to search for "colon cancer" or even "WES" AND "Colon Cancer". And here we get into real trouble because some people might want to protect some of their metadata (e.g. HPO terms) from being searched.

We should probably do a more detailed requirements analysis here.

mikkonie commented 4 years ago

Yes, this definitely needs a clear requirements definition of 1) when we want to show all data and 2) when we do not. By default, data in SODAR Core and SODAR Core based sites is only shown if you can access it.

mikkonie commented 4 years ago

In GitLab by @dieter.beule on Feb 11, 2020, 15:56

This was really meant as a "Future Improvement" that I wanted to track, no need for immediate action. Is there a "milestone" or other "box" for these, or how shall we handle these kinds of things?

mikkonie commented 4 years ago

I think no milestone is the way to go for future improvements :)

mikkonie commented 4 years ago

In GitLab by @holtgrewe on Feb 11, 2020, 22:46

@dieter.beule I agree that we need a more fine-grained access control list here. However, I don't see how we violate anything in Box 2 of the article with the current implementation. For findable, our UUIDs are sufficient, for example.

mikkonie commented 4 years ago

In GitLab by @dieter.beule on Feb 11, 2020, 23:46

In the article "Findable" asks for
"F4. (meta)data are registered or indexed in a searchable resource"

My interpretation is that I should be able to ask questions like:
1.a "What kind of data sets do I have for (global/unique) sample ID xyz", or at least
1.b "Who/which study has data for (global/unique) sample ID xyz"
2.a "List data (sets) with metadata abc", or at least
2.b "Who/which study has data with metadata abc"

The only question is who should be allowed to ask these questions, and we might not find an easy answer. I would assume that everybody in the institute should be allowed to ask question 1.a (and then needs to talk to the data owner if they want to access the actual data). It seems fine to me (and to the FAIR concept) if this question is not allowed from the outside world, at least for the kind of data we are dealing with.
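
To make the policy for question 1 concrete, here is a minimal sketch. It is not SODAR code: `Study`, `User`, `in_institute` and `search_sample` are hypothetical names invented for illustration. Users with access get a full result, other institute members only learn which study holds the sample and whom to contact, and external users get nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    # Hypothetical stand-in for a project/study, not a SODAR model
    title: str
    owner: str
    members: set = field(default_factory=set)
    sample_ids: set = field(default_factory=set)


@dataclass
class User:
    # Hypothetical stand-in for a user; in_institute is an assumed flag
    name: str
    in_institute: bool = True


def search_sample(sample_id, user, studies):
    """Return one result dict per visible study holding the sample ID."""
    results = []
    for study in studies:
        if sample_id not in study.sample_ids:
            continue
        if user.name == study.owner or user.name in study.members:
            # User has access: show the study as a regular result
            results.append({"study": study.title, "access": "full"})
        elif user.in_institute:
            # No access: only disclose that data exists and who owns it
            results.append(
                {"study": study.title, "access": "none", "contact": study.owner}
            )
        # External users without access learn nothing about the study
    return results
```

Whether the restricted result should reveal the study title at all, or only the owner, is one of the details that would still need to be defined.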

Question 2 is more difficult because it can be "abused" to gather information on study content. If we allow 2.a for HPO terms, geneticists might not want to put them into SODAR as metadata.
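
One possible way to handle question 2, sketched under assumptions: data owners mark certain metadata fields as protected, and those fields are only searched for users who already have access to the study. The names `PROTECTED_FIELDS`, `searchable_fields` and `search_metadata`, as well as the dict layout, are hypothetical and not part of SODAR.

```python
# Hypothetical set of metadata fields excluded from search for non-members
PROTECTED_FIELDS = {"hpo_terms"}


def searchable_fields(study, user):
    """Return the metadata fields this user is allowed to search in."""
    if user in study["members"]:
        return study["metadata"]
    return {
        key: value
        for key, value in study["metadata"].items()
        if key not in PROTECTED_FIELDS
    }


def search_metadata(term, user, studies):
    """Return titles of studies whose searchable metadata contains the term."""
    term = term.lower()
    hits = []
    for study in studies:
        fields = searchable_fields(study, user)
        if any(term in str(value).lower() for value in fields.values()):
            hits.append(study["title"])
    return hits
```

With this approach a non-member could still find a study by searching for "WES", but not by searching for an HPO term, so the protected metadata cannot be probed through the search.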