BBMRI-ERIC / miabis

The Minimum Information About BIobank data Sharing (MIABIS) is a biobank-specific terminology enabling the sharing of minimal biobank-related data for different purposes across a wide range of database implementations.
https://www.bbmri-eric.eu/howtomiabis/

Clarification of collection semantics and flagging collection types #5

Open holubp opened 3 years ago

holubp commented 3 years ago

We need to clarify the semantics of collections, distinguishing between persistent, organizationally defined collections; collections defined around the life cycle of samples, SOPs, or implemented quality measures; and virtual collections created only as a workaround for the lack of cubes.

This is a bit of discussion history on the topic:

Petr Holub Thu 2020-08-13 13:45

Dear both,

there is one more use case I forgot to list, which is currently in place in the Directory: collections used to designate the part of the material stored by a biobank that is processed/stored in compliance with some standard (this is typically a sub-collection defined for a particular material type, or a subset of it).

Cheers, Petr

Assoc. Prof. RNDr. Petr Holub, Ph.D. Senior IT/Data Protection Manager

BBMRI-ERIC | Neue Stiftingtalstrasse 2/B/6 | 8010 Graz | AUSTRIA Phone: +43 316 34 99 17-18 | Fax: +43 316 34 99 17-99 | Mobile: +43 664 88 72 18 77 Skype:holubp | petr.holub@bbmri-eric.eu | www.bbmri-eric.eu

Petr Holub Wed 2020-08-12 21:45

Dear both,

I think this should be discussed as a part of the MIABIS Core update calls, because the semantics should be defined there. But I suggest that we have materials prepared for that call, in which people put together the different uses of the collections they have. We clearly have at least 4 use cases:

1) collections used just as aggregating statistical descriptors,
2) collections used to describe certain processes and life cycles of samples in the biobanks (e.g., the long-term vs. short-term collection at MMCI),
3) collections used to describe the purpose for which the material was collected (different studies),
4) collections used for creating very independent collections (e.g., the NL use case).

If we just start discussing it without having this prepared, the likelihood of a productive discussion is rather low. I would appreciate it if the MIABIS team takes care of facilitating this (Roberto?) and people fill in their input.

Cheers, Petr

Philip Quinlan Wed 2020-08-12 14:08

Hi Esther and All,

I think this just emphasises the need to spend some time on this. At the moment we have a one-size-fits-all approach to Collections, whereas we are finding that there are actually several different concepts behind them. That all works OK when we talk loosely about Collections, but once we get into the specifics of collection names, IDs, and levels of persistence, the differences emerge.

We should probably arrange a call to try and bring this all together.

All the best,

Phil

Cs-it mailing list Cs-it@lists.bbmri-eric.eu https://lists.bbmri-eric.eu/cgi-bin/mailman/listinfo/cs-it

Esther van Enckevort e.j.van.enckevort@rug.nl Wed 2020-08-12 11:59

Hi Petr,

In the Netherlands, collections are not considered just statistical descriptors. We have several collections, like the Parels from Parelsnoer or the collections under the centralized biobanks in the UMCs, that are persistent and are actually treated as the biobank in citations, instead of the centralized facility. In most other cases the owners identify much more with the collection than with the biobank entity in the Directory.

Esther van Enckevort MA, Project Manager

Genomics Coordination Center UMCG / University of Groningen, Dept. of Genetics Antonius Deusinglaan 1, 9713 AV Groningen, The Netherlands

Mail: e.j.van.enckevort@rug.nl | Phone: +31 (0)6 54 33 22 76 Skype: enckevort76 | ORCID: 0000-0002-2440-3993 www.molgenis.org

Petr Holub Tue 2020-08-11 13:51

Dear Esther,

thanks for the comments.

As for PID compatibility: we went through the discussion of PIDs some time ago with Phil and others, and the conclusion was that we start by allocating PIDs only to the biobanks and not to the collections. Or, if we allocate them to collections too, those should be some kind of "persistent collections", specially marked as such. This is because collections are primarily aggregators for statistical descriptors, as we discussed today during the MIABIS meeting, and they can change over time when the biobank decides that some other structure of those statistical descriptors would characterize its content better. I am about to update the old PID policy, but wanted to do it after the practical implementation, because we may run into other problems too and I would like to update it in one go.

As for your second question: I think that we should have at least biobanks citable in the first place. As for particular collections, I think it depends, and the answer might be what I say above (i.e., if you have collections in NL that need standalone visibility, we would have to mark them as persistent, but their subcollections may not be persistent if they again act as statistical aggregators).

Cheers, Petr
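The PID allocation rule described in this message can be illustrated with a minimal Python sketch. It assumes a simple tree of biobank, collections, and subcollections; all class names, field names (for example `persistent`), and identifiers are hypothetical and are not MIABIS or Directory attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    # Hypothetical model: field names are illustrative, not MIABIS attributes.
    collection_id: str
    persistent: bool = False  # explicitly marked as a "persistent collection"
    subcollections: List["Collection"] = field(default_factory=list)


@dataclass
class Biobank:
    biobank_id: str
    collections: List[Collection] = field(default_factory=list)


def pid_eligible_ids(biobank: Biobank) -> List[str]:
    """Return IDs that would receive a PID: the biobank itself, plus any
    collection or subcollection explicitly marked as persistent."""
    eligible = [biobank.biobank_id]
    # Depth-first, pre-order walk over the collection tree.
    stack = list(reversed(biobank.collections))
    while stack:
        current = stack.pop()
        if current.persistent:
            eligible.append(current.collection_id)
        stack.extend(reversed(current.subcollections))
    return eligible
```

In this sketch a persistent collection does not make its subcollections persistent; each level has to be marked on its own, which matches the remark that subcollections may still act as changeable statistical aggregators.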

Esther van Enckevort e.j.van.enckevort@rug.nl Tue 2020-08-11 11:29

Hi everyone,

The row level security is implemented in the backend, but the frontend for managing it is not yet completed. We plan to start testing it with BBMRI-NL soon, and the dev team is looking into the frontend. However, once we implement the persistent identifiers we cannot use simple deletion anyway, because this would break the link with the persistent identifier.

At the moment you can delete entities in the staging area, and we actually did build a way to propagate the deletion to the production database, but that hasn't been rolled out yet. As commented above, this would also conflict with the requirements for the persistent identifiers.

I did suggest that we circulate this proposal precisely in order to further define the lifecycle states and the rules regarding state changes. I think it is not trivial to properly identify the valid states, so it is important that we take the time to discuss them.

From BBMRI-NL I also got some input that, for the requestor, the main interest is whether the collection is still including more subjects and whether more data/samples are still being collected for the subjects. According to them, collections that don't have anything available to be requested should not be part of the Directory.

This touches the fundamental question that Phil also raised about what we want the Directory to represent. Is it a tool to find collections from which you can request samples/data, or do we have a broader goal to also make collections citable? I'm happy that we started this discussion and I hope others will also give their input.

With kind regards,

Esther
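The conflict between simple deletion and persistent identifiers raised in this message can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Registry` class, the `withdrawn` field, and the in-memory dictionary are all hypothetical stand-ins, not the Directory's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    entity_id: str
    pid: Optional[str] = None  # persistent identifier, if one was allocated
    withdrawn: bool = False    # tombstone marker (hypothetical field name)


class Registry:
    """Toy in-memory store standing in for the production database."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self._records[record.entity_id] = record

    def delete(self, entity_id: str) -> None:
        record = self._records[entity_id]
        if record.pid is None:
            # No PID points at this record, so physical removal is safe.
            del self._records[entity_id]
        else:
            # Keep a tombstone so that the PID still resolves to something.
            record.withdrawn = True

    def resolve(self, pid: str) -> Optional[Record]:
        """Look up a record by PID, including withdrawn ones."""
        for record in self._records.values():
            if record.pid == pid:
                return record
        return None

    def listed(self) -> List[str]:
        """IDs that would still be shown to requestors."""
        return sorted(r.entity_id for r in self._records.values() if not r.withdrawn)
```

The design choice shown here is that deletion becomes a state change for anything with a PID, which is one reason the valid lifecycle states need to be agreed before deletion is rolled out.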

A.S. andrzej.strug@gumed.edu.pl Mon 2020-08-10 15:29

Hi,

You are really watchful, Petr.

To tell you the truth, I was thinking about only one flag, used for deletion. The Depleted state came to my mind at once as a possible extension of its use. But then we started exchanging thoughts and emails, some new states appeared in them, and I did not give them enough thought.

Obviously, we can use this one flag only for those states of the collection life cycle that are mutually exclusive. But before we reach an agreement, it would be nice to decide at least on deletion. It would give us a solution for automatic (API) management of collections (and biobanks, contacts and networks).

Andrzej

On 10 Aug 2020, at 13:55, Petr Holub petr.holub@bbmri-eric.eu wrote:

Dear all,

I think we are facing 2 issues here:

1) Marking the status of the collection. I like the generalizing approach. A couple of comments from my side:

a) As for those flags in general, I guess the intention is to set them independently, so that they are not a state of the collection but rather flags describing its properties at a given moment in time. The reason for commenting: if it was meant as a state, we would need to describe the states in such a way that a collection can be in exactly one of them, whereas as it looks now, at least some of the flags may be combined (e.g., recruiting + on site use only).

b) If the above assumption is correct, you should also define rules for forbidden combinations (e.g., deleted + recruiting does not make sense).

c) Depleted has at least 2 sub-states: "operationally depleted", when samples can no longer be handed out and only reference samples are kept for reproducibility verification, and "completely depleted", when all samples including the reference ones are gone.

d) The Archived flag is a bit complicated for me: what is its relation to (or difference from) Deleted and Depleted?

2) What I don't like is that the problem is indirectly induced by the use of the staging area. We have been discussing for at least 2-3 years that we should be able to avoid the staging area completely once we have column/row level security. I believe this has already been delivered, so I don't understand why we are still using the staging area when it's causing trouble... Morris, Esther?

Thanks, Petr
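The distinction between independent flags and a single state, together with rules for forbidden combinations, can be sketched in Python with `enum.Flag`. The flag names follow this thread, but the set of flags and the list of forbidden pairs are assumptions for illustration; only "deleted + recruiting" is named explicitly above, and Archived is left out because its meaning is still open.

```python
from enum import Flag, auto
from typing import List, Tuple


class CollectionFlag(Flag):
    # Illustrative set of flags; not an agreed MIABIS vocabulary.
    RECRUITING = auto()
    ON_SITE_USE_ONLY = auto()
    OPERATIONALLY_DEPLETED = auto()  # only reference samples remain
    COMPLETELY_DEPLETED = auto()     # reference samples are gone too
    DELETED = auto()


# Pairs that must not be set together. Only DELETED + RECRUITING comes
# from the discussion; the second pair is an assumed example.
FORBIDDEN_PAIRS: List[Tuple[CollectionFlag, CollectionFlag]] = [
    (CollectionFlag.DELETED, CollectionFlag.RECRUITING),
    (CollectionFlag.OPERATIONALLY_DEPLETED, CollectionFlag.COMPLETELY_DEPLETED),
]


def violations(flags: CollectionFlag) -> List[Tuple[CollectionFlag, CollectionFlag]]:
    """Return every forbidden pair whose two members are both set in `flags`."""
    return [(a, b) for a, b in FORBIDDEN_PAIRS if a in flags and b in flags]
```

For example, `violations(CollectionFlag.RECRUITING | CollectionFlag.ON_SITE_USE_ONLY)` returns an empty list, while combining `DELETED` with `RECRUITING` reports that pair. If the group instead decides on mutually exclusive states, a plain `Enum` with exactly one value per collection would be the simpler model.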
