Open msweetlove opened 4 years ago
@msweetlove We will discuss this request at our monthly call next week, and I will report back. If you are interested in taking part in that call to share your use case, you are welcome to join us. It takes place at 8AM PST/5PM CET on Monday 2/24. I can send you connection details if you decide to join.
Although I fully support specifying unique identifiers for events, I am concerned that including these terms in MIxS will cause confusion. We have a hard enough time getting people to use materialSampleID.
We discussed this at the CIG call. We think it is premature to add these terms to MIxS. Some groups (EBI, GGBN) have internal solutions they use for relating samples to one another, which we will write up in a short document.
Before MIxS adopts new terms, we feel there should be more work on community harmonization of data. For example, does GBIF plan to adopt a data model that encompasses both processes (i.e. events) and material samples? How do we harmonize with efforts from other disciplines (e.g., sample metadata used by IGSN for geological samples) or with the use of IDs in the archaeological community?
Within the life sciences community, we can try to work this out through the TDWG Genomic Biodiversity Working Group (https://www.tdwg.org/community/gbwg/). We are planning a 90-minute public session and a 90-minute working session at the Biodiversity Summit in September to address these issues.
Thanks for discussing the issue. Sorry I couldn't make the call, I just became a father on Monday, a bit earlier than planned...
I think that, for the microbiology community, the more the DarwinCore and MIxS standards adopt the same terminology and structure, the more easily this community will be able to work with (and between) the two standards. As far as I know, GBIF is also already working on a data model that is centered on events, as using occurrences for our community was completely unworkable. So some shifts are slowly happening there. The problem in our community is that harmonization between the biodiversity/ecological data (counts, samples, culture, and dataset visibility on GBIF) and the related genomic data (genomic community profiles, environmental data, ...) is still difficult.
Perhaps it would be fruitful to involve some people from GBIF and TDWG in this discussion? I will have a look at the TDWG Genomic Biodiversity Working Group, I wasn't aware of it.
I think the best route forward is to take this conversation to the upcoming TDWG meeting in September, between the Genomic Biodiversity Working Group and the convenors of the Darwin Core vocab standard group (I forget the exact name). Figuring out how to frame the conversation at this meeting should be something we work on in the near future.
BTW, there is a legacy of small publications that came out of an RCN that ran for a few years, and which should be an interesting read (also eerily familiar):
https://www.ncbi.nlm.nih.gov/pubmed/21304642
https://www.ncbi.nlm.nih.gov/pubmed/23451293
https://www.ncbi.nlm.nih.gov/pubmed/23409219
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3746421/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3940615/
-- John Deck
> Thanks for discussing the issue. Sorry I couldn't make the call, just became father on Monday, a bit earlier than planned...
Congratulations!!
> as using occurrences for our community was completely unworkable.
A nightmare for us all.
> Perhaps it would be fruitful to involve some people from GBIF and TDWG into this discussion? I will have a look at the TDWG Genomic Biodiversity Working Group, I wasn't aware of it.
Fortunately, TDWG people are already involved in the discussion (@jdeck88, @gdadade, myself and others). I will reach out to folks at GBIF to invite them to the working group meeting in the fall.
I want to propose including the TDWG terms eventID and parentEventID in MIxS.
Background: Samples in environmental and ecological studies (e.g. metagenomics of microbes) are often taken in a hierarchical experimental set-up. For example, when sequencing microbes along a depth profile of the water column in a lake, a sample hierarchy can look like this (from high to low level): scientific project > multiple lakes > multiple stations per lake > multiple depths per station. Another common experimental approach is to apply different sequencing techniques to one environmental sample (e.g. metagenome and metatranscriptome), or to make technical replicates of a single sample (e.g. sequencing a soil sample 3 times to assess the variability introduced by sampling and wet-lab procedures). In all these cases, there is a need to be able to group samples (that is, the events) at higher levels (parentEvents). Moreover, this would also help to make MIxS more interoperable with the DarwinCore EventCore format, which is necessary for multifaceted ecological and microbial studies that rely on both standards.
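As a rough illustration of the lake example above, the hierarchy could be encoded as flat records that each point to their parent event. This is only a sketch: the identifiers (`proj1`, `proj1:lakeA`, ...) and the helpers `ancestors` and `descendants` are invented for illustration and are not part of MIxS or Darwin Core. It also assumes the parent links contain no cycles.

```python
# Hypothetical records: each event names its parent via parentEventID.
# All identifiers below are invented for illustration.
events = [
    {"eventID": "proj1", "parentEventID": None},
    {"eventID": "proj1:lakeA", "parentEventID": "proj1"},
    {"eventID": "proj1:lakeA:st1", "parentEventID": "proj1:lakeA"},
    {"eventID": "proj1:lakeA:st1:5m", "parentEventID": "proj1:lakeA:st1"},
    {"eventID": "proj1:lakeA:st1:10m", "parentEventID": "proj1:lakeA:st1"},
]


def ancestors(event_id, records):
    """Return the chain of parent eventIDs, from nearest parent up to the root."""
    parent_of = {r["eventID"]: r["parentEventID"] for r in records}
    chain = []
    current = parent_of[event_id]
    while current is not None:  # assumes the hierarchy has no cycles
        chain.append(current)
        current = parent_of[current]
    return chain


def descendants(event_id, records):
    """Return every eventID that has event_id somewhere among its ancestors."""
    return [
        r["eventID"]
        for r in records
        if event_id in ancestors(r["eventID"], records)
    ]
```

With records like these, all depth samples of one station can be retrieved with `descendants("proj1:lakeA:st1", events)`, and the full context of a single sample with `ancestors(...)`.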
Proposed terms:
Label: eventID
Definition: (from TDWG http://rs.tdwg.org/dwc/terms/index.htm#eventID) An identifier for the set of information associated with an Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the data set.
Label: parentEventID
Definition: (from http://rs.tdwg.org/dwc/terms/index.htm#parentEventID) An event identifier for the super event which is composed of one or more sub-sampling events. The value must refer to an existing eventID. If the identifier is local it must exist within the given dataset. May be a globally unique identifier or an identifier specific to the data set.
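The constraint in the parentEventID definition (the value must refer to an existing eventID) could be checked along these lines. This is a minimal sketch under stated assumptions: the function name `check_parent_event_ids` and the example records are hypothetical, and the check treats every identifier as local to the dataset, so a globally unique parent identifier that legitimately lives outside the dataset would also be reported.

```python
def check_parent_event_ids(records):
    """Return eventIDs whose parentEventID matches no eventID in the dataset."""
    known = {r["eventID"] for r in records}
    return [
        r["eventID"]
        for r in records
        if r.get("parentEventID") is not None
        and r["parentEventID"] not in known
    ]


# Invented example: three technical replicates of one soil sample, plus one
# record whose parent event is missing from the dataset.
records = [
    {"eventID": "soil1", "parentEventID": None},
    {"eventID": "soil1:rep1", "parentEventID": "soil1"},
    {"eventID": "soil1:rep2", "parentEventID": "soil1"},
    {"eventID": "soil1:rep3", "parentEventID": "soil1"},
    {"eventID": "soil2:rep1", "parentEventID": "soil2"},
]

dangling = check_parent_event_ids(records)
```

Here `dangling` lists only `soil2:rep1`, because no record with eventID `soil2` exists in the dataset.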