gbif / rs.gbif.org

GBIF machine-readable resources
https://rs.gbif.org

Bring DNA derived data extension into production #48

Closed: thomasstjerne closed this issue 3 years ago

thomasstjerne commented 3 years ago

https://github.com/gbif/rs.gbif.org/blob/master/sandbox/extension/dna_derived_data.xml

tucotuco commented 3 years ago

Just so you know, the MIxS group is about to submit a charter to update the MIxS Darwin Core extension and take it through the ratification process in TDWG. The work on this will be very fast relative to normal work on standards in our community.

On Tue, Nov 10, 2020 at 11:45 AM Thomas Stjernegaard Jeppesen <notifications@github.com> wrote:

  • Review the namespaces, terms and descriptions
  • Produce datasets shaping data from 3 different groups
  • Verify indexing in UAT GBIF
  • Create PR moving the extension to production

https://github.com/gbif/rs.gbif.org/blob/master/sandbox/extension/dna_derived_data.xml


timrobertson100 commented 3 years ago

Thanks @tucotuco, we appreciate you following this repository and raising this early.

We suggest that neither this issue nor the TDWG MIxS ratification should be completed without reviewing the work undertaken and whether the two should be further aligned. There may well be good reasons not to align further, but if so we can collectively document the options so users are not left confused. It's certainly not the intention here to start competing options. Seem sensible?

For background, my understanding of the evolution of this extension is (broadly speaking):

  1. Reviewed the original MIxS work against the most recent MIxS standard (it was 5 years out of date at the time; read on, though)
  2. Identified and accommodated missing requirements that came from the GGBN community (new terms)
  3. Identified an issue: MIxS merged information needed for indexing in GBIF into single fields (e.g. forward/reverse primer), so we opted to keep them separate
  4. Included MIQE recommendations to also cover PCR-based assays
  5. Documented the approach in the draft document "Publishing DNA-derived data through biodiversity data platforms"
  6. Liaised with the MIxS group, who indicated an intention to investigate including some of this in MIxS
  7. This led to Dmitry and Thomas joining the TDWG Genomic Biodiversity Working Group, whose first session involved updating the MIxS extension.

CC @ramonawalls and @dschigel. Perhaps @ramonawalls could comment on how she feels this should proceed?

Thanks again

tucotuco commented 3 years ago

Also bringing in @raissameyer and @pbuttigieg who are leading the MIxS update and preparing the Task Group under the Genomic Biodiversity Working Group.

ramonawalls commented 3 years ago

I would like to strongly urge the use of GSC identifiers for MIxS terms that are part of this extension. I hope that the IRIs will be publicly available by January. See https://github.com/GenomicsStandardsConsortium/mixs-rdf

ramonawalls commented 3 years ago

I am still a bit confused about how this issue relates to the new task group @pbuttigieg is proposing, but I expect I will figure it out with time.

I will review the XML document linked above.

ramonawalls commented 3 years ago
  1. Reviewed the original MIxS work against the most recent MIxS standard (it was 5 years out of date at the time; read on, though)

Not sure when this review was done, but MIxS 5 was released in early 2020, and we plan to release MIxS 6 in early 2021. Going forward, we can start by comparing against MIxS 5 and then, when MIxS 6 comes out, make any adjustments needed. There will not be that many new terms in MIxS 6.

timrobertson100 commented 3 years ago

Thanks, @ramonawalls

The review was done at a time when there was only the original MIxS DwC-A extension, which was issued in 2015 and was well behind MIxS.

Currently, we have

and it looks like there is a revision of the DNA derived data extension dated 23 June, but with an unusual filename:

@thomasstjerne what is the intention with this third one, please? The filename implies it is a new MIxS DwC-A edition, but the title is different from the previous versions.

pbuttigieg commented 3 years ago

Hi all

Thanks @tucotuco for looping us in - lots of opportunity here to sync for collective benefit and better TDWG/GSC coordination.

The Task Group, intended to work under GBWG, has much the same aim: there are a bunch of "alignments" and extensions that are pretty much unilaterally declared or built without coordination between TDWG and GSC.

This is already causing issues and leading to strategic splits that the community just can't afford now that eDNA/omics is becoming mainstream in biodiversity observation.

@timrobertson100

There may well be good reasons not to align further...

I can't imagine any that won't further bifurcate the biodiversity community and hurt everyone. There may be some serious compromising / negotiation to do, but we can't put this off.

... but if so we can collectively document the options so users are not left confused. It's certainly not the intention here to start competing options. Seem sensible?

Yes, this makes sense, and I think the Task Group is the way to do this: a formal procedure with a formal output from within TDWG.

@ramonawalls we should consider how to mirror this on the GSC side, perhaps with an official CIG activity / TG - we can then make this a Joint GSC/TDWG TG

For background, my understanding of the evolution of this extension is (broadly speaking) ...

That's close to how I understand the history too, but frankly I think a fresh look at this mapping is in order, especially as both the GSC and TDWG have evolved in how they handle their community standards (I'm so happy we're not working with spreadsheets anymore).

@ramonawalls

I would like to strongly urge the use of GSC identifiers for MIxS terms that are part of this extension. I hope that the IRIs will be publicly available by January. See https://github.com/GenomicsStandardsConsortium/mixs-rdf

This is absolutely essential: we (that is, the standards groups) need to authoritatively hard-bind the MIxS IRIs and DwC IRIs (even if they're not live yet), ideally using some sort of globally intelligible W3C standard. Any caveats on those bindings should be available in each resource.

However, this should only occur after the mapping has been vetted and formally endorsed by GSC and DwC representatives. The TG will do the preparatory work for the endorsers to have something to work with, and make sure the tech is reasonable.

pbuttigieg commented 3 years ago

@timrobertson100 @thomasstjerne shall we add you to the Task Group participants?

thomasstjerne commented 3 years ago

Yes @pbuttigieg, we are both happy to join.

ramonawalls commented 3 years ago

@ramonawalls we should consider how to mirror this on the GSC side, perhaps with an official CIG activity / TG - we can then make this a Joint GSC/TDWG TG

GBWG is a joint GSC/TDWG interest group, so having this TG fall under GBWG solves that problem.

ramonawalls commented 3 years ago

At @timrobertson100's excellent suggestion, we now have a GBWG repo at https://github.com/tdwg/gbwg. I have invited @raissameyer and @pbuttigieg to that repo. Could one of you please create an issue there regarding the new task group?

We will be setting up a GBWG mailing list soon too.