gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

where to publish ex situ data? #179

Open ymgan opened 1 month ago

ymgan commented 1 month ago

Hello,

I wanted to check with you all, how and where do you publish ex situ data?

We have sampling sites where, aside from the biomass and macrofauna collected, four 10 cm diameter sediment cores were collected at each location for biogeochemistry measurements. Fluxes of O2, dissolved inorganic carbon and nutrients (NO3-, NH4+, PO43-) were measured in ex situ dark and light sediment core incubations in a temperature-controlled water bath to determine benthic net primary production, respiration and organic matter mineralization rates.

The biomass and macrofauna data should be published to GBIF. How do you deal with the biogeochemistry measurement data? These are ex situ measurements of environmental samples (sediment cores) and I guess these should not go to GBIF, right? Do you put them in Zenodo?

Thanks a lot!!

rukayaj commented 1 month ago

Hi Ming, I actually don't recall ever needing to publish data like this, so I'm not sure of the best way to do it. I would also guess probably not in GBIF though, Zenodo seems like a sensible choice. Or maybe something like https://www.pangaea.de/submit/ ?

ymgan commented 1 month ago

Thanks a lot @rukayaj !!! That is similar to what @lhmarsden suggested too (though that was for a NetCDF file, in our separate conversation). I appreciate it!

lhmarsden commented 1 month ago

I would suggest Pangaea. It's very difficult to find anything on Zenodo unless you already know it exists and where it is.



MichalTorma commented 1 month ago

Hmm. Can't this be an Event core with the Extended Measurement or Fact extension? It does not break the star schema, since both occurrence and measurement are hooked to the event. Or am I missing something?
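To make the star schema point concrete, here is a minimal sketch of an Event core with an Occurrence extension and an eMoF extension, both keyed on eventID. All identifiers, dates, taxa and values are made up for illustration and are not taken from the dataset discussed in this issue; the helper function names are hypothetical too.

```python
# Hypothetical rows for illustration only; nothing here comes from the real dataset.
events = [
    {"eventID": "site-01", "eventDate": "2023-06-01", "samplingProtocol": "sediment core"},
]

# Occurrence extension rows point back to the core through eventID.
occurrences = [
    {"eventID": "site-01", "occurrenceID": "site-01-occ-1",
     "scientificName": "Polychaeta", "basisOfRecord": "HumanObservation"},
]

# eMoF rows also point back to the core through eventID.
# An empty occurrenceID means the measurement describes the event, not an occurrence.
emof = [
    {"eventID": "site-01", "occurrenceID": "", "measurementType": "O2 flux",
     "measurementValue": "1.2", "measurementUnit": "mmol m-2 h-1"},
]


def rows_for_event(event_id, table):
    """Return all extension rows hooked to one core event."""
    return [row for row in table if row["eventID"] == event_id]


def star_is_consistent(core, *extensions):
    """Check that every extension row references an eventID present in the core."""
    core_ids = {row["eventID"] for row in core}
    return all(row["eventID"] in core_ids for table in extensions for row in table)


print(star_is_consistent(events, occurrences, emof))
print(rows_for_event("site-01", emof))
```

This only shows that the structure holds together technically; it says nothing about whether ex situ data belongs in GBIF, which is the question the rest of the thread discusses.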

ymgan commented 1 month ago

Technically it doesn't. It's more about the nature of the data: this is experimental (ex situ) data on environmental samples, generated from incubation in the lab, not measured on site (in situ). That's why I (we) think it should not go to GBIF.

dagendresen commented 1 month ago

I do think that ex situ measurement data can have a home in GBIF!

I see very many parallels between plant genetic resources and marine data - as was also discussed at the OBIS meeting in Brussels in February 2018.

This is based on the rationale (for plant genetic resources) that is also described by the 2016 GBIF/Bioversity Task Group on GBIF Data Fitness for Use in Agrobiodiversity (Arnaud et al 2016).

For me, the OBIS extended Measurement or Fact extension (eMoF) together with the sampling event core solved (!) many of the challenges we had for aligning plant genetic resources with GBIF (De Pooter et al 2017).

I agree with Michal that event core can be used - but I think maybe in a different way than Michal maybe is thinking of...(?). I assume the event that Michal is thinking of is the source collecting event when the material entity was collected in situ (in nature).

The ex situ material would be of the (relatively new) Darwin Core Material Entity class and included in a collection aka TDWG CD of specimens (Darwin Core PreservedSpecimen).

dwc:MaterialEntity = An entity that can be identified, exists for some period of time, and consists in whole or in part of physical matter while it exists.

The measurements made ex situ are (as Ming points out) obviously not measurements made on the "organism occurrence" (completely different time and location). The "Occurrence" is long gone in the past. And the measurements are clearly made on the new MaterialEntity.

However, I see absolutely no reason why the experiments collecting the ex situ measurements are not completely valid new Darwin Core Events!!!

dwc:Event = An action that occurs at some location during some time.

In agrobiodiversity the identity of and metadata for the ex situ experiments are essential to make sense of the experiment measurement data. The location of the experimental trial fields is very important because different locations have different environments, including biotic disease stresses. The temporal data for the ex situ experiment is also very important because it helps to document the season and plant growth stage (seedling, flowering, seed maturing stages), etc. We tried to document the experiment event in an extension to the "Occurrence core" ... in an August 2009 EPGRIS3 meeting (van Hintum et al 2009).

However, I could never really make these extensions match well with the Occurrence core (Endresen & Knupffer 2012). https://rs.gbif.org/extension/germplasm/ https://rs.gbif.org/extension/germplasm/MeasurementTrial.xml

The emergence of first (1) the GBIF Sampling Event core, and next (2) the OBIS extended Measurement or Fact extension (eMoF), really helped me out of the fog. And I find that the relatively recent Darwin Core Material Entity (3) is a major step towards the solution that emerged so clearly for me with the Event core + eMoF back in 2018 (see the February 2018 OBIS meeting in Brussels and the May 2018 regional GBIF meeting in Tallinn).
https://rs.gbif.org/extension/obis/extended_measurement_or_fact_2023-08-28.xml
https://www.gbif.no/news/2018/gbif_obis_event_core_workshop.html
https://www.gbif.no/events/2018/gbif-eu-2018.html

One of the essential remaining missing pieces (in my mind) is adding a resource identifier (dwc:resourceID) to the eMoF (and getting rid of occurrenceID and/or adding materialEntityID).

dwc:resourceID = An identifier for the resource that is the subject of the relationship.

We could then have the ex situ experiments identified as sampling events (dwc:Event) identified by dwc:eventID in a GBIF Event core dataset. The ex situ samples identified as material entities (dwc:MaterialEntity) by materialEntityID. And the measurement data points identified as dwc:MeasurementOrFact with dwc:measurementID and obis:measurementValueID.

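dwc:measurementID = An identifier for the dwc:MeasurementOrFact (information pertaining to measurements, facts, characteristics, or assertions). May be a global unique identifier or an identifier specific to the data set.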

obis-emof:measurementValueID = An identifier for facts stored in the column measurementValue (global unique identifier, URI). This identifier can reference a controlled vocabulary (e.g. for sampling instrument names, methodologies, life stages) or reference a methodology paper with a DOI. When the measurementValue refers to a value and not to a fact, the measurementValueID has no meaning and should remain empty.
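The model described above can be sketched roughly as follows: the ex situ incubation is an Event, the sediment core is a MaterialEntity, and each measurement points to both. Note that resourceID on a measurement row is the addition proposed in this comment, not a field of the current eMoF extension, and all identifiers and values below are made up for illustration. The `describe` helper is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    eventID: str
    eventRemarks: str


@dataclass(frozen=True)
class MaterialEntity:
    materialEntityID: str
    materialEntityRemarks: str


@dataclass(frozen=True)
class Measurement:
    measurementID: str
    eventID: str      # the ex situ experiment in which the measurement was made
    resourceID: str   # PROPOSED link to the material entity; not in eMoF today
    measurementType: str
    measurementValue: str
    measurementUnit: str


# Illustrative records only.
events = [Event("incubation-01", "ex situ dark incubation in a water bath")]
materials = [MaterialEntity("core-01", "10 cm diameter sediment core")]
measurements = [
    Measurement("meas-01", "incubation-01", "core-01", "O2 flux", "1.2", "mmol m-2 h-1"),
]


def describe(measurement, events, materials):
    """Resolve a measurement to the event it was made in and the material it was made on."""
    event = next(e for e in events if e.eventID == measurement.eventID)
    material = next(m for m in materials if m.materialEntityID == measurement.resourceID)
    return (event.eventID, material.materialEntityID, measurement.measurementType)


print(describe(measurements[0], events, materials))
```

The design point is that the measurement never needs an occurrenceID: the experiment and the sample each carry their own identifier.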

ymgan commented 1 month ago

Thank you so much @MichalTorma and @dagendresen ~ (Dag, you should rest in the weekend 🐻)

Anyway, thank you for your input!! I think what you said makes sense, but you also lost me at:

The ex situ samples identified as material entities (dwc:MaterialEntity) by materialEntityID.

The only extension that I know of that has materialEntityID is the Occurrence extension, which does not make sense to me to use for this use case because of the definition of the dwc:Occurrence class:

An existence of a dwc:Organism at a particular place at a particular time.

Are you perhaps talking about the new data model with Frictionless Data Schemes? I think my question is whether there is an extension that I can use to achieve what you describe to identify the ex situ samples as material entities (dwc:MaterialEntity) using materialEntityID.

Otherwise, if I understand you well, I agree with you about the dwc:resourceID in eMoF (although I am still confused about when to use dwc:resourceID or dwc:relatedResourceID). I believe @lhmarsden tried to express this in this pull request.

dagendresen commented 1 month ago

The only extension that I know of that has materialEntityID is the Occurrence extension, which does not make sense to me to use for this use case because of the definition of the dwc:Occurrence class:

An existence of a dwc:Organism at a particular place at a particular time.

The Occurrence core does not make any sense in so many ways! However, I hope you agree that museum specimens are MaterialEntity things and NOT Occurrence things. How do we publish these in GBIF today? (rhetorical question). Would not the current practice of publishing museum specimens using the Occurrence core mean that the records published using this core can be many other things than only Occurrence things? Besides, there is a basisOfRecord term in the Occurrence core which I generally understand to mean the type of thing the record represents. And basisOfRecord has MaterialEntity as one of the suggested controlled values :-)

But yes, this (almost) keeps me awake at night too :-)

dagendresen commented 1 month ago

There is also a Material / MaterialEntity core in the Sandbox: https://rs.gbif.org/sandbox/core/dwc_material_2023-04-29.xml
It needs much more work, but at least it has the appropriate RowType.

dagendresen commented 1 month ago

when to use dwc:resourceID or dwc:relatedResourceID

When the MaterialEntity thing is the subject of the measurement, I might maybe want to use resourceID = the PID for the MaterialEntity (the PID declared as the materialEntityID for the MaterialEntity).

If the MeasurementOrFact thing is the subject and the MaterialEntity is the object in the statement, I might maybe want to use relatedResourceID = the PID for the MaterialEntity, and think of the measurementID PID as the PID for the subject (the resourceID)?

However, it is easier for me to think of the previous statement with the MaterialEntity thing as the subject, and the measurement value as the object, and the MeasurementOrFact thing as only a wrapper for the annotation.
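The two readings above can be put side by side in a small sketch. It assumes a measurement row that carries resourceID and relatedResourceID, which is the proposal under discussion and not the current eMoF. The PIDs and the value are placeholders, and `material_entity_of` is a hypothetical helper.

```python
# Placeholder identifiers for illustration only.
MATERIAL_PID = "urn:example:core-01"
MEASUREMENT_PID = "urn:example:meas-01"

# Reading 1: the MaterialEntity is the subject, the measurement value is the object,
# and the MeasurementOrFact row is only a wrapper for the annotation.
reading_1 = {
    "measurementID": MEASUREMENT_PID,
    "resourceID": MATERIAL_PID,
    "measurementValue": "1.2",
}

# Reading 2: the MeasurementOrFact is the subject and the MaterialEntity is the object.
reading_2 = {
    "measurementID": MEASUREMENT_PID,
    "resourceID": MEASUREMENT_PID,
    "relatedResourceID": MATERIAL_PID,
    "measurementValue": "1.2",
}


def material_entity_of(row):
    """Return the material entity PID under either reading."""
    # Under reading 2 the material is the object; under reading 1 it is the subject.
    return row.get("relatedResourceID") or row["resourceID"]


print(material_entity_of(reading_1))
print(material_entity_of(reading_2))
```

Both encodings resolve to the same material entity, which is why reading 1, with one fewer field, may be the simpler one to adopt.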