gbif / ipt

GBIF Integrated Publishing Toolkit (IPT)
https://www.gbif.org/ipt
Apache License 2.0

FAQ: How do I publish absences? #1746

Closed: albenson-usgs closed this issue 1 year ago

albenson-usgs commented 2 years ago

Q. How do I publish absence data? A. Step 1: Include sampling event records even if the sampling yielded no derived species occurrences. This allows species absences to be inferred.

I wanted to raise the issue that following the directions in the IPT manual for this FAQ would not always be appropriate for datasets shared via the OBIS community. As noted in the paper published about OBIS-ENV-DATA (https://bdj.pensoft.net/article/10989/element/2/3386095//), there will be events with no occurrences because abiotic sampling is sometimes done at a higher frequency than the species observations. These events should not be interpreted as absences.
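To illustrate the ambiguity, here is a minimal sketch of an Event core linked to an Occurrence extension by eventID, as Step 1 of the FAQ describes. All rows are made up for illustration: "E2" stands for a biological sample in which nothing was found, and "E3" for an abiotic-only measurement. The sketch writes only the two tab-separated data tables (no meta.xml or EML), and it shows that, going by the table structure alone, both events look the same: neither is referenced by any occurrence row.

```python
import csv
import io

# Hypothetical Event core rows, using the Darwin Core terms eventID, eventDate, samplingProtocol.
events = [
    {"eventID": "E1", "eventDate": "2020-06-01", "samplingProtocol": "plankton net tow"},
    {"eventID": "E2", "eventDate": "2020-06-02", "samplingProtocol": "plankton net tow"},  # nothing caught
    {"eventID": "E3", "eventDate": "2020-06-02", "samplingProtocol": "CTD cast"},  # abiotic only
]

# Hypothetical Occurrence extension rows, linked back to the core by eventID.
occurrences = [
    {"occurrenceID": "O1", "eventID": "E1",
     "scientificName": "Calanus finmarchicus", "occurrenceStatus": "present"},
]


def to_tsv(rows):
    """Serialize a list of dicts as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


event_tsv = to_tsv(events)
occurrence_tsv = to_tsv(occurrences)

# Both E2 and E3 are kept in the event table, but neither has occurrence rows,
# so a consumer cannot tell "sampled, nothing found" from "never sampled for organisms".
linked = {o["eventID"] for o in occurrences}
empty_events = [e["eventID"] for e in events if e["eventID"] not in linked]
print(empty_events)  # ['E2', 'E3']
```

The only hint that separates the two is the free-text samplingProtocol value, which is not drawn from any controlled vocabulary.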

muttcg commented 2 years ago

Hi @albenson-usgs, I can share an issue that describes, from the interpretation side, how we identify absence data; please read https://github.com/gbif/pipelines/issues/268

MattBlissett commented 2 years ago

The quoted part of the manual is here: https://ipt.gbif.org/manual/en/ipt/2.5/sampling-event-data#q-how-do-i-publish-absence-data

ahahn-gbif commented 2 years ago

Occurrence records marked as absences would not be an issue, I agree. The tricky part is interpreting sampling events without occurrences as a bundle of absence records. This should never happen in isolation ("we did not find any organisms"); ideally, such events would be interpreted against a timestamped checklist of the taxa that could have been expected. I may have overlooked something, but I do not think that GBIF currently makes these inferences during ingestion; the recommendations concern the publication process alone.
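The checklist-based interpretation described above can be sketched as follows. This is a minimal sketch under simplifying assumptions: a single checklist of expected taxa applies to every event passed in (a real timestamped checklist would have to be matched to each event's date), and presences are given as (eventID, scientificName) pairs. The function name `infer_absences` and the tick names in the usage lines are illustrative only and are not part of any GBIF or IPT API. Without a checklist, the function infers nothing.

```python
from __future__ import annotations

from typing import Optional


def infer_absences(
    event_ids: list[str],
    presences: set[tuple[str, str]],
    expected_taxa: Optional[list[str]],
) -> list[tuple[str, str]]:
    """Return (eventID, taxon) pairs that may be read as absences.

    presences holds (eventID, scientificName) pairs recorded as present.
    expected_taxa is the checklist of taxa the sampling could have detected;
    if it is missing or empty, no absences are inferred at all.
    """
    if not expected_taxa:
        return []
    return [
        (event_id, taxon)
        for event_id in event_ids
        for taxon in expected_taxa
        if (event_id, taxon) not in presences
    ]


# Usage with made-up data: one tick recorded in E1, nothing recorded in E2.
checklist = ["Ixodes scapularis", "Amblyomma americanum"]
present = {("E1", "Ixodes scapularis")}
print(infer_absences(["E1", "E2"], present, checklist))
# [('E1', 'Amblyomma americanum'), ('E2', 'Ixodes scapularis'), ('E2', 'Amblyomma americanum')]
print(infer_absences(["E1", "E2"], present, None))  # []
```

Note that the sketch would still emit absences for purely abiotic events if they were passed in, which is exactly the gap discussed in the following comments: having a checklist is necessary but not sufficient.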

From the ingestion perspective: if GBIF does not want to index events unrelated to biological sampling (which I assume is true?) that are published in the same dataset as sampling events on organisms, we would need to identify or discuss a level of content standardization that lets us tell the different types of sampling event records apart. I am not entirely sure whether we receive such mixed datasets from our partners; an example would help here. Would you be able to point us to a dataset, @albenson-usgs?

The most straightforward solution so far would be to publish organism-related and non-biological sampling parameters in separate datasets. Darwin Core does not offer a dedicated "sampling event type" term with a reliable, standardized vocabulary to recognize them. The samplingProtocol term is the closest we can get, but the content we receive in it is so far completely unstandardized.

Thanks for bringing this up, we will need to consider this in the context of absence evaluation from sampling events. Likely something along the lines of "if no corresponding taxon checklist is provided, do not interpret absences".

albenson-usgs commented 2 years ago

An example dataset is here. It was published just a few minutes ago. It has 125,716 events with no occurrences and 17,118 events with occurrences; absences are explicit in the occurrence table via occurrenceStatus.
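For a dataset like this one, a consumer can rely on the explicit occurrenceStatus values and leave empty events alone. A minimal sketch, assuming the occurrence rows are already loaded as dictionaries keyed by Darwin Core term names; the rows and IDs are made up, and normalizing the status to lower case is an assumption about how consistently publishers capitalize the vocabulary values.

```python
from collections import Counter

# Hypothetical rows from an Occurrence extension (Darwin Core term names as keys).
occurrences = [
    {"eventID": "E1", "scientificName": "Taxon A", "occurrenceStatus": "present"},
    {"eventID": "E1", "scientificName": "Taxon B", "occurrenceStatus": "absent"},
    {"eventID": "E2", "scientificName": "Taxon A", "occurrenceStatus": "Absent"},
]
# Hypothetical eventIDs from the Event core.
event_ids = ["E1", "E2", "E3", "E4"]


def count_by_status(rows):
    """Count occurrence rows per occurrenceStatus, ignoring case and surrounding spaces."""
    return Counter(r["occurrenceStatus"].strip().lower() for r in rows)


status_counts = count_by_status(occurrences)

# Absences come only from rows explicitly marked "absent".
explicit_absences = [
    (r["eventID"], r["scientificName"])
    for r in occurrences
    if r["occurrenceStatus"].strip().lower() == "absent"
]

# Events with no occurrence rows at all: in a dataset like this one they are
# NOT treated as absences (they may be abiotic-only sampling).
linked = {r["eventID"] for r in occurrences}
events_without_occurrences = [e for e in event_ids if e not in linked]

print(status_counts)               # Counter({'absent': 2, 'present': 1})
print(explicit_absences)           # [('E1', 'Taxon B'), ('E2', 'Taxon A')]
print(events_without_occurrences)  # ['E3', 'E4']
```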

The slight problem is that it needs to be "if no corresponding taxon checklist is provided, do not always interpret absences."

I know, for instance, that for the recently published NEON tick dataset it is OK to infer absences from events with no occurrences.

mike-podolskiy90 commented 2 years ago

Is this still relevant?

ahahn-gbif commented 2 years ago

I may be confused here. Given that GBIF does not, to my knowledge, evaluate sampling event datasets against any checklist, and does not infer absences at all, this seems to be an issue limited to (a) data publication guidelines for the IPT and (b) ingestion of IPT-published datasets by OBIS. I can see the problem with inferring absences, but it does not concern any current GBIF workflow. Would this discussion be better continued in the OBIS GitHub, @albenson-usgs?

albenson-usgs commented 2 years ago

The issue is with the instructions in the IPT manual. If GBIF is not evaluating absences in this way, then isn't the text inaccurate? I would advocate for removing that paragraph and keeping everything from "Alternatively, you can make species..." onward (minus the word "Alternatively"), but I understand that the first approach is how GBIF Norway provides their datasets. Also, Step 2 is not how the OBIS community supplies absences. Maybe this needs to be discussed within TDWG to find the most inclusive way to describe how to provide absences?

mike-podolskiy90 commented 2 years ago

Closing this. Please feel free to re-open if anything needs to be fixed in the IPT.

albenson-usgs commented 2 years ago

@mike-podolskiy90 does the IPT manual have a separate GitHub repository? This is still unresolved. I would like the text in the manual to be updated so that it is more inclusive in how to provide absences. Until the text in the manual is modified, I would like this issue to remain open somewhere.

albenson-usgs commented 2 years ago

@mike-podolskiy90 @ahahn-gbif who has the ability/authority to make changes to the IPT manual?

mike-podolskiy90 commented 2 years ago

Thank you @albenson-usgs for the quick response. I thought it had been solved with the documentation. Could you please make a pull request with your suggested changes to the documentation? The file is here: https://github.com/gbif/ipt/blob/master/docs/en/modules/ROOT/pages/sampling-event-data.adoc

albenson-usgs commented 1 year ago

Closed via #1832.