gbif / ingestion-management

Tracking of data issues seen during data ingestion processes
Apache License 2.0

Identifiers validation failed for dataset Fauna presente en los monitoreos de las fincas de Agropecuaria Aliar S.A. #1413

Closed: gbif-pipelines closed this issue 3 months ago

gbif-pipelines commented 3 months ago

Identifier validation failed for the dataset Fauna presente en los monitoreos de las fincas de Agropecuaria Aliar S.A.:

New IDs sample:

LaFazenda:Fauna:0140
LaFazenda:Fauna:0262
LaFazenda:Fauna:0141
LaFazenda:Fauna:0263
LaFazenda:Fauna:0261
LaFazenda:Fauna:0268
LaFazenda:Fauna:0147
LaFazenda:Fauna:0266
LaFazenda:Fauna:0145
LaFazenda:Fauna:0267

Old IDs sample:

LaFazenda:Fauna:0255
LaFazenda:Fauna:0256
LaFazenda:Fauna:0257
LaFazenda:Fauna:0258
LaFazenda:Fauna:0259
LaFazenda:Fauna:0260
LaFazenda:Fauna:0261
LaFazenda:Fauna:0262
LaFazenda:Fauna:0263
LaFazenda:Fauna:0264
Publisher email

Hello,

I am contacting you from the GBIF Secretariat about the dataset [Fauna presente en los monitoreos de las fincas de Agropecuaria Aliar S.A.](https://registry.gbif.org/dataset/bc1e3aec-bfbc-4bed-b912-ae7a26c0dc19): https://doi.org/10.15472/ohlqdy.

We noticed that the occurrenceIDs were changed. We have temporarily paused the ingestion of this dataset.

As you might already know, when an occurrence record has a new occurrenceID for a given dataset, our system considers it to be a new occurrence. This means that it will be given a new gbifid and a new occurrence URL (like this one: https://www.gbif.org/occurrence/1252968762), and the old gbifid and URL will be deprecated. In this case, the existing occurrence URLs would be deprecated when ingesting the newest version of this dataset.

We would like to check with you whether those changes were intentional. Do you know if this is the case? Please let us know, thanks! We are happy to resume the dataset ingestion.

Note that some users rely on those occurrence URLs and gbifids (https://bionomia.net, for example). In an attempt to improve the stability of the occurrence URLs and gbifids, we have implemented a warning system to detect this type of change in datasets (see this news item). If the data publisher can provide us with a list of old and new occurrenceIDs per record, we can avoid the identifier and URL changes. Could that be an option?

Please let us know if you have any questions. Thanks!

All the best,

You can skip/fix identifier validation using the registry UI.

EstebanMH-SiB commented 3 months ago

We can proceed with indexing this dataset. We just added new records, and it was flagged because the number of occurrences increased by more than 50%.
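
For anyone triaging a similar report, the two ID samples above can be compared with a few lines of Python. This is only a rough sketch, not the validation logic the GBIF pipelines actually run: `compare_id_sets` and `exceeds_growth_threshold` are hypothetical helper names, and the 0.5 threshold is assumed from the "more than 50% increase" mentioned in this thread.

```python
from typing import Iterable


def compare_id_sets(old_ids: Iterable[str], new_ids: Iterable[str]) -> dict:
    """Summarise how two collections of occurrenceIDs differ (hypothetical helper)."""
    old, new = set(old_ids), set(new_ids)
    return {
        "kept": sorted(old & new),     # IDs present in both versions
        "added": sorted(new - old),    # IDs only in the new version
        "removed": sorted(old - new),  # IDs only in the old version
    }


def exceeds_growth_threshold(old_count: int, new_count: int,
                             threshold: float = 0.5) -> bool:
    """Return True when the record count grew by more than `threshold`.

    The default of 0.5 (50%) is an assumption taken from the comment in
    this thread, not from the pipelines source code.
    """
    if old_count == 0:
        return new_count > 0
    return (new_count - old_count) / old_count > threshold


# The ten-ID samples quoted in the bot report above.
OLD_SAMPLE = [f"LaFazenda:Fauna:{n:04d}" for n in range(255, 265)]
NEW_SAMPLE = [f"LaFazenda:Fauna:{n:04d}"
              for n in (140, 262, 141, 263, 261, 268, 147, 266, 145, 267)]

if __name__ == "__main__":
    report = compare_id_sets(OLD_SAMPLE, NEW_SAMPLE)
    for key, ids in report.items():
        print(f"{key}: {len(ids)} -> {ids}")
```

Both samples use the same `LaFazenda:Fauna:NNNN` pattern and share some IDs, which fits the explanation that records were added rather than renamed. The samples are only ten IDs each, so this says nothing definite about the full dataset.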