AtlasOfLivingAustralia / data-management

Data management issue tracking

Datasets published on GBIF from IPT have their endpoint replaced #1021

Open ManonGros opened 5 months ago

ManonGros commented 5 months ago

We recently noticed that most datasets published on GBIF by the Australian Antarctic Data Centre via their IPT (https://data.aad.gov.au/ipt/) have an ALA endpoint. See for example this dataset: https://www.gbif.org/dataset/85ff6f64-f762-11e1-a439-00145eb45e9a which was originally registered and published on GBIF via an IPT (https://data.aad.gov.au/ipt/resource?r=em_marine). It still shows on GBIF as "hosted by the AADC IPT", but the endpoint (and archive) now comes from the ALA: https://dwca-exports.ala.org.au/dr137.zip. As far as I can see, the endpoint was replaced 7 months ago by the dmartin user (which I believe is the one associated with the ALA portal).

I don't see any comment mentioning that this was intentional.

I think it is confusing to have the datasets associated with one installation (IPT) but the data coming from another (ALA). Would it be possible to investigate? Thanks!

This is the list of all the datasets published by the AADC; most of them have ALA endpoints: dataset_aad_GBIF.csv
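For anyone who wants to re-check the datasets in that list, here is a minimal sketch of how the endpoint host could be inspected through the public GBIF registry API. It assumes the dataset record returned by `https://api.gbif.org/v1/dataset/{key}` carries an `endpoints` list whose entries have a `url` field. The helper names (`fetch_dataset`, `endpoint_hosts`, `has_ala_endpoint`) and the `ala.org.au` suffix check are illustrative choices, not part of any GBIF or ALA tooling.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen

# Base URL of the public GBIF registry API for dataset records.
GBIF_DATASET_API = "https://api.gbif.org/v1/dataset/"


def fetch_dataset(key):
    """Fetch the registry record for one dataset key (needs network access)."""
    with urlopen(GBIF_DATASET_API + key) as response:
        return json.load(response)


def endpoint_hosts(dataset):
    """Return the host name of every endpoint URL in a dataset record.

    Assumes the record has an "endpoints" list of dicts with a "url" key;
    entries without a URL are skipped.
    """
    return [
        urlparse(endpoint["url"]).hostname
        for endpoint in dataset.get("endpoints", [])
        if endpoint.get("url")
    ]


def has_ala_endpoint(dataset, ala_suffix="ala.org.au"):
    """True if any endpoint is served from the ALA domain or a subdomain."""
    return any(
        host is not None
        and (host == ala_suffix or host.endswith("." + ala_suffix))
        for host in endpoint_hosts(dataset)
    )
```

Calling `has_ala_endpoint(fetch_dataset("85ff6f64-f762-11e1-a439-00145eb45e9a"))` would check the example dataset above; looping over the keys in the CSV would give the same check for the whole list.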

sadeghim commented 5 months ago

Hi @djtfmartin, this is a bit confusing for me. I checked some of the DRs and it seems a bit odd to have them updated on GBIF through ALA. For example, https://collections.ala.org.au/dataResource/show/dr129 has the following info for GBIF Sync: image

It shows that it shouldn't be shared with GBIF, but they have it on their side: https://www.gbif.org/dataset/c60d0b4f-4e51-4a5a-95dc-babc74ba3db0. I was wondering where the logic for checking this is. Is it in pipelines or DAGs? And how can we address this problem?

Thanks

peggynewman commented 5 months ago

So, the AADC's IPT is registered directly with GBIF. They used to send their data via us, now they don't. In some cases, like some of the OBIS datasets, we line it up such that we have the GBIF registry key and DOI details on hand, but we do not share the dataset. If the metadata was recently updated, then that is a concern because they should all be set to not be shared to GBIF, from OBIS or AADC.

ManonGros commented 4 months ago

I don't know if this is related, but my colleague @mike-podolskiy90 also noticed something odd about the DOIs associated with the datasets concerned:

This is a list of all the datasets concerned by this duplication:

82e218a4-f762-11e1-a439-00145eb45e9a
7d4b8c41-d875-442f-9647-372a8b9cfb38
95e15b18-f762-11e1-a439-00145eb45e9a
0cfaaef6-f3c4-45fc-9a9d-246545da4150
892c7dee-f762-11e1-a439-00145eb45e9a
8601bda0-f762-11e1-a439-00145eb45e9a

Do you think this could be related? Or could it be an issue on our side (GBIF)? This is the first time I have seen it.