gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

inaturalist duplicates #2874

Open gbif-portal opened 4 years ago

gbif-portal commented 4 years ago

https://www.gbif.org/occurrence/2621921271 is a duplicate of https://www.gbif.org/occurrence/2460019191

One is from iNaturalist; the other is from a localized iNaturalist version.


Github user: @MortenHofft
System: Chrome 83.0.4103 / Mac OS X 10.14.6
Referer: https://www.gbif.org/occurrence/2621921271
Window size: width 1440 - height 766
System health at time of feedback: OPERATIONAL
datasetKey: 2bf7b2c0-34dc-4458-8ee4-4f99fac03b33
publishingOrgKey: e3aa237e-53c8-4c46-93ef-77a18bb9c80e

MattBlissett commented 4 years ago

Probably everything published by https://www.gbif.org/dataset/2bf7b2c0-34dc-4458-8ee4-4f99fac03b33 duplicates a record already in iNaturalist.

@ahahn-gbif, was there any discussion about this dataset before it was published? I don't see anything since some March 2019 emails.

ahahn-gbif commented 4 years ago

I am not aware of any prior discussion; the publisher just registered and got endorsed. This echoes earlier cases, e.g. fish datasets from a museum also being included in FishBase, which was in turn included in OBIS. We cannot always prevent these duplications, though I would hope that with clustering we could eventually shield users from unknowingly including duplicates where it would matter (as in modeling applications). We'll get in touch with BioDiversity4All and iNaturalist regardless.
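To make the clustering idea concrete, here is a minimal sketch of how two occurrence records from different datasets might be flagged as likely duplicates. It is not GBIF's clustering method. The field names follow the occurrence JSON returned by the GBIF API, but the choice of fields, the coordinate tolerance, and the function name `likely_duplicates` are assumptions made for illustration.

```python
from typing import Any, Mapping

# Fields that must match exactly (an assumed, illustrative selection).
MATCH_FIELDS = ("scientificName", "eventDate", "recordedBy")
# Coordinates are compared with a small tolerance instead of exact equality.
COORD_FIELDS = ("decimalLatitude", "decimalLongitude")


def likely_duplicates(
    a: Mapping[str, Any],
    b: Mapping[str, Any],
    coord_tolerance: float = 1e-4,  # assumed tolerance, in decimal degrees
) -> bool:
    """Return True if two records from different datasets look like the same observation."""
    # This sketch only looks for duplication across datasets.
    if a.get("datasetKey") == b.get("datasetKey"):
        return False
    # A missing value never counts as a match.
    for field in MATCH_FIELDS:
        if a.get(field) is None or a.get(field) != b.get(field):
            return False
    for field in COORD_FIELDS:
        x, y = a.get(field), b.get(field)
        if x is None or y is None or abs(x - y) > coord_tolerance:
            return False
    return True
```

A consumer could run a check like this over records fetched for the two datasets and drop one of each matching pair before modeling. Real matching would likely need to normalize dates and observer names first.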

timhirsch commented 4 years ago

This merits a broader discussion, as other iNaturalist national affiliates from their growing network may register separately as GBIF data publishers. We may not want to discourage that, since it helps preserve the credit and attribution to the national effort, but on the other hand we don't want to build in new routes of duplication. It has come up in a separate email discussion about 'slicing' of data, e.g. from Russian iNat data, for the same reason. I would suggest an internal meeting to decide on recommended practice before contacting iNat and the Portuguese publisher.

kcopas commented 4 years ago

I believe this is more straightforward, actually. I hope I’m remembering correctly, but Rui had been speaking with BioDiversity4All for some time. I think they registered and started publishing and then shifted to iNat. Rui could probably set this right more promptly than any of us.