gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

71k incertae sedis from Manchester #4213

Open gbif-portal opened 2 years ago

gbif-portal commented 2 years ago

71k incertae sedis from Manchester

There's got to be something wrong with this mapping—less than half the records from the Manchester Museum match taxa.

Maybe one for @sophiathirza ?


Github user: @kcopas
System: Firefox 103.0.0 / Mac OS X 10.15.0
Referer: https://www.gbif.org/occurrence/search?dataset_key=4f64e2fc-a84b-49f6-802f-e48f725717d7&taxon_key=0
Window size: width 1544 - height 947
System health at time of feedback: OPERATIONAL

kcopas commented 2 years ago

Small in comparison, but the same collection has a Richard S. Spruce listed as the recorder. I presume this is the legendary English explorer, for whom I've never seen a middle initial ;-)

sophiathirza commented 2 years ago

Thanks for letting me know about the collection on GBIF.

The Manchester Museum collection is hosted by Vertnet: https://www.gbif.org/publisher/b472a35a-6461-444a-a3d6-84e97e6636fe

kcopas commented 2 years ago

Ah, should’ve checked that detail! Now paging @dbloom …

dbloom commented 2 years ago

Can you give me something more specific here? Maybe an example or two so I know what I'm looking for or what I need to ask of MANCH.

kcopas commented 2 years ago

Sorry, Dave—link and description in the issue.

Only 47% of Manchester records match the backbone. 71k are incertae sedis. Gotta be something up, right?
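For anyone wanting to reproduce those numbers, here is a minimal sketch against the public GBIF occurrence search API. It assumes the API accepts the same dataset key and `taxonKey=0` (incertae sedis) that appear in the Referer URL of the report, and that a `limit=0` query still returns the total in the `count` field. The helper names (`count_url`, `fetch_count`, `matched_share`, `report`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.gbif.org/v1/occurrence/search"
# Dataset key copied from the Referer URL in the original report.
DATASET_KEY = "4f64e2fc-a84b-49f6-802f-e48f725717d7"


def count_url(dataset_key, taxon_key=None):
    """Build a search URL that asks only for the record count (limit=0)."""
    params = {"datasetKey": dataset_key, "limit": 0}
    if taxon_key is not None:
        params["taxonKey"] = taxon_key
    return API + "?" + urllib.parse.urlencode(params)


def fetch_count(url):
    """Fetch the URL and return the 'count' field of the JSON response."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["count"]


def matched_share(total, unmatched):
    """Fraction of records that are not incertae sedis."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - unmatched) / total


def report(dataset_key=DATASET_KEY):
    """Query GBIF (needs network access) and print the matched share."""
    total = fetch_count(count_url(dataset_key))
    # Assumption: taxon key 0 selects incertae sedis, as in the website URL.
    unmatched = fetch_count(count_url(dataset_key, taxon_key=0))
    print(f"{unmatched} of {total} records are incertae sedis; "
          f"{matched_share(total, unmatched):.0%} match the backbone")
```

Calling `report()` performs two small requests and prints the current share, which may differ from the 47% quoted here if the dataset has been republished since.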

dbloom commented 2 years ago

Noted. It has been almost 5 years since they updated. I will seek an update from them. Short of that, I will review the processing on my end for what we have. Lots has changed during that time.

Pokes appreciated if time slips by.

albenson-usgs commented 2 years ago

Lurking here. I wish you could add occurrenceID to the table so you could grab them easily to check them in the IPT published version without having to download this subset to get them.

MortenHofft commented 2 years ago

@albenson-usgs "I wish you could add occurrenceID to the table"

I'm not sure if this is what you meant, but I've added occurrenceID to the table on occurrence search.

[Screenshot 2022-08-15 at 09 56 04]

Obviously it was always possible to go to the individual record to see it (without downloading).

albenson-usgs commented 2 years ago

🤦 Thanks @MortenHofft. I did look there. I just didn't see it for some reason. But now I know and can use this in the future. All- apologies for the side convo on this thread.

MattBlissett commented 2 years ago

No need for :facepalm:, Morten added the Occurrence ID column to that list this morning.

albenson-usgs commented 2 years ago

Now to be of actual help, possibly. @dbloom I downloaded the DWCA from the Vertnet IPT, and for most of those 71K records scientificName is missing.

[Screenshot: Capture]
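A check like this can be scripted rather than eyeballed. The sketch below counts rows with a blank `scientificName` in the core data file of an unzipped archive. It assumes the file is tab-delimited with a header row and is named `occurrence.txt`; both are assumptions to verify against the archive's `meta.xml`. `count_missing` is a hypothetical helper, not part of any GBIF or IPT tooling.

```python
import csv
import io


def count_missing(lines, column="scientificName", delimiter="\t"):
    """Return (missing, total): rows where `column` is empty or whitespace."""
    # QUOTE_NONE: treat quote characters as plain data, not as field quoting.
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header")
    missing = total = 0
    for row in reader:
        total += 1
        # Short rows yield None for absent fields; treat that as blank too.
        if not (row.get(column) or "").strip():
            missing += 1
    return missing, total


# Tiny in-memory sample standing in for the real file.
sample = io.StringIO(
    "occurrenceID\tscientificName\n"
    "1\tQuercus robur\n"
    "2\t\n"
    "3\t   \n"
)
missing, total = count_missing(sample)
print(f"{missing} of {total} rows have no scientificName")
```

For the real archive, pass an open file handle instead of the sample, for example `open("occurrence.txt", newline="", encoding="utf-8")`.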

dbloom commented 2 years ago

Yes, thank you @albenson-usgs, et al. That was a very complicated dataset to deal with and we were aware of the limitations of the data when we published it, but we did so at the request of MANCH - they had other reasons to publish those data in this incomplete form (it's been so long I just don't remember what those reasons were). Most of the incertae sedis records were that way from the start, so we are not dealing with errors during data processing, mapping issues with the IPT or any back-end issues at GBIF. The truth is that for most of those records there was little or nothing to match to the Backbone from the start.

I have been in communication with John Peel at MANCH - he wrote to me this morning. They have our data quality and completeness reports from the last time we processed their data and have been working through them slowly. John is going to provide a revised dataset to me at some point in the near-ish future, but he has admitted that for the curators the combination of Covid, ongoing updates to their EMu installation and being "up to their neck with work on exhibitions as we are approaching the reopening of the Museum (it has been closed while a substantial capital project to build a new entrance and gallery space)" might mean that an updated dataset will take some time to produce.

I will keep this issue updated as I receive data from MANCH.