gbif / checklistbank

GBIF Checklist Bank
Apache License 2.0

How to download matching between 2 checklists? #299

Closed ManonGros closed 12 months ago

ManonGros commented 12 months ago

This is a request we have received many times now on the GBIF Helpdesk.

Some databases, systems or museums (for example TAXREF or UKSI) would like to have a table with their identifiers matched to the GBIF backbone taxonomy. I think they use it internally and also on their websites to refer users to the GBIF backbone.

Because their checklists are so big, they cannot use the API (too many requests). I know there is the diff tool in the ChecklistBank UI, but I haven't figured out whether it can work for this purpose.

What would be great is to be able to select two checklists in ChecklistBank and say "give me a table with the taxonID from one checklist and the corresponding taxonID from the other". That's what we have been sending people so far.

mdoering commented 12 months ago

If the data is in ChecklistBank, you can use the matching API there to create such a file. @thomasstjerne correct me if I am wrong, but I don't think you can request this kind of server-side matching with a source checklist through the UI yet?

A curl call to the API from the terminal would look like this:

curl -s --user USER:PASSWORD -X POST "https://api.checklistbank.org/dataset/53147/match/nameusage/job?format=TSV&sourceDatasetKey=28956"

This will match all names from dataset 28956 to the GBIF Backbone (key=53147) and produce a TSV output file. The results can be downloaded when ready, just like other downloads.
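To reuse the call for other pairs of checklists, the request can be wrapped in a small shell function. This is only a sketch built from the curl example above: the function names and the CLB_USER / CLB_PASSWORD variables are made up for illustration, and the only query parameters used are the two shown in the example (format and sourceDatasetKey).

```shell
#!/bin/sh
# Base URL taken from the curl example above.
CLB_API="https://api.checklistbank.org"

# Build the URL of the matching job endpoint.
#   $1 = key of the dataset to match against (53147 = GBIF Backbone in the example)
#   $2 = key of the source dataset whose names are matched (28956 in the example)
#   $3 = output format, defaults to TSV as in the example
match_job_url() {
  printf '%s/dataset/%s/match/nameusage/job?format=%s&sourceDatasetKey=%s\n' \
    "$CLB_API" "$1" "${3:-TSV}" "$2"
}

# Submit the matching job with the same curl flags as the example.
# CLB_USER and CLB_PASSWORD are hypothetical variable names for your credentials.
submit_match_job() {
  curl -s --user "${CLB_USER}:${CLB_PASSWORD}" -X POST "$(match_job_url "$@")"
}

# Print the URL only; nothing is sent until submit_match_job is called, e.g.:
#   CLB_USER=me CLB_PASSWORD=secret submit_match_job 53147 28956 TSV
match_job_url 53147 28956 TSV
```

Printing the URL first makes it easy to check the dataset keys before a job is actually queued.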

mdoering commented 12 months ago

The curl example above produces this file: https://download.checklistbank.org/job/9f/9f378581-ac52-4a93-a165-5e209e819c27.zip
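Once a job has finished, the archive can be fetched and inspected from the terminal as well. A minimal sketch, assuming the result link is reachable as posted and that unzip is installed; the helper names are made up for illustration, and the listing step makes no assumption about which files the archive contains.

```shell
#!/bin/sh
# Result link copied from the comment above.
RESULT_URL="https://download.checklistbank.org/job/9f/9f378581-ac52-4a93-a165-5e209e819c27.zip"

# Derive the local file name from the last path segment of a URL.
local_name() {
  basename "$1"
}

# Download an archive and list its contents without extracting it.
# curl -f makes the command fail on an HTTP error instead of saving an error page.
fetch_and_list() {
  out=$(local_name "$1")
  curl -sSfL -o "$out" "$1" && unzip -l "$out"
}

# Nothing is downloaded until fetch_and_list is called, e.g.:
#   fetch_and_list "$RESULT_URL"
local_name "$RESULT_URL"
```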

mdoering commented 12 months ago

You can also use the same matching service by uploading a CSV/TSV file with names to match, instead of picking an existing checklist in CLB.

ManonGros commented 12 months ago

Thanks Markus! This is very helpful. So this is what is documented here under /dataset/{key}/match/nameusage/job: https://api.checklistbank.org/#/default/matchTsvJob? I didn't realise! Maybe a tutorial on the ChecklistBank API for the GBIF data blog would be helpful?

mdoering commented 12 months ago

That would surely be useful. We never had time to write up something complete, but I leave notes as I go and answer people's questions: https://github.com/CatalogueOfLife/backend/blob/master/API.md