CatalogueOfLife / checklistbank

UI for checklistbank.org
https://www.checklistbank.org/

Question: Identifier mapping #1431

Open sharifX opened 5 days ago

sharifX commented 5 days ago

I was wondering if you could give a short explanation of these different identifiers. For example:

Abax ater (Villers, 1789), coming from the Nederlands Soortenregister (NSR):

Does ChecklistBank maintain a mapping 145CAE57C83 -> 0AHCYFBQVMRK?

Could this mapping be provided as a list? I am looking for a list of NSR IDs that link to ChecklistBank IDs.

The GBIF species page uses the Wikidata entry to generate a list. Example JSON query from Wikidata: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q1303390&props=claims&format=json

Maybe I can get this via the API?
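A minimal sketch of pulling identifiers out of the Wikidata `wbgetentities` response linked above, using only the Python standard library. Statements are nested under `entities -> <QID> -> claims -> <property>`, each with a `mainsnak`. Which property holds the NSR ID or the Catalogue of Life ID is not stated in this thread, so the property ID is left as a parameter; the `User-Agent` string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Placeholder: Wikimedia asks clients to send a descriptive User-Agent.
USER_AGENT = "nsr-clb-mapping-sketch/0.1 (contact: you@example.org)"


def wbgetentities_url(qid: str) -> str:
    """Build the same kind of query URL as the example in the issue."""
    params = {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"}
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def fetch_entity(qid: str) -> dict:
    """Download the claims of one Wikidata item (network access required)."""
    req = urllib.request.Request(wbgetentities_url(qid), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def claim_values(payload: dict, qid: str, prop: str) -> list:
    """Return the plain values of all statements for `prop` on item `qid`."""
    claims = payload.get("entities", {}).get(qid, {}).get("claims", {})
    values = []
    for statement in claims.get(prop, []):
        snak = statement.get("mainsnak", {})
        # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            values.append(snak["datavalue"]["value"])
    return values
```

Usage would look like `claim_values(fetch_entity("Q1303390"), "Q1303390", prop)`, with `prop` set to whichever Wikidata property you have verified holds the identifier you need.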

mdoering commented 5 days ago

I am not sure I understand your exact needs, but in CLB we distinguish between names and name usages (i.e. taxa or synonyms). Each has its own identifiers. The most natural ones to use are the taxon identifiers.

Identifiers are used exactly as the source defines them. CLB does not generate new identifiers (except in rare cases when there are none in the source) but reuses the original ones.

In the case of the Dutch species register, it seems that the identifiers used on their own site are not exported into the data we see, e.g. Abax ater has id=91411 in NSR but 145CAE57C83 in their Darwin Core export.

So I am afraid the problem is that NSR does not supply their actual integer IDs. @olafbanki, could we ask NSR to change that or even publish ColDP data?

mdoering commented 5 days ago

Let me use an example from ITIS. The ITIS TSN 932346 represents Abax parallelepipedus, which can be found under the same ID in ChecklistBank, just scoped under the ITIS dataset key 2144:
https://www.checklistbank.org/dataset/2144/taxon/932346
https://api.checklistbank.org/dataset/2144/nameusage/932346
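Because CLB reuses source identifiers, a dataset key plus the source's own ID is enough to build both the web page and the API address. A small sketch following the URL patterns quoted in this thread (the function names are hypothetical helpers, not part of any official CLB client):

```python
import json
import urllib.parse
import urllib.request

CLB_UI = "https://www.checklistbank.org"
CLB_API = "https://api.checklistbank.org"


def _seg(value) -> str:
    """Percent-encode one path segment (source IDs are free-form strings)."""
    return urllib.parse.quote(str(value), safe="")


def taxon_page_url(dataset_key: int, usage_id: str) -> str:
    return f"{CLB_UI}/dataset/{_seg(dataset_key)}/taxon/{_seg(usage_id)}"


def name_page_url(dataset_key: int, name_id: str) -> str:
    return f"{CLB_UI}/dataset/{_seg(dataset_key)}/name/{_seg(name_id)}"


def nameusage_api_url(dataset_key: int, usage_id: str) -> str:
    return f"{CLB_API}/dataset/{_seg(dataset_key)}/nameusage/{_seg(usage_id)}"


def fetch_nameusage(dataset_key: int, usage_id: str) -> dict:
    """Fetch the JSON for one name usage (network access required)."""
    with urllib.request.urlopen(nameusage_api_url(dataset_key, usage_id)) as resp:
        return json.load(resp)
```

For NSR this means the lookup key is whatever identifier NSR actually exported (e.g. 145CAE57C83), not the integer ID shown on the NSR website.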

sharifX commented 5 days ago

@mdoering, thanks for the response. The ITIS example is helpful. I am trying to see if we can construct a list of all related identifiers in CLB that have an NSR ID.

I can get a mapping from this Wikidata SPARQL query, but only to the CLB taxon identifier.

For example, https://www.nederlandsesoorten.nl/linnaeus_ng/app/views/species/nsr_taxon.php?id=120589 links to https://www.catalogueoflife.org/data/taxon/8VVK7 (according to Wikidata),

but within the dataset scope we have another ID: https://www.checklistbank.org/dataset/2014/name/xLW8
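The Wikidata route can be sketched as a SPARQL query sent to the Wikidata Query Service, plus a parser for the standard SPARQL JSON results format. The property IDs `P3405` (Nederlands Soortenregister ID) and `P10585` (Catalogue of Life ID) are assumptions to verify on Wikidata before use, and the `User-Agent` is a placeholder. As noted above, this only yields COL taxon IDs such as 8VVK7, not the dataset-scoped CLB IDs.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
NSR_PROP = "P3405"   # assumed: "Nederlands Soortenregister ID"; verify on Wikidata
COL_PROP = "P10585"  # assumed: "Catalogue of Life ID"; verify on Wikidata
USER_AGENT = "nsr-clb-mapping-sketch/0.1 (contact: you@example.org)"  # placeholder


def mapping_query(nsr_prop: str = NSR_PROP, col_prop: str = COL_PROP) -> str:
    """SPARQL selecting every item that carries both identifiers."""
    return (
        "SELECT ?item ?nsr ?col WHERE { "
        f"?item wdt:{nsr_prop} ?nsr ; wdt:{col_prop} ?col . "
        "}"
    )


def run_query(query: str) -> dict:
    """Send the query to WDQS via GET and return SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_bindings(results: dict) -> dict:
    """Map each NSR ID to the list of COL IDs found for it."""
    mapping = {}
    for row in results["results"]["bindings"]:
        mapping.setdefault(row["nsr"]["value"], []).append(row["col"]["value"])
    return mapping
```

A list is used per NSR ID because nothing guarantees a one-to-one link between the two identifiers on Wikidata.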

I will check with NSR to see how they are exporting this.

mdoering commented 5 days ago

Identifiers in CLB with an x prefix are usually generated, most often because the incoming data had "flat" records with the higher classification given inline, which then need to be translated into a normalised form with an identifier for each higher taxon. You can identify such records by origin=denormed classification: https://www.checklistbank.org/dataset/2014/taxon/xLW9
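A rough way to flag such records when processing API output. It assumes the name usage JSON exposes a top-level `origin` field whose value is either the UI wording `denormed classification` or an enum-style `DENORMED_CLASSIFICATION`; the x-prefix test is only a heuristic, since such IDs are "usually", not always, generated.

```python
from typing import Any, Mapping


def looks_generated(usage_id: str) -> bool:
    """Heuristic only: generated CLB identifiers usually start with 'x'."""
    return usage_id.startswith("x")


def is_denormed(usage: Mapping[str, Any]) -> bool:
    """True if the usage's origin is 'denormed classification'.

    Accepts both the UI wording and an enum-style DENORMED_CLASSIFICATION
    (assumed JSON shape: a top-level "origin" field).
    """
    origin = str(usage.get("origin", "")).strip().lower().replace("_", " ")
    return origin == "denormed classification"
```

Checking `origin` first and falling back to the prefix only when the field is missing is the more reliable order.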

In this case the genus Betula did not exist as a record of its own in NSR but was only given in some species records, like this one: https://www.checklistbank.org/dataset/2014/taxon/DXQ2YTVB8Y4
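The normalisation step described above can be illustrated with a simplified sketch: species records that only carry a genus name get a parent genus record minted for them, unless the source already has that genus as a record of its own. The `x1`, `x2`, ... identifiers and the input field names are illustrative assumptions; CLB's actual ID generation (which produces IDs like `xLW8`) and internal logic are not described in this thread.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Optional


@dataclass
class Usage:
    id: str
    name: str
    rank: str
    parent_id: Optional[str]
    origin: str


def normalise(records: Iterable[dict]) -> list:
    """Turn flat records (id, scientificName, rank, optional genus) into a tree."""
    records = list(records)
    # Names that exist as records of their own in the source.
    explicit = {r["scientificName"]: r["id"] for r in records}
    minted = count(1)            # hypothetical ID sequence: x1, x2, ...
    generated = {}               # genus name -> minted ID
    usages = []
    for rec in records:
        genus = rec.get("genus")
        parent_id = None
        if genus and genus != rec["scientificName"]:
            if genus in explicit:
                parent_id = explicit[genus]          # reuse the source's own record
            else:
                if genus not in generated:           # mint one genus record only once
                    generated[genus] = f"x{next(minted)}"
                    usages.append(Usage(generated[genus], genus, "genus", None,
                                        "denormed classification"))
                parent_id = generated[genus]
        usages.append(Usage(rec["id"], rec["scientificName"], rec["rank"],
                            parent_id, "source"))
    return usages
```

In this sketch a flat Betula species record would yield one generated `Betula` genus usage with origin `denormed classification`, shared as parent by every species that names it.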