CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

Adding identifier namespaces? #1354

Open gdower opened 2 months ago

gdower commented 2 months ago

In the xrelease, I see ZooBank identifiers like this:

local:d8167df7-42af-4dd2-bc49-5aae7f11a500 https://api.checklistbank.org/dataset/301904/nameusage/DV3JP

It seems like it should be prefixed with ZooBank's LSID namespace instead of local:, like this:

urn:lsid:zoobank.org:act:D8167DF7-42AF-4DD2-BC49-5AAE7F11A500 https://zoobank.org/NomenclaturalActs/d8167df7-42af-4dd2-bc49-5aae7f11a500
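
To make the expected form concrete, here is a minimal sketch of turning a bare ZooBank UUID into the LSID shown above. The function name to_zoobank_act_lsid is hypothetical and not part of the ChecklistBank backend; the prefix and the upper-casing are taken only from the two example identifiers in this comment.

```python
import uuid

# Prefix copied from the LSID example above; assumed to apply to
# ZooBank nomenclatural acts only.
ZOOBANK_ACT_PREFIX = "urn:lsid:zoobank.org:act:"


def to_zoobank_act_lsid(raw_id: str) -> str:
    """Return a ZooBank act LSID for a bare UUID (hypothetical helper).

    Raises ValueError if raw_id is not a valid UUID.
    """
    parsed = uuid.UUID(raw_id.strip())
    # The example LSID uses the upper-case UUID form.
    return ZOOBANK_ACT_PREFIX + str(parsed).upper()


print(to_zoobank_act_lsid("d8167df7-42af-4dd2-bc49-5aae7f11a500"))
```

For the record above, this prints the LSID given in the example.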

As far as I can tell, the local: namespace gets added by the backend:

https://www.checklistbank.org/dataset/2037/verbatim?q=D8167DF7-42AF-4DD2-BC49-5AAE7F11A500

I might also soon start using the ColDP alternativeID field for Systema Dipterorum to add the ZooBank IDs, although Systema Dipterorum is already in the process of adding them.

mdoering commented 2 months ago

Yes, the local namespace isn't great. It happens during import of ZooBank though, not in the XRelease which just copies them over. Here is the source record: https://api.checklistbank.org/dataset/2037/taxon/d8167df7-42af-4dd2-bc49-5aae7f11a500

Which is based on this verbatim record: https://www.checklistbank.org/dataset/2037/verbatim/242319

For the XRelease, I think it makes sense to remove local identifiers and to make sure that nomenclator identifiers are added with their proper namespace.

mdoering commented 2 months ago

Unless ZooBank shares a different DwC-A, I don't know how to improve this. The scientificNameID is taken as an alternative name ID, but it has no scope. I could maybe block it from the alternative IDs when the name has the exact same identifier, as is the case here.
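
A rough sketch of that blocking idea, not the actual importer code: drop any alternative ID that merely repeats the record's own ID. The function name drop_self_matching_alt_ids is hypothetical, and comparing case-insensitively is an assumption.

```python
def drop_self_matching_alt_ids(name_id, alt_ids):
    """Return alt_ids without entries that equal the record's own ID.

    Comparison ignores case and surrounding whitespace (an assumption,
    not confirmed importer behaviour).
    """
    own = name_id.strip().lower()
    return [alt for alt in alt_ids if alt.strip().lower() != own]


kept = drop_self_matching_alt_ids(
    "d8167df7-42af-4dd2-bc49-5aae7f11a500",
    [
        "D8167DF7-42AF-4DD2-BC49-5AAE7F11A500",
        "urn:lsid:zoobank.org:act:D8167DF7-42AF-4DD2-BC49-5AAE7F11A500",
    ],
)
print(kept)
```

Only the LSID survives here, since the bare UUID duplicates the main ID.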

gdower commented 2 months ago

So the dwca:ID needs their urn:lsid:zoobank.org:act: namespace added onto it or else it gets namespaced as local: by the clb importer? Or should they be putting the namespace on all of their IDs like WoRMS?

https://api.checklistbank.org/dataset/2037/verbatim?q=d8167df7-42af-4dd2-bc49-5aae7f11a500

Is that what the identifier without scope issue means?

Perhaps a dataset configuration option for the ID namespace would be useful (like the dataset option for adding extinct to all values?). Then for alternativeID in other datasets, we'd always need to include the namespace, especially if it's not the dataset's own namespace.
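
Something like the following sketch is what I have in mind for such an option. The function qualify_identifier and its dataset_namespace parameter are hypothetical and do not correspond to any existing ChecklistBank setting. Treating any ID that contains a colon as already scoped is a simplifying assumption.

```python
from typing import Optional


def qualify_identifier(raw_id: str, dataset_namespace: Optional[str] = None) -> str:
    """Prefix an unscoped ID with the dataset namespace, else with local:.

    IDs that already contain a colon are assumed to carry their own
    scope and are returned unchanged.
    """
    raw_id = raw_id.strip()
    if ":" in raw_id:
        return raw_id
    prefix = dataset_namespace if dataset_namespace else "local:"
    return prefix + raw_id


# With a configured namespace the bare UUID becomes a scoped identifier.
print(qualify_identifier(
    "d8167df7-42af-4dd2-bc49-5aae7f11a500",
    dataset_namespace="urn:lsid:zoobank.org:act:",
))
# Without one it falls back to local:, matching what the importer shows today.
print(qualify_identifier("d8167df7-42af-4dd2-bc49-5aae7f11a500"))
```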

mdoering commented 2 months ago

Yes, an identifier without scope means that it is just a local ID. dwca:ID is not the source of the problem though. The problem is dwc:scientificNameID or, in ColDP, the coldp:alternativeID field. The main IDs are expected to be local.