CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

Investigate if ChecklistBank could expose the readonly GBIF v1 Species API #1320

Open mdoering opened 4 months ago

mdoering commented 4 months ago

What are the differences between the v1 GBIF Species API and ChecklistBank's data model? Are there any true blockers?

mdoering commented 4 months ago

Name parser

mdoering commented 4 months ago

Searching names

Species suggest and search both have similar parameters and return types. The exact behavior of the search (scoring/ranking) is likely to be different:

Return type

mdoering commented 4 months ago

Species response Type: see above. Additionally:

mdoering commented 4 months ago

v1 methods which do not exist at all:

mdoering commented 4 months ago

Identifiers are the biggest problem. ChecklistBank has compound keys made up of a datasetKey (int) and a dataset-scoped id (String), which is the original identifier from the source, whereas v1 has a single int key which is unique across all datasets.

COL stable identifiers are short strings, but they can be converted bidirectionally into an int. That won't work for the identifiers of other datasets.
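To illustrate the bidirectional conversion mentioned above, here is a minimal sketch of a positional encoding between non-negative ints and short strings. The alphabet, the function names `encode` and `decode`, and the sample values are assumptions made for illustration only; the thread does not state which encoding COL actually uses.

```python
# Assumed alphabet for illustration: digits and consonants, 29 symbols in total.
ALPHABET = "23456789BCDFGHJKLMNPQRSTVWXYZ"
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(key: int) -> str:
    """Turn a non-negative int into a short string over ALPHABET."""
    if key < 0:
        raise ValueError("key must be non-negative")
    if key == 0:
        return ALPHABET[0]
    digits = []
    while key > 0:
        key, rem = divmod(key, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(identifier: str) -> int:
    """Turn a string over ALPHABET back into an int.

    Raises ValueError for any character outside ALPHABET, which is what
    happens with arbitrary source identifiers from other datasets.
    """
    key = 0
    for ch in identifier:
        if ch not in INDEX:
            raise ValueError(f"not a convertible identifier: {identifier!r}")
        key = key * BASE + INDEX[ch]
    return key
```

The round trip `decode(encode(n)) == n` holds for every non-negative int. An arbitrary source identifier such as a URN or a UUID has no such mapping, so `decode` rejects it, which is the limitation described above.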

MattBlissett commented 4 months ago

Backbone taxon keys are used in other GBIF APIs:

I can't think of any place where non-backbone keys are exposed; things like the IUCN Red List resolution during interpretation don't store the keys.

mdoering commented 4 months ago

Does that mean we cannot change the keys without breaking the other APIs, or is it a matter of (not) changing the data type from int to string? If the APIs accepted both an old backbone integer and a new string id, we might be able to offer a smooth transition. Old integers would be mapped internally to the new ids, which could then also be submitted directly.
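The transition idea above could look roughly like the following sketch. The function name `resolve_taxon_key`, the lookup table, and the sample keys are all hypothetical. The sketch also assumes that new string ids are never purely numeric, since otherwise an old integer key and a new id could not be told apart.

```python
from typing import Mapping


def resolve_taxon_key(raw: str, legacy_to_new: Mapping[int, str]) -> str:
    """Accept either an old integer backbone key or a new string id.

    Purely numeric input is treated as a legacy key and mapped to the new id.
    Anything else is assumed to already be a new id and is passed through.
    """
    raw = raw.strip()
    if not raw:
        raise ValueError("empty taxon key")
    if raw.isascii() and raw.isdigit():
        legacy_key = int(raw)
        if legacy_key not in legacy_to_new:
            raise KeyError(f"unknown legacy key: {legacy_key}")
        return legacy_to_new[legacy_key]
    return raw


# Hypothetical mapping from old integer keys to new string ids.
LEGACY_TO_NEW = {1001: "ABC3", 1002: "XYZ9"}

old_style = resolve_taxon_key("1001", LEGACY_TO_NEW)  # mapped internally
new_style = resolve_taxon_key("ABC3", LEGACY_TO_NEW)  # submitted directly
```

With such a lookup in front of each endpoint, existing clients could keep sending integers while new clients switch to string ids, provided the mapping table stays complete for every key that was ever published.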

Note also that there are 17 accepted kingdoms in COL these days, mostly viruses.