mdoering opened 4 months ago
Species suggest and search both have similar parameters and return types. The exact behavior of the search (scoring/ranking) is likely to be different:
datasetKey
UUID needs to be mapped to CLBs int datasetKeyconstituentKey
UUID needs to be mapped to CLBs int datasetKey -> sourceDatasetKeyrank
OKhigherTaxonKey
OK, but will be a stringstatus
OKextinct
OKhabitat
OK = environmentthreat
: MISSING ! but could be addednameType
OKnomenclaturalStatus
: very different vocabulary is being used in CLB. I would think this is a very niche parameter that would not be a blockerorigin
: OK, but slightly different vocab values. Not all can be mappedissue
OK, but rather different vocab values. Not all can be mappedhl
: highlighting is not yet supported in CLB (and troublesome to implement)limit
/offset
: OKfacet
: OK (might be some other facet names we could map - and available facets also differ)facetMincount
: NOT SUPPORTEDfacetMultiselect
: NOT SUPPORTEDfacetLimit
: NOT SUPPORTEDfacetOffset
: NOT SUPPORTEDReturn type
no Linnean ranks
, but could be added and is desireable as users have already requested it: https://github.com/CatalogueOfLife/backend/issues/1122numDescendants
: NOT SUPPORTED, but could be for immutable datasetsnumOccurrences
: NOT SUPPORTED, I wonder if that is even still in use in GBIF? We could add this by calling the GBIF API to retrieve countsdescriptions
: NOT SUPPORTED, but there is a generic TaxonProperty extension that maybe could be used instead. Or a new extension being added which isn't such a big thing.vernacularNames
: all OK, but some properties are missing and would need to be added:
lifeStage
plural
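To make the mapping above concrete, here is a minimal sketch of a v1 -> ChecklistBank parameter shim. The CLB-side names follow the list above where given (`sourceDatasetKey`, `environment`); everything else, including the `translate` helper itself, is a hypothetical illustration, not an existing API.

```python
# Hypothetical v1 -> CLB search parameter shim, following the mapping above.
V1_TO_CLB = {
    "datasetKey": "datasetKey",        # UUID must first be resolved to CLB's int key
    "constituentKey": "sourceDatasetKey",
    "rank": "rank",
    "status": "status",
    "habitat": "environment",
    "nameType": "nameType",
}

# Highlighting and facet tuning have no CLB equivalent per the list above.
UNSUPPORTED = {"hl", "facetMincount", "facetMultiselect", "facetLimit", "facetOffset"}

def translate(params: dict) -> tuple:
    """Rename supported v1 params; collect unsupported ones for a warning."""
    out, dropped = {}, []
    for name, value in params.items():
        if name in UNSUPPORTED:
            dropped.append(name)
        else:
            out[V1_TO_CLB.get(name, name)] = value
    return out, dropped
```

Dropping unsupported parameters (rather than failing) would let old v1 clients keep working, at the cost of silently ignored facet options.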
Species response type: see above. Additionally:
- `deleted`: CLB releases are immutable and deleted identifiers work differently. We can resolve older, now deleted IDs, but searching & working across them all is difficult and maybe not possible
- `lastCrawled`: OK (but can also be uploads)
- `lastInterpreted`: OK, but really always the same as crawled

v1 methods which do not exist at all:
- `/species/{usageKey}/toc`
- `/species/{usageKey}/speciesProfiles`: we only keep a few infos directly on the taxon, as most of these infos are 1:1 and make no sense in an extension. DwC forced us that way. E.g. `extinct`, `environment` and `livingPeriod` exist, but `lifeForm`, `habitat`, `ageInDays`, `sizeInMillimeter` and `massInGram` do not and would have to be TaxonProperty records. Doable, but quite some mapping effort
- `/species/{usageKey}/metrics`: does not exist at all. Would need to be precalculated and stored similar to the flat classification

Identifiers are the biggest problem. ChecklistBank has compound keys made of a `datasetKey` (int) and a dataset-scoped `id` (String), which is the original identifier from the source, while v1 has a single int key that is unique across all datasets.
COL stable identifiers are short strings and can be converted bidirectionally into an int. That won't work for other datasets' identifiers.
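To illustrate why COL stable ids can round-trip to ints, here is a sketch of a bidirectional int <-> short-string codec. The base-36 alphabet is an assumption for illustration; ChecklistBank's actual id converter may use a different character set.

```python
# Sketch of a bidirectional int <-> short-string id codec.
# The base-36 alphabet here is an assumption, not CLB's real one.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Encode a non-negative int as a short base-36 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, r = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def decode(s: str) -> int:
    """Decode a base-36 string back to the original int."""
    n = 0
    for ch in s:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n
```

Such a codec only works because COL stable ids come from one controlled alphabet; arbitrary source-dataset identifiers (free-form strings) have no such bijection to ints, which is exactly the problem described above.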
Backbone taxon keys are used in other GBIF APIs:
I can't think of any exposure of non-backbone keys; things like the IUCN Red List resolution during interpretation don't store the keys.
Does that mean we cannot change the keys without breaking the other APIs, or is it just a matter of (not) changing the data type from int to string? If the APIs accepted both an old backbone integer and a new string id, we might be able to offer a smooth transition: old integers would be mapped internally to the new ids, which could then also be submitted directly.
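The transition idea could be sketched as a resolver that accepts both key styles. The lookup table and function name here are hypothetical stand-ins for whatever internal mapping service would exist; the example mapping values are made up.

```python
# Hypothetical resolver for the dual-key transition described above.
# Maps old v1 int keys to new string ids; values here are invented examples.
LEGACY_TO_NEW = {2435098: "6MB3T"}

def resolve_usage_key(raw: str) -> str:
    """Return the new string id for either an old int key or a new-style id.

    Note: a new-style id consisting only of digits would be ambiguous,
    so the new id scheme would need to avoid purely numeric strings.
    """
    if raw.isdigit():  # old v1 integer key
        key = int(raw)
        if key not in LEGACY_TO_NEW:
            raise ValueError(f"unknown legacy key {key}")
        return LEGACY_TO_NEW[key]
    return raw  # already a new-style id
```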
Note also that there are 17 accepted kingdoms in COL these days, mostly viruses.
What are the differences between the v1 GBIF Species API and ChecklistBank's data model? Are there any true blockers?