Open fmendezh opened 8 months ago
Below is a set of additional web services and additional parameter options for existing web services. These changes are intended as additions to the existing v1 API and are backwards compatible, i.e. there are no breaking changes: they add to the output format but leave existing fields and nested structures in place.
The search response includes a classifications array, which contains 0..n classifications associated with the occurrence record. Example JSON below (shortened for brevity). The existing gbifClassification will remain in place with integer keys.
{
"offset": 0,
"limit": 20,
"endOfRecords": false,
"count": 412833,
"results": [
{
"key": 462028,
"datasetKey": "9bd520e3-00fa-4955-a554-924ea440862c",
"publishingOrgKey": "d2b97690-bfd6-11de-b279-d52977ace833",
"installationKey": "99672740-f762-11e1-a439-00145eb45e9a",
"hostingOrganizationKey": "d2b97690-bfd6-11de-b279-d52977ace833",
"publishingCountry": "IE",
"protocol": "DWC_ARCHIVE",
"lastCrawled": "2024-09-05T18:36:01.493+00:00",
"lastParsed": "2024-09-12T14:10:38.809+00:00",
"crawlId": 176,
"extensions": {},
"basisOfRecord": "HUMAN_OBSERVATION",
"occurrenceStatus": "PRESENT",
"sex": "MALE",
"lifeStage": "Adult",
"classifications": [
{
"datasetKey": "7ddf754f-d193-4cc9-b351-99906754a03b",
"usage": {
"key": "8C2QW",
"name": "Episyrphus (Episyrphus) balteatus (De Geer, 1776)",
"rank": "SPECIES"
},
"acceptedUsage": {
"key": "8C2QW",
"name": "Episyrphus (Episyrphus) balteatus (De Geer, 1776)",
"rank": "SPECIES"
},
"classification": [
{
"key": "RT",
"name": "Arthropoda",
"rank": "PHYLUM"
},
{
"key": "CHP6G",
"name": "Hexapoda",
"rank": "SUBPHYLUM"
},
{
"key": "D2P",
"name": "Diptera",
"rank": "ORDER"
},
{
"key": "BXZTG",
"name": "Episyrphus",
"rank": "SUBGENUS"
},
{
"key": "BXZTD",
"name": "Episyrphus",
"rank": "GENUS"
},
{
"key": "B7XFC",
"name": "Syrphini",
"rank": "TRIBE"
},
{
"key": "8C2QW",
"name": "Episyrphus balteatus",
"rank": "SPECIES"
},
{
"key": "N",
"name": "Animalia",
"rank": "KINGDOM"
},
{
"key": "5T6MX",
"name": "Biota",
"rank": "UNRANKED"
},
{
"key": "H6",
"name": "Insecta",
"rank": "CLASS"
},
{
"key": "9H6NG",
"name": "Syrphinae",
"rank": "SUBFAMILY"
},
{
"key": "GVS",
"name": "Syrphidae",
"rank": "FAMILY"
}
]
}
],
"type": "Occurrence"
}
],
"facets": []
}
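As a minimal sketch of how a client might read the new classifications array, the Python below picks out the entry for one checklist and maps each rank to its name. The helper names (classification_for, names_by_rank) are hypothetical and not part of the API; the record is a trimmed copy of the example above.

```python
from typing import Optional

# Catalogue of Life dataset key, as used in the example response above.
COL_KEY = "7ddf754f-d193-4cc9-b351-99906754a03b"


def classification_for(record: dict, checklist_key: str) -> Optional[dict]:
    """Return the entry of record["classifications"] for one checklist, or None."""
    for entry in record.get("classifications", []):
        if entry.get("datasetKey") == checklist_key:
            return entry
    return None


def names_by_rank(entry: dict) -> dict:
    """Map rank -> name for the taxa listed in one classification entry."""
    return {taxon["rank"]: taxon["name"] for taxon in entry.get("classification", [])}


# Trimmed copy of the example record (only two higher taxa kept).
record = {
    "key": 462028,
    "classifications": [
        {
            "datasetKey": COL_KEY,
            "usage": {
                "key": "8C2QW",
                "name": "Episyrphus (Episyrphus) balteatus (De Geer, 1776)",
                "rank": "SPECIES",
            },
            "classification": [
                {"key": "RT", "name": "Arthropoda", "rank": "PHYLUM"},
                {"key": "GVS", "name": "Syrphidae", "rank": "FAMILY"},
            ],
        }
    ],
}

entry = classification_for(record, COL_KEY)
ranks = names_by_rank(entry) if entry else {}
print(ranks.get("FAMILY"))
```

Looking entries up by datasetKey rather than by position avoids relying on the order of the array, which the proposal does not specify.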
checklistKey
Searches with the new request parameter checklistKey will allow users to retrieve records associated with a checklist.
This is possibly only of real use for smaller thematic checklists.
The checklistKey is the GBIF dataset key for the checklist, e.g. 7ddf754f-d193-4cc9-b351-99906754a03b for Catalogue of Life:
https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b
This only returns occurrence results when the specified checklist is one of the checklists supported by multi-taxonomy matching. Occurrence records that have been matched to a taxon in the specified checklist will be returned.
taxonKey and checklistKey
Searches with the new request parameters checklistKey and taxonKey will allow users to specify the checklist in use. The following would be a query with a taxon from Catalogue of Life:
https://api.gbif-dev2.org/v1/occurrence/search?taxonKey=CB2MR&checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b
The result of this query would be to find records associated with the supplied taxonKey from the checklist specified by the checklistKey.
This only returns occurrence results when the specified checklist is one of the checklists supported by multi-taxonomy matching.
scientificName and checklistKey
Searches with the new request parameters checklistKey and scientificName will allow users to specify the taxonomy in use when matching the scientificName provided.
https://api.gbif-dev2.org/v1/occurrence/search?scientificName=Episyrphus%20(Episyrphus)%20balteatus&checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b
This will use name usage matching against the checklist with the specified checklistKey.
The checklist will resolve the name to a taxonKey in the checklist, and this will be used for occurrence searching.
The result of this query would be to find records associated with the matched taxonKey from the checklist specified by the checklistKey.
This only returns occurrence results when the specified checklist is one of the checklists supported by multi-taxonomy matching.
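The three query forms above (checklistKey alone, with taxonKey, and with scientificName) differ only in their query parameters. A small Python sketch of building them follows; the function name search_url is hypothetical, and the base URL is the dev endpoint used in the examples.

```python
from urllib.parse import urlencode, quote

# Dev endpoint taken from the example URLs in this proposal.
BASE_URL = "https://api.gbif-dev2.org/v1/occurrence/search"


def search_url(checklist_key, taxon_key=None, scientific_name=None, **extra):
    """Build an occurrence search URL scoped to one checklist."""
    params = {"checklistKey": checklist_key}
    if taxon_key is not None:
        params["taxonKey"] = taxon_key
    if scientific_name is not None:
        params["scientificName"] = scientific_name
    params.update(extra)
    # quote (instead of the default quote_plus) encodes spaces as %20.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


COL_KEY = "7ddf754f-d193-4cc9-b351-99906754a03b"

by_checklist = search_url(COL_KEY)
by_taxon = search_url(COL_KEY, taxon_key="CB2MR")
by_name = search_url(COL_KEY, scientific_name="Episyrphus (Episyrphus) balteatus")
print(by_checklist)
print(by_taxon)
print(by_name)
```

Note that this sketch percent-encodes the parentheses in the scientific name as well as the spaces, so the resulting URL is not character-for-character identical to the hand-written example, although it decodes to the same parameter value.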
checklistKey
It is possible to facet on checklistKey with any query to retrieve a list of relevant checklists for a particular search:
https://api.gbif-dev2.org/v1/occurrence/search?facet=checklistKey&limit=0
Will return:
{
"offset": 0,
"limit": 0,
"endOfRecords": false,
"count": 100,
"results": [ ],
"facets": [
{
"field": "CHECKLIST_KEY",
"counts": [
{
"name": "2d59e5db-57ad-41ff-97d6-11f5fb264527",
"count": 100
},
{
"name": "7ddf754f-d193-4cc9-b351-99906754a03b",
"count": 100
},
{
"name": "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c",
"count": 100
}
]
}
]
}
The UUIDs returned here are datasetKey values in the GBIF registry.
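A client would typically turn a facet response like the one above into a simple lookup table. The sketch below does that in Python; facet_counts is a hypothetical helper name, and the response is a copy of the example above.

```python
def facet_counts(response: dict, field: str) -> dict:
    """Return {name: count} for one facet field of a search response."""
    for facet in response.get("facets", []):
        if facet.get("field") == field:
            return {c["name"]: c["count"] for c in facet.get("counts", [])}
    return {}


# Copy of the example facet response for facet=checklistKey.
response = {
    "offset": 0,
    "limit": 0,
    "endOfRecords": False,
    "count": 100,
    "results": [],
    "facets": [
        {
            "field": "CHECKLIST_KEY",
            "counts": [
                {"name": "2d59e5db-57ad-41ff-97d6-11f5fb264527", "count": 100},
                {"name": "7ddf754f-d193-4cc9-b351-99906754a03b", "count": 100},
                {"name": "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c", "count": 100},
            ],
        }
    ],
}

checklists = facet_counts(response, "CHECKLIST_KEY")
for dataset_key, count in checklists.items():
    print(dataset_key, count)
```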
checklistKey filter
The facets for taxonKey and higher-rank taxon keys, e.g. kingdomKey, genusKey, will return values based on the GBIF taxonomy by default.
If a checklistKey is specified, then results will be from that checklist. For example:
https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=2d59e5db-57ad-41ff-97d6-11f5fb264527&facet=familyKey
Returns facets for familyKey values for WoRMS:
{
"offset": 0,
"limit": 0,
"endOfRecords": false,
"count": 497984,
"results": [],
"facets": [{
"field": "FAMILY_KEY",
"counts": [{
"name": "urn:lsid:marinespecies.org:taxname:235102",
"count": 64908
}, {
"name": "urn:lsid:marinespecies.org:taxname:147429",
"count": 23931
}, {
"name": "urn:lsid:marinespecies.org:taxname:196044",
"count": 18861
}, {
"name": "urn:lsid:marinespecies.org:taxname:234449",
"count": 18357
}]
}]
}
Support search by any taxonomic rank. Applications using the web services can retrieve a list of the checklists indexed. With a checklist ID, a list of rank key field names can be retrieved:
https://api.gbif-dev2.org/v1/occurrence/search/checklist/2d59e5db-57ad-41ff-97d6-11f5fb264527/rankKeys
Rank keys can be used to search occurrences by ranks other than the major Linnaean ranks, such as subphylum or suborder:
https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=2d59e5db-57ad-41ff-97d6-11f5fb264527&subphylumKey=urn:lsid:marinespecies.org:taxname:886369
This example searches by subphylum using the WoRMS checklist.
taxonDepth
To aid UI development, particularly taxonomic tree browsing components, and because Catalogue of Life and other taxonomic sources such as WoRMS require searching across different ranks, we can add support for taxonDepth.
This allows querying the taxonomic tree based on a numerical depth within the tree, as opposed to a specific taxonomic rank (e.g. kingdom).
This URL will return root taxa (regardless of rank) for the specified checklist.
https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b&facet=taxonDepth0&limit=0
This URL will return child taxa (regardless of rank) of the taxon with taxonKey=5T6MX for the specified checklist.
https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b&facet=taxonDepth1&limit=0&taxonDepth0=5T6MX
Example output
{
"offset": 0,
"limit": 0,
"endOfRecords": false,
"count": 450926,
"results": [ ],
"facets": [
{
"field": "TAXON_DEPTH_1",
"counts": [
{
"name": "P",
"count": 314969
},
{
"name": "N",
"count": 135686
},
{
"name": "c2ce3656-5b6e-46ea-b042-2056011ddb30",
"count": 188
},
{
"name": "B6LM6",
"count": 78
},
{
"name": "F",
"count": 4
},
{
"name": "C",
"count": 1
}
]
}
]
}
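For a tree browsing component, the two URLs above follow one pattern: facet on the depth below the parent and filter on the parent's own depth. A hedged Python sketch follows; children_url is a hypothetical name, and applying the same pattern to depths greater than 1 is an assumption, since only depths 0 and 1 are shown in the proposal.

```python
from urllib.parse import urlencode

# Dev endpoint taken from the example URLs in this proposal.
BASE_URL = "https://api.gbif-dev2.org/v1/occurrence/search"


def children_url(checklist_key, parent_key=None, parent_depth=0):
    """URL whose facet lists the root taxa, or the children of one taxon.

    Assumption: the taxonDepthN pattern shown for depths 0 and 1
    continues in the same way for deeper levels.
    """
    params = {"checklistKey": checklist_key, "limit": 0}
    if parent_key is None:
        # No parent given: facet on the roots of the tree.
        params["facet"] = "taxonDepth0"
    else:
        # Facet one level below the parent, filtered to that parent.
        params["facet"] = f"taxonDepth{parent_depth + 1}"
        params[f"taxonDepth{parent_depth}"] = parent_key
    return BASE_URL + "?" + urlencode(params)


COL_KEY = "7ddf754f-d193-4cc9-b351-99906754a03b"

roots = children_url(COL_KEY)
children_of_biota = children_url(COL_KEY, parent_key="5T6MX")
print(roots)
print(children_of_biota)
```

The names returned in the facet counts are taxon keys only, so a UI would still need a separate lookup to display taxon names.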
With the predicate API, the EqualsPredicate and InPredicate have been extended to include a checklistKey field, allowing the user to specify the checklist that should be used for taxonomic key fields and taxon depth fields.
The predicate API supports searching with multiple taxonomies in a single query,
e.g. users can combine a search with a taxonKey from WoRMS and a taxonKey from Catalogue of Life.
Example with single SPECIES_KEY
{
"predicate": {
"type": "and",
"predicates": [
{
"type": "equals",
"key": "SPECIES_KEY",
"value": "6HQ2Y",
"checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
}
]
}
}
Example with TAXON_DEPTH_0
{
"predicate": {
"type": "and",
"predicates": [
{
"type": "equals",
"key": "TAXON_DEPTH_0",
"value": "5T6MX",
"checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
}
]
}
}
Example with multiple SPECIES_KEY values with taxa from different checklists (WoRMS and CoL in this example):
{
"predicate": {
"type": "or",
"predicates": [
{
"type": "equals",
"key": "SPECIES_KEY",
"value": "5T6MX",
"checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
},
{
"type": "equals",
"key": "SPECIES_KEY",
"value": "urn:lsid:marinespecies.org:taxname:159142",
"checklistKey": "2d59e5db-57ad-41ff-97d6-11f5fb264527"
}
]
}
}
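The predicate documents above can be assembled programmatically. The Python sketch below mirrors the JSON shapes shown in the examples; the helper names (equals, is_in, any_of) are hypothetical and only illustrate where the optional checklistKey field sits.

```python
import json

COL_KEY = "7ddf754f-d193-4cc9-b351-99906754a03b"    # Catalogue of Life
WORMS_KEY = "2d59e5db-57ad-41ff-97d6-11f5fb264527"  # WoRMS


def equals(key, value, checklist_key=None):
    """Equals predicate, optionally tied to one checklist."""
    predicate = {"type": "equals", "key": key, "value": value}
    if checklist_key is not None:
        predicate["checklistKey"] = checklist_key
    return predicate


def is_in(key, values, checklist_key=None):
    """In predicate, optionally tied to one checklist."""
    predicate = {"type": "in", "key": key, "values": list(values)}
    if checklist_key is not None:
        predicate["checklistKey"] = checklist_key
    return predicate


def any_of(*predicates):
    """Combine predicates with OR."""
    return {"type": "or", "predicates": list(predicates)}


# Same structure as the multi-checklist example above.
payload = {
    "predicate": any_of(
        equals("SPECIES_KEY", "5T6MX", COL_KEY),
        equals("SPECIES_KEY", "urn:lsid:marinespecies.org:taxname:159142", WORMS_KEY),
    )
}
print(json.dumps(payload, indent=2))
```

Because checklistKey is attached to each predicate rather than to the query as a whole, every branch of the OR can name a different checklist.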
For testing with curl:
curl --request POST \
--header "Content-Type: application/json" \
--data '{
"predicate": {
"type": "and",
"predicates": [
{
"type": "equals",
"key": "TAXON_DEPTH_0",
"value": "5T6MX",
"checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
}
]
}
}' \
https://api.gbif-dev2.org/v1/occurrence/search/predicate
Example with curl, using WoRMS and multiple species key values from WoRMS:
curl --request POST \
--header "Content-Type: application/json" \
--data '{
"predicate": {
"type": "and",
"predicates": [
{
"type": "in",
"key": "SPECIES_KEY",
"values": [
"urn:lsid:marinespecies.org:taxname:159142",
"urn:lsid:marinespecies.org:taxname:159037"
],
"checklistKey": "2d59e5db-57ad-41ff-97d6-11f5fb264527"
}
]
}
}' \
https://api.gbif-dev2.org/v1/occurrence/search/predicate
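For clients that prefer Python over curl, a rough equivalent of the request above can be prepared with the standard library. This sketch only builds the request object; sending it (shown in the trailing comment) needs network access to the dev server, and build_request is a hypothetical helper name.

```python
import json
import urllib.request

# Dev endpoint taken from the curl examples above.
PREDICATE_URL = "https://api.gbif-dev2.org/v1/occurrence/search/predicate"


def build_request(predicate: dict) -> urllib.request.Request:
    """Prepare a JSON POST request carrying one predicate document."""
    body = json.dumps({"predicate": predicate}).encode("utf-8")
    return urllib.request.Request(
        PREDICATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    {
        "type": "and",
        "predicates": [
            {
                "type": "in",
                "key": "SPECIES_KEY",
                "values": [
                    "urn:lsid:marinespecies.org:taxname:159142",
                    "urn:lsid:marinespecies.org:taxname:159037",
                ],
                "checklistKey": "2d59e5db-57ad-41ff-97d6-11f5fb264527",
            }
        ],
    }
)

# To send it:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```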
This is a large piece of new functionality, so I suppose large changes are expected. Here are the things that surprised me:
The current version only allows for one checklist, which is okay I suppose. It might be unlikely that anyone wants to use more than one.
I'm not dead against this, but it is slightly puzzling because it changes the behaviour of what taxonKey refers to. I get 10 results for taxonKey=3, then I add an additional filter for checklistKey=123 and get more results. And I can only add it once, which is a bit unusual, but not crazy (the same goes for flags), but given that this uses keys I expected to be able to add multiple.
Ideas
For species search we have flags that indicate changed behaviour (verbose=true, strict, qField=SCIENTIFIC). We could have something like matchMultipleChecklists=true which indicates changed behaviour. Once I add that flag, taxonKey would match against all checklists. And I can then decide to narrow that by adding one or more checklistKey values.
Another version: checklistTaxonKey=[datasetKey]:[taxonKey]
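One detail of the combined form is worth sketching: WoRMS taxon keys are LSIDs that themselves contain colons, so a parser would need to split on the first colon only. The Python below is a hypothetical illustration of that idea, not an existing API; it relies on the dataset key being a UUID, which contains no colon.

```python
def parse_checklist_taxon_key(value: str) -> tuple:
    """Split '[datasetKey]:[taxonKey]' on the first colon only."""
    checklist_key, separator, taxon_key = value.partition(":")
    if not separator or not checklist_key or not taxon_key:
        raise ValueError(f"expected [datasetKey]:[taxonKey], got {value!r}")
    return checklist_key, taxon_key


col = parse_checklist_taxon_key("7ddf754f-d193-4cc9-b351-99906754a03b:CB2MR")
worms = parse_checklist_taxon_key(
    "2d59e5db-57ad-41ff-97d6-11f5fb264527:urn:lsid:marinespecies.org:taxname:159142"
)
print(col)
print(worms)
```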
If I understand the conversation elsewhere correctly, then the predicate approach is
{
type: 'and',
predicates: [
{
type: 'equals',
key: 'checklistKey',
value: '123-123-123'
},
{
type: 'equals',
key: 'taxonKey',
value: '5dX'
}
]
}
And to only allow the checklist predicate once.
This is confusing to me. Again, it isn't clear to me how the two predicates in the AND influence each other. Secondly, it is odd that it can only be used once. And lastly, it is unclear what part of the tree it applies to in that case (I imagine a more complex predicate with multiple AND/OR/NOT).
If it is only allowed once, then it isn't a predicate in my mind, but belongs on the same level as the q param: outside the predicate structure.
Something like {type: equals, key: taxonKey, checklist: '123-123-123', value: 5dX} is easier to understand and more expressive, I would think.
Or {type: equals, key: checklistTaxonKey, checklist: '123-123-123', value: 5dX}, or even a new type like
{type: checklistContext, checklistKey: '123-123-123', predicates: []}
which then specifies the taxon scope for anything beneath.
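To make the scoping idea concrete, here is a hypothetical Python sketch of how a checklistContext wrapper could be rewritten into the per-predicate form, pushing the checklist down to every leaf beneath it. The checklistContext type exists only as a suggestion in this thread, and AND-ing several children of one context is an assumption of this sketch.

```python
def apply_checklist_context(predicate: dict, checklist_key=None) -> dict:
    """Rewrite hypothetical checklistContext nodes into per-predicate checklistKey fields."""
    if predicate.get("type") == "checklistContext":
        inner = [
            apply_checklist_context(child, predicate["checklistKey"])
            for child in predicate["predicates"]
        ]
        # One child collapses to itself; several are AND-ed (an assumption).
        return inner[0] if len(inner) == 1 else {"type": "and", "predicates": inner}
    if "predicates" in predicate:
        # Compound node with a list of children: recurse, keeping its type.
        children = [apply_checklist_context(c, checklist_key) for c in predicate["predicates"]]
        return {**predicate, "predicates": children}
    if "predicate" in predicate:
        # Compound node wrapping a single child.
        return {**predicate, "predicate": apply_checklist_context(predicate["predicate"], checklist_key)}
    # Leaf: inherit the enclosing checklist unless it already names one.
    leaf = dict(predicate)
    if checklist_key is not None and "checklistKey" not in leaf:
        leaf["checklistKey"] = checklist_key
    return leaf


tree = {
    "type": "or",
    "predicates": [
        {
            "type": "checklistContext",
            "checklistKey": "123-123-123",
            "predicates": [{"type": "equals", "key": "taxonKey", "value": "5dX"}],
        },
        {"type": "equals", "key": "taxonKey", "value": "3"},
    ],
}

resolved = apply_checklist_context(tree)
print(resolved)
```

In this sketch the checklist is attached to every leaf under the context, including non-taxonomic ones, on the expectation that it is ignored where it is not relevant.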
I've updated the main "Draft proposal" a bit to include predicates. I think where I've landed thus far is:
EqualsPredicate and InPredicate have been changed to include an optional checklistKey, in a similar manner to the matchCase field, which only makes sense for certain OccurrenceSearchParameters and is ignored if it's not relevant for the type. Another option to consider is to use the checklist explicitly as part of the services that support multiple taxonomies, for example:
https://api.gbif-dev2.org/v1/occurrence/search/checklist/{checklistKey}?....
https://api.gbif-dev2.org/v1/occurrence/search/checklist/7ddf754f-d193-4cc9-b351-99906754a03b?....
https://api.gbif-dev2.org/v1/occurrence/search/checklist=7ddf754f-d193-4cc9-b351-99906754a03b?....
Another option.
Draft proposal of changes to the Occurrence API to support: filter by checklist, add additional ranks, full classification, checklists dataset key, data types changes.
Decide if this change can be applied to API v1 or if a V2 is needed.