Open MortenHofft opened 3 months ago
If going for cardinality, we might want to discuss with the rest of the team what the API should look like.
Ideas:
just include it in the normal facet response
?facet=type&facetLimit=2
"facets": [
  {
    "field": "TYPE",
    "cardinality": 4, <==== new field that lists the total number of distinct facet values, not just those in the response
    "counts": [
      {
        "name": "CHECKLIST",
        "count": 53833
      },
      {
        "name": "OCCURRENCE",
        "count": 49485
      }
    ]
  }
]
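A minimal sketch of how the first idea could behave, assuming an in-memory list of records rather than the real search index. The function name `facet_with_cardinality` and the toy data are hypothetical; only the response shape (`field`, `cardinality`, `counts`) comes from the proposal above.

```python
from collections import Counter


def facet_with_cardinality(records, field, facet_limit):
    """Return the top `facet_limit` values for `field` plus the total
    number of distinct values (the proposed `cardinality` field)."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    return {
        "field": field.upper(),
        # Counted over all matching records, not just the returned buckets.
        "cardinality": len(counts),
        "counts": [
            {"name": name, "count": n}
            for name, n in counts.most_common(facet_limit)
        ],
    }


# Toy data (made up): four distinct types, but only two are returned.
datasets = (
    [{"type": "CHECKLIST"}] * 5
    + [{"type": "OCCURRENCE"}] * 4
    + [{"type": "SAMPLING_EVENT"}] * 2
    + [{"type": "METADATA"}]
)
facet = facet_with_cardinality(datasets, "type", facet_limit=2)
```

With `facetLimit=2` the client still only receives two buckets, but `cardinality` tells it that four distinct values exist in total.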
Another approach: use ?facet=something&cardinality=publisherKey&limit=0&offset=0
and then return a distinct response for that:
{
  "count": 1000,
  "limit": 0,
  "offset": 0,
  "results": [],
  "facets": ...,
  "cardinality": {
    "PUBLISHER_KEY": 1234 <==== distinct publisherKeys within the given search filter
  }
}
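A rough sketch of the second idea, assuming the records that match the search filter are already available as a list of dicts. The `cardinality` query parameter is the one proposed above and does not exist in the API today; `to_enum_name` and `cardinality_response` are hypothetical helpers, and the `facets` part of the response is left out for brevity.

```python
import re
from urllib.parse import urlencode


def to_enum_name(param):
    """Convert a camelCase parameter name to UPPER_SNAKE_CASE."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", param).upper()


def cardinality_response(records, cardinality_params):
    """Build the proposed response: no results, only the distinct-value
    count for each requested field within the filtered records."""
    return {
        "count": len(records),
        "limit": 0,
        "offset": 0,
        "results": [],
        "cardinality": {
            to_enum_name(p): len({r[p] for r in records if r.get(p) is not None})
            for p in cardinality_params
        },
    }


# Query string for the proposed (not yet existing) parameter.
query = urlencode({"cardinality": "publisherKey", "limit": 0, "offset": 0})

# Toy matches (made up): three records from two distinct publishers.
matches = [{"publisherKey": "a"}, {"publisherKey": "b"}, {"publisherKey": "a"}]
response = cardinality_response(matches, ["publisherKey"])
```

Keeping `cardinality` as its own map means existing facet consumers are unaffected, at the cost of a second response section to document.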
E.g. count the number of specimens per kingdom.
Quick thoughts on the subject: it would probably be nice within collections once some collections are richly described. But it seems more difficult, both for the API and to present in a meaningful and fair way.
facets: [
{kingdomKey: 1, count: 123456} // (from 2 csv rows. one with 123000 individuals and another with 456 individuals)
]
The count is the sum of individualCount across all descriptors that have kingdomKey=1. So you would have to get the distinct kingdoms within the filter and, for each one, sum the individualCount of all the matching descriptors.
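The summing described here can be sketched as follows, assuming descriptors are plain dicts carrying `kingdomKey` and `individualCount`. The function name `sum_individuals_by` is hypothetical, and the third row is made up to show a second bucket; the first two rows are the ones from the example above.

```python
from collections import defaultdict


def sum_individuals_by(descriptors, field):
    """Group descriptors by `field` and sum their individualCount,
    returning buckets sorted by descending total."""
    totals = defaultdict(int)
    for d in descriptors:
        key = d.get(field)
        if key is None:
            continue  # descriptor says nothing about this field
        totals[key] += d.get("individualCount") or 0
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [{field: k, "count": v} for k, v in ordered]


descriptors = [
    {"kingdomKey": 1, "individualCount": 123000},  # csv row 1
    {"kingdomKey": 1, "individualCount": 456},     # csv row 2
    {"kingdomKey": 6, "individualCount": 789},     # illustrative extra row
]
facets = sum_individuals_by(descriptors, "kingdomKey")
```

Note that this sums whatever the uploader reported, so double counting or non-exhaustive descriptors flow straight into the totals.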
Presentation-wise, the UI would probably have to show caveats like this for e.g. a kingdom breakdown:
Thanks Morten! The collection facets and proposed implementation make a lot of sense.
The specimen facets are much more complicated and yes, we would have to display a lot of caveats. We would also have to add some other fields to know whether the people uploading records have double counted, whether the records are exhaustive, etc.
We could add facets to collection search.
Two types of metrics would be possible: collection facets and specimen facets.
Specimen facets are the only thing that makes sense within a collection. Both make sense for institutions and GRSciColl generally, but currently there isn't any data for it.
Examples of collection facet questions:
How many collections have data in Spain?
How many collections have data about taxon x?
How many collections have type specimens of taxon x?
Which are the most prevalent preservation types for this collection?
Breakdowns across collections: how many collections per kingdom, preservation type, country, type specimens, types/country, types/kingdom
Examples of specimen facet questions:
Which orders does this collection mainly deal with?
Breakdown of phyla per country for a collection/institution/total
Breakdowns for all: specimens per kingdom, preservation type, country, type specimens, types/country, types/kingdom
We could start with collection facets?
e.g.
?country=ES&country=FR&facet=kingdomKey
Same behaviour as normally.
These collection facets are what I'm guessing would be useful: descriptorCountry, country, kingdomKey, phylumKey, ...other taxonGroupKeys..., typeStatus, preservationType, contentType, personalCollection, institutionKey, active
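A small sketch of what the `?country=ES&country=FR&facet=kingdomKey` request above could return as a collection facet, under three assumptions that are not settled in this thread: repeated `country` values are ORed, `country` matches the collection's own country, and a collection counts once per kingdom regardless of how many of its descriptors mention it. `collection_facet` and the sample collections are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs


def collection_facet(collections, query_string):
    """Count collections (not specimens) per facet value, after
    applying the repeated `country` filter as an OR."""
    params = parse_qs(query_string.lstrip("?"))
    countries = set(params.get("country", []))
    facet_field = params["facet"][0]
    counts = Counter()
    for c in collections:
        if countries and c["country"] not in countries:
            continue
        # A set, so each collection contributes at most 1 per value.
        values = {d[facet_field] for d in c["descriptors"] if facet_field in d}
        for value in values:
            counts[value] += 1
    return [{"name": v, "count": n} for v, n in counts.most_common()]


collections = [
    {"country": "ES", "descriptors": [{"kingdomKey": 1}, {"kingdomKey": 1}, {"kingdomKey": 6}]},
    {"country": "FR", "descriptors": [{"kingdomKey": 1}]},
    {"country": "DK", "descriptors": [{"kingdomKey": 6}]},
]
result = collection_facet(collections, "?country=ES&country=FR&facet=kingdomKey")
```

The Danish collection is filtered out, and the Spanish collection counts once for kingdom 1 even though two of its descriptors mention it, which is the difference from a specimen facet.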
Ideally we would add something new to the API, namely the cardinality of those facets: an option to get not only the top 10 orders, but also the number of unique orders. This makes it easier to build the UI. Examples where cardinality is used: https://grscicoll.hp.gbif-staging.org/specimen/search?layout=W1t7ImlkIjoiYm1tNW8iLCJwIjp7fSwidHJhbnNsYXRpb24iOiJkYXNoYm9hcmQuc3RhdGlzdGljcyIsInQiOiJvY2N1cnJlbmNlU3VtbWFyeSJ9XSxbeyJpZCI6IjE4NGhxIiwicCI6eyJ2aWV3IjoiVEFCTEUifSwidHJhbnNsYXRpb24iOiJmaWx0ZXJzLmNvbGxlY3Rpb25LZXkubmFtZSIsInQiOiJjb2xsZWN0aW9uS2V5In1dXQ%3D%3D&view=DASHBOARD
distinct species and distinct taxa in the statistics chart, plus the number of results in the collection chart