Open mdoering opened 4 years ago
I would probably not try to include counts for the intermediate ranks. The count of 31 subfamilies and two tribes for Tracheophyta is pretty meaningless since there are 510 families - the fact that some of these families have been subdivided but most haven't is confusing.
I guess the main value of the counts is to indicate the scale of the underlying data, so some kind of accepted species count seems adequate.
Yes, to me it is also mostly an indicator of the size of the underlying subtree. I am fine with just species counts, but would additionally consider descendant counts, i.e. the number of taxa across all ranks including infraspecific taxa. Or even all usages including synonyms. That's what we used in ChecklistBank to give an idea of the size of the subtree. Descendants also work if there are no species involved, e.g. we have various parts of the tree where we end with genera. For those, species counts would give zero and you lose the ability to judge their size. Descendant counts do not have that problem, but it's much harder to compare them across groups if the treated ranks are vastly different.
I think I would prepare the backend to track species and descendant counts, so the UI can be adjusted as needed. Well, or maybe track species, genus & family counts?
@thomasstjerne Now that we have a Varnish cache in front of the API for releases, we could also consider taking the counts from Elasticsearch for each node and decorating the tree client-side. This is of course never great as it can result in lots of calls, but it might be something we can do quickly as a start and replace later on if it's a drag. Caching results should at least protect us from seriously slow pages.
E.g. for Animalia: https://api.catalogue.life/dataset/3LR/tree/061950e4-9782-4d1a-9c87-dcf375788e6b/children
The ES count for accepted Animalia species would be: https://api.catalogue.life/dataset/3LR/nameusage/search?taxonID=061950e4-9782-4d1a-9c87-dcf375788e6b&rank=species&status=accepted&status=provisionally_accepted&limit=0
... going down from 200ms to 50ms here.
1,338,139 is not far off the 1,296 thousand in the 2019 release.
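For reference, a minimal sketch of how the count query above could be built and read from a script. The URL pattern and parameters are taken from the example above; the helper names are made up for illustration, and the assumption that the search response carries a top-level `total` field is mine and should be checked against the actual API response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.catalogue.life"


def species_count_url(dataset_key: str, taxon_id: str) -> str:
    """Build the search URL that counts accepted species below a taxon."""
    params = [
        ("taxonID", taxon_id),
        ("rank", "species"),
        # Both statuses are passed, as in the example URL above.
        ("status", "accepted"),
        ("status", "provisionally_accepted"),
        # limit=0: we only want the count, not the records themselves.
        ("limit", 0),
    ]
    return f"{API}/dataset/{dataset_key}/nameusage/search?{urlencode(params)}"


def fetch_species_count(dataset_key: str, taxon_id: str) -> int:
    """Fetch the count over HTTP (hypothetical helper).

    ASSUMPTION: the JSON response has a top-level "total" field.
    """
    with urlopen(species_count_url(dataset_key, taxon_id)) as resp:
        return int(json.load(resp)["total"])
```

Calling `species_count_url("3LR", "061950e4-9782-4d1a-9c87-dcf375788e6b")` reproduces the Animalia URL given above.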
I am not keen on introducing temporary solutions in the UI for this. Wouldn't it be possible to decorate the tree response with data from Elasticsearch on the fly in the backend? I mean, like the frontend could do if it had its own backend (like the GBIF portal).
Then it would be transparent for the frontend, and the data could later be replaced by something generated at release time. It would also save a large number of requests from the UI, which would be advantageous for users not located close to GBIF servers.
I was thinking about that too. Doable, sure, but it would make the response a lot slower. Decorating it client-side would show the tree first, then fetch the counts and render them as they come in, which is more responsive. If it's done via the backend, I am in favor of preprocessing it for releases and external datasets.
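To make the "render them as they come in" idea concrete, here is a rough sketch of the pattern. It is written in Python only to keep it short; the real client would be JavaScript. The function name, `fetch_count` and `on_count` are hypothetical placeholders, not part of any existing code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def decorate_as_counts_arrive(
    node_ids: Iterable[str],
    fetch_count: Callable[[str], int],
    on_count: Callable[[str, int], None],
    max_workers: int = 4,
) -> None:
    """Request a count per tree node and hand each one over when it is ready.

    The tree itself is assumed to be on screen already; `on_count` stands in
    for whatever updates a single node's label in the UI.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start all requests up front, remembering which node each belongs to.
        futures = {pool.submit(fetch_count, node_id): node_id for node_id in node_ids}
        # as_completed yields in completion order, not submission order,
        # so fast nodes are decorated without waiting for slow ones.
        for future in as_completed(futures):
            on_count(futures[future], future.result())
```

The trade-off is the one discussed above: one request per node, but the page never blocks on the slowest count.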
I have added optional taxon counts to the tree API; they are added to the response when a countBy=SPECIES query parameter is present. Instead of SPECIES any rank can be given. Doing ES queries for every node in the response is not very performant, so please do not use this on the public portal pages yet. We will need a precalculated version for that, not on-the-fly queries to ES.
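A small sketch of how a request with the new parameter could be put together, assuming it is simply appended to the tree children URL shown earlier in this thread. The helper name is made up, and the shape of the counts in the response is not shown here because it is not described above.

```python
from typing import Optional
from urllib.parse import urlencode

API = "https://api.catalogue.life"


def tree_children_url(dataset_key: str, taxon_id: str,
                      count_by: Optional[str] = None) -> str:
    """Build the tree children URL, optionally asking for counts by rank."""
    url = f"{API}/dataset/{dataset_key}/tree/{taxon_id}/children"
    if count_by:
        # countBy takes a rank name, e.g. SPECIES.
        url += "?" + urlencode({"countBy": count_by})
    return url
```

Without `count_by` this yields the plain tree URL, so callers on the public portal can leave the counts off until a precalculated version exists.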
The old UI shows the number of species included for every (higher) taxon in the tree:
It also shows the number of extinct and estimated species. The new portal only provides the estimates which are offered by the Tree API:
Should we include species counts for both living and extinct in the Tree API? Or would some other (additional?) counts be more informative, e.g. accepted taxa for each major rank, similar to what we now have in the taxon details view? https://data.catalogue.life/dataset/3/taxon/bfb709db-491c-48b6-80f5-32ef14f63e4f