CatalogueOfLife / portal

The public facing website and dynamic portal for the CoL
https://www.catalogueoflife.org

Show species counts in tree #23

Open mdoering opened 4 years ago

mdoering commented 4 years ago

The old UI shows the number of species included for every (higher) taxon in the tree:

Screenshot 2020-09-03 at 13 55 38

It also shows the number of extinct and estimated species. The new portal only provides the estimates which are offered by the Tree API:

Screenshot 2020-09-03 at 14 00 45

Should we include species counts for both living and extinct species in the Tree API? Or would some other (additional?) counts be more informative, e.g. accepted taxa for each major rank, similar to what we now have in the taxon details view? https://data.catalogue.life/dataset/3/taxon/bfb709db-491c-48b6-80f5-32ef14f63e4f

Screenshot 2020-09-03 at 14 05 15

dhobern commented 4 years ago

I would probably not try to include counts for the intermediate ranks. The count of 31 subfamilies and two tribes for Tracheophyta is pretty meaningless since there are 510 families - the fact that some of these families have been subdivided but most haven't is confusing.

I guess the main value of the counts is to indicate the scale of the underlying data, so some kind of accepted species count seems adequate.

mdoering commented 4 years ago

Yes, to me it is also mostly an indicator of the size of the underlying subtree. I am fine with just species counts, but would additionally consider descendant counts, i.e. the number of taxa across all ranks including infraspecific taxa. Or even all usages including synonyms. That's what we used in ChecklistBank to give an idea of the size of the subtree. Descendant counts also work if there are no species involved, e.g. we have various parts of the tree where we end with genera. For those, species counts would give zero and you lose the ability to judge their size. Descendant counts avoid that, but they are much harder to compare across groups if the treated ranks are vastly different.

I think I would prepare the backend to track species and descendant counts, so the UI can be adjusted as needed. Well, or maybe track species, genus & family counts?
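To make the difference between the two counts concrete, here is a minimal sketch of how both could be precomputed bottom-up for every node. The `Taxon` class and its field names are hypothetical and only serve the illustration; this is not the actual backend model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Taxon:
    # Hypothetical in-memory node; not the real backend model.
    name: str
    rank: str
    children: List["Taxon"] = field(default_factory=list)
    species_count: int = 0
    descendant_count: int = 0


def annotate_counts(taxon: Taxon) -> None:
    """Fill species_count and descendant_count for a whole subtree, bottom-up."""
    taxon.species_count = 0
    taxon.descendant_count = 0
    for child in taxon.children:
        annotate_counts(child)
        # Descendants: the child itself plus everything below it, at any rank.
        taxon.descendant_count += 1 + child.descendant_count
        # Species: the child itself if it is a species, plus species below it.
        taxon.species_count += child.species_count
        if child.rank == "species":
            taxon.species_count += 1


# A family with species, and one that ends at genus level.
with_species = Taxon("Family A", "family", [
    Taxon("Genus A1", "genus", [
        Taxon("Species 1", "species", [Taxon("Subspecies 1a", "subspecies")]),
        Taxon("Species 2", "species"),
    ]),
])
genera_only = Taxon("Family B", "family", [
    Taxon("Genus B1", "genus"),
    Taxon("Genus B2", "genus"),
])
annotate_counts(with_species)
annotate_counts(genera_only)
```

In this sketch `genera_only` ends up with a species count of zero but a descendant count of two, which is the case where only the descendant count still says something about the size of the subtree.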

mdoering commented 3 years ago

@thomasstjerne Now that we have a Varnish cache in front of the API for releases, we could also consider taking the counts from Elasticsearch for each node and decorating the tree client-side. This is of course never great as it can result in lots of calls, but it might be something we can do quickly as a start and replace later on if it's a drag. Caching results should at least protect us from seriously slow pages.

E.g. for Animalia: https://api.catalogue.life/dataset/3LR/tree/061950e4-9782-4d1a-9c87-dcf375788e6b/children

The ES count for accepted Animalia species would be: https://api.catalogue.life/dataset/3LR/nameusage/search?taxonID=061950e4-9782-4d1a-9c87-dcf375788e6b&rank=species&status=accepted&status=provisionally_accepted&limit=0

... going down from 200ms to 50ms here.

The resulting count of 1,338,139 is not far off the 1,296 thousand in the 2019 release.
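A rough sketch of that per-node lookup, assuming each tree node exposes its taxon ID as `id` and the search response carries the hit count in a `total` field (both are assumptions, as are the helper names and the `speciesCount` key):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.catalogue.life"


def species_count_url(dataset_key, taxon_id):
    """Build the search URL for accepted species below a taxon; limit=0 returns no records."""
    params = [
        ("taxonID", taxon_id),
        ("rank", "species"),
        ("status", "accepted"),
        ("status", "provisionally_accepted"),
        ("limit", 0),
    ]
    return f"{API}/dataset/{dataset_key}/nameusage/search?{urlencode(params)}"


def fetch_json(url):
    """Plain HTTP GET returning the parsed JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def decorate_with_counts(dataset_key, nodes, fetch=fetch_json):
    """Add a speciesCount to each node dict, one search request per node."""
    for node in nodes:
        result = fetch(species_count_url(dataset_key, node["id"]))
        # "total" as the name of the hit-count field is an assumption.
        node["speciesCount"] = result["total"]
    return nodes
```

This is one request per node, so it only stays tolerable as long as the cache answers most of them.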

thomasstjerne commented 3 years ago

> @thomasstjerne Now that we have a Varnish cache in front of the API for releases, we could also consider taking the counts from Elasticsearch for each node and decorating the tree client-side. This is of course never great as it can result in lots of calls, but it might be something we can do quickly as a start and replace later on if it's a drag. Caching results should at least protect us from seriously slow pages.

I am not keen on introducing temporary solutions in the UI for this. Wouldn't it be possible to decorate the tree response with data from elastic on the fly in the backend? I mean, like the frontend could do if it had its own backend (like the GBIF portal).

Then it would be transparent for the frontend and the data could be replaced by something generated at release time. And it would save a large number of requests from the UI, which would be advantageous for users not located close to GBIF servers.

mdoering commented 3 years ago

I was thinking about that too. Doable, sure, but it would make the response a lot slower. Decorating it client-side would show the tree first, then fetch the counts and render them as they come in. That way it is more responsive. If it's done via the backend, I am in favor of preprocessing it for releases and external datasets.
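The "render them as they come in" idea, sketched in Python rather than in the portal's frontend code. `stream_counts`, `fetch_count` and `on_count` are hypothetical names; the fetcher is passed in so that any lookup (for example an ES search with limit=0) can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_counts(node_ids, fetch_count, on_count, max_workers=4):
    """Fetch counts concurrently and report each one as soon as it is available."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which node each pending request belongs to.
        futures = {pool.submit(fetch_count, node_id): node_id for node_id in node_ids}
        for future in as_completed(futures):
            # Completion order is not the submission order; slow nodes do not block fast ones.
            on_count(futures[future], future.result())


# Usage with a stand-in fetcher; a real one would call the API.
fake_counts = {"a": 10, "b": 20, "c": 0}
rendered = {}
stream_counts(fake_counts.keys(), fake_counts.get, rendered.__setitem__)
```

The tree itself is shown before any of this runs, so a slow count only delays its own label and not the whole page.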

mdoering commented 3 years ago

I have added optional taxon counts to the tree API. They are included in the response when a countBy=SPECIES query parameter is present. Instead of SPECIES any rank can be given. It is not very performant to do ES queries for every node in the response, so please do not use this on the public portal pages yet. We will need a precalculated version for that, not on-the-fly queries to ES.
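A small sketch of building such a request, assuming the parameter applies to the tree children endpoint shown earlier in this thread. The helper name is hypothetical, and uppercasing the rank is an assumption based on the SPECIES example.

```python
from urllib.parse import urlencode


def tree_children_url(dataset_key, taxon_id, count_by=None):
    """Build the tree children URL, optionally asking for counts by a given rank."""
    url = f"https://api.catalogue.life/dataset/{dataset_key}/tree/{taxon_id}/children"
    if count_by:
        # Upper case mirrors the SPECIES example; case handling by the API is not verified here.
        url += "?" + urlencode({"countBy": count_by.upper()})
    return url


animalia = "061950e4-9782-4d1a-9c87-dcf375788e6b"
plain_url = tree_children_url("3LR", animalia)
counted_url = tree_children_url("3LR", animalia, count_by="species")
```

Since every node in the response triggers its own ES query, this is better suited to ad hoc use than to anything called on each page load.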