VincyaneBadouard / TreeData_broken

Harmonization and correction forest data tool.
https://vincyanebadouard.github.io/TreeData/

WorldFlora is so large it can't even be opened in Excel #85

Closed ValentineHerr closed 1 year ago

ValentineHerr commented 1 year ago

The function WorldFlora::WFO.match used by our botanical correction function is asking for a static copy of the Taxonomic Backbone data from http://www.worldfloraonline.org/downloadData.

That copy is so big it can't be opened in Excel, let alone in a Shiny app. So I don't know if it is worth having that option.

For now, I tell the user to first open the data in R, trim it to the subset of species relevant to their site, save it as .rds and then upload that file into the app. That should work.
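A minimal sketch of that trimming step in R, using a toy data frame in place of the real backbone. The column names (taxonID, scientificName, genus, taxonomicStatus, acceptedNameUsageID) are assumed to match the downloaded classification file, the taxa and IDs are invented, and trim_backbone() is a hypothetical helper, not part of WorldFlora. It keeps every row in the genera present at the site, plus the accepted records that the kept synonyms point to, so that synonyms can still be resolved.

```r
# Toy stand-in for the WFO backbone (all names and IDs are invented).
# The real file would be read first, e.g. with
# data.table::fread("classification.csv", encoding = "UTF-8").
backbone <- data.frame(
  taxonID             = c("wfo-1", "wfo-2", "wfo-3", "wfo-4"),
  scientificName      = c("Alpha prima", "Delta secunda", "Beta tertia", "Gamma quarta"),
  genus               = c("Alpha", "Delta", "Beta", "Gamma"),
  taxonomicStatus     = c("Accepted", "Accepted", "Accepted", "Synonym"),
  acceptedNameUsageID = c("", "", "", "wfo-1"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: keep the site's genera plus the accepted records
# referenced by any kept synonym.
trim_backbone <- function(backbone, site_names) {
  # Genus = first word of each site name
  site_genera <- unique(sub(" .*$", "", trimws(site_names)))
  keep <- backbone$genus %in% site_genera
  # Accepted records that kept rows point to (ignore empty/missing IDs)
  accepted_ids <- setdiff(unique(backbone$acceptedNameUsageID[keep]), c("", NA))
  keep <- keep | backbone$taxonID %in% accepted_ids
  backbone[keep, , drop = FALSE]
}

site_names <- c("Gamma quarta", "Beta tertia")
trimmed <- trim_backbone(backbone, site_names)

# Save the reduced backbone as .rds, ready to upload into the app
rds_path <- tempfile(fileext = ".rds")
saveRDS(trimmed, rds_path)
reloaded <- readRDS(rds_path)
```

Trimming by genus rather than by exact species name is a deliberate choice here: it leaves enough candidates for fuzzy matching of misspelled epithets while still shrinking the file considerably.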

I don't know if this is worth an issue but I wanted to let @gabrielareto and @cpiponiot know.

gabrielareto commented 1 year ago

We cannot store taxonomic backbones locally; we need to interact with them through APIs, which is what taxize and TNRS do.

This is my suggestion to store the taxonomic backbone of a network:

(1) ask the user to pick their taxonomic backbone from the list of backbones that taxize has: https://books.ropensci.org/taxize/data-sources.html

AND

(2) store in their profile the equivalence between [their original names] and [our central names, or some unique global name ID]. This does not require anything from the user.


(1) is important because people from network 2 will have a different set of species, regardless of the backbone they use; a subset of WorldFlora or any backbone will not work in general. (2) is important because, regardless of what the user declares, they may have overwritten some (or many) names. If someone else from network 2 has some of those species, we want to use the original names. There are some risks involved, which are general and unavoidable (profiles should have versions, and be linked to specific versions of a dataset; these are good practices from users that we cannot control).

If we store (1) but not (2), the dataset from network 1 will not recover its original names when translated from format 1 to format 2 and then back to format 1.
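To illustrate why (2) matters for the round trip, here is a rough sketch in R. The profile structure, the field names, the IDs, and the helpers to_central() and to_original() are all hypothetical and only show the idea of storing the equivalence alongside the declared backbone.

```r
# Hypothetical profile: the declared backbone (1) plus the name equivalence (2).
# All names and IDs below are invented for illustration.
profile <- list(
  backbone = "backbone label chosen by the user",
  name_map = data.frame(
    original   = c("Alpha prima var. x", "B. tertia"),
    central_id = c("id-001", "id-003"),
    stringsAsFactors = FALSE
  )
)

# Original names -> central IDs (format 1 to format 2)
to_central <- function(names, map) {
  map$central_id[match(names, map$original)]
}

# Central IDs -> original names (format 2 back to format 1).
# This only works if each central ID maps to a single original name;
# otherwise the original name has to be kept per record.
to_original <- function(ids, map) {
  stopifnot(!anyDuplicated(map$central_id))
  map$original[match(ids, map$central_id)]
}

field_names <- c("B. tertia", "Alpha prima var. x", "B. tertia")
ids <- to_central(field_names, profile$name_map)
restored <- to_original(ids, profile$name_map)
```

Without the stored name_map, the last step has nothing to look up and the user's original spellings are lost.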

ValentineHerr commented 1 year ago

I think what you are saying here is relevant to issue #26 and the potential addition of the taxize package (which still needs to be coded).

This issue is about the WorldFlora package. What I was saying is that if the user wants to use this package (currently implemented in the function), they will need to upload their own backbone into the app. We are not providing it. I also added wording to warn the user that the data they need to download from the WorldFlora website is very large and must be reduced before they can bring it into the app.

cpiponiot commented 1 year ago

What you propose, Valentine, seems reasonable, but it requires extra work from users that I'm not sure many will do, unless perhaps we provide a detailed tutorial on how to do it. But if we provide other options in the BotanicalCorrection() function, such as using the taxize package, then only very motivated users will use WorldFlora, and that's probably fine.

gabrielareto commented 1 year ago

For the record, the WorldFlora package allows downloading and accessing the backbone from code:

    WFO.download(save.dir = path.wfo)
    unzip(paste0(path.wfo, "WFO_Backbone.zip"), exdir = path.wfo)
    WFO.remember(WFO.file = paste0(path.wfo, "classification.csv"))

where "path.wfo" is the path where we want to store the WFO database (it needs to end with a path separator, since it is pasted directly onto the file names).

gabrielareto commented 1 year ago

Giving multiple options is not practical. The taxize package connects via APIs with many sources, each of which has a different behaviour. The string manipulation is too complicated for the programmer, and connecting to many APIs is very slow for the user.

Conceptually, it may not make sense to treat "taxonomic backbone" as a network-level specification. Taxon names are a centralized notion (i.e. there is just one accepted name and all other names are not accepted).

I think the best we can do is to try to find the accepted names for any input, regardless of how the user got their names in the first place.

ValentineHerr commented 1 year ago

I think we can close this issue now.