ABbiodiversity / wildRtrax

wildrtrax is an R package for environmental sensor data management and analytics
https://abbiodiversity.github.io/wildRtrax/

wt_tidy_species usage is confusing, have to download wt_get_species twice #21

Closed see24 closed 5 months ago

see24 commented 12 months ago

The data wrangling vignette recommends first doing an inner join between your data and the species table downloaded with wt_get_species, and then running wt_tidy_species to remove species that are not of interest. In fact the join is required: if the data passed to wt_tidy_species has no species_class column, you get an error.

Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `species_class` doesn't exist.

But then wt_tidy_species runs wt_get_species internally and re-downloads the same table. This isn't a big deal because the download doesn't take long, but it seems unnecessary.

It looks like the error above occurs only because, when zerofill = TRUE, there is a line that removes the species_class column, and it fails if that column is not present. If that select statement were changed to select only the desired columns, or to use select(-any_of(c("species_code", "species_class", ...))), there would be no need to download the species table and join it before running wt_tidy_species, which is perhaps the intended process. If that's the case, you could update the vignette, since the join to the species table is not really necessary.
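
To illustrate the difference, here is a minimal sketch using dplyr only. It does not reproduce the wt_tidy_species source; the toy data frame and its column names are made up for the example, with species_common_name standing in for whatever else the real select statement drops.

```r
library(dplyr)

# Hypothetical data that was never joined to the species table,
# so it has no species_class column.
dat <- tibble(
  location = c("A", "A", "B"),
  species_code = c("OVEN", "TEWA", "OVEN")
)

# Dropping a column by bare name fails when the column is absent.
strict <- tryCatch(
  select(dat, -species_class),
  error = function(e) "error"
)

# any_of() skips names that are not present instead of failing.
tolerant <- select(dat, -any_of(c("species_class", "species_common_name")))
```

Here strict ends up holding the "error" marker, while tolerant keeps the location and species_code columns untouched.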

see24 commented 12 months ago

The zerofill = TRUE option doesn't seem to work very well in general: it is not filtering out anything for me. I think this is because there are columns in my data (like scientific_name) that are not removed in the select statement before unique, so all rows are kept and then added back. It might work better to use dplyr::distinct on the columns that should uniquely identify a visit.
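
A rough sketch of what I mean, on made-up data. The column names, and the assumption that location plus recording_date identifies a visit, are mine for illustration and are not taken from the package.

```r
library(dplyr)

# Hypothetical observations: two visits, three species records.
obs <- tibble(
  location = c("A", "A", "B"),
  recording_date = c("2022-06-01", "2022-06-01", "2022-06-02"),
  species_code = c("OVEN", "TEWA", "OVEN"),
  scientific_name = c("Seiurus aurocapilla", "Leiothlypis peregrina",
                      "Seiurus aurocapilla")
)

# Dropping only species_code and calling unique() still keeps every row,
# because scientific_name varies within a visit.
all_cols <- obs %>% select(-species_code) %>% unique()

# distinct() on the assumed visit key returns one row per visit.
visits <- obs %>% distinct(location, recording_date)
```

With this data all_cols still has one row per species record, while visits has one row per visit, which is what a zero-fill step would need as its base.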