bcgov / rems

An R package to access data from British Columbia's Environmental Monitoring System
Apache License 2.0
19 stars 5 forks source link

add dont_get arg to get_ems_data #35

Closed sebdalgarno closed 5 years ago

sebdalgarno commented 5 years ago

Apologies for multiple commits. Let me know if you'd like me to try again wth a different approach.

We'd like to modify two functions:

  1. get_ems_data() add dont_get argument (default FALSE returns data.frame, TRUE returns NULL) to simply check whether data is cached or whether is update is available, but without returning the cached data. Default behaviour remains the same.

  2. filter_historic_data() add check_exists argument (default TRUE). setting check_exists = FALSE skips checking whether data exists/update available before filtering data. Default behaviour remains the same.

These modifications will be useful for the rems shiny app we are developing.

ateucher commented 5 years ago

For item 1 (get_ems_data()), I think I'd prefer the argument name to be something like check_only or even better, factor out the checking code into its own function (check_cached_data()) which could be exported, as well as used internally by get_ems_data(). What do you think?

For item 2 (I think you mean read_historic_data()), do you only want to avoid checking the currency of the data, or do you also want to avoid checking to see if it exists at all? If the latter, I'm curious as to why you would want this behaviour.

sebdalgarno commented 5 years ago

item 1: we'd prefer to go for just the argument name change (check_only sounds good) for now. item 2: You're right it's called read_historic_data()! Checking and filtering of the database occur in two separate places in the app. Database update/download occurs with the download_historical_data before the app launches. Then within the app, the filtering occurs, although there is then no need to go back and check for the db. I would prefer not to run the same checks twice.

ateucher commented 5 years ago

item 1: Ok, I can live with that. Please change the argument name to check_only

item 2: I can understand that for your use-case, however for other users it could trip them up. I can see why they may want to skip checking the currency of the data (e.g., in non-interactive use), but if they skip checking for the existence of the data and it doesn't exist, then the function will fail in an unexpected way. Also, checking for the existence of the data is trivial in terms of time. So if you can change the name of the argument to check_db and make it only skip checking the currency, that would be great.

sebdalgarno commented 5 years ago

for item 2: You are absolutely right - It makes most sense to just skip checking currency. I've implemented the requested changes.

ateucher commented 5 years ago

This is nice, thanks @sebdalgarno. One comment inline above, and I also think it would be nice if get_ems_data(... check_only = TRUE) returned TRUE (invisibly) instead of NULL.

ateucher commented 5 years ago

Hi @sebdalgarno one final question - is it important for get_ems_data() to check for updates when check_only = TRUE or is it only important that it it checks that the data is cached? If the former, I think we can check and exit the function much earlier...

sebdalgarno commented 5 years ago

Hi @ateucher - for our purposes it is useful for check_only = TRUE to check for updates. We use get_ems_data(check_only = TRUE) within a shinyrems::run_app() function to check for data existence/currency prior to launching the app. The user is prompted in the console to update the data if an update is available. Once the app is launched successfully, we then get the data into a reactive expression. You can check out the shinyrems repository (and take the app for a spin if you install our version of rems) here: https://github.com/poissonconsulting/shinyrems

sebdalgarno commented 5 years ago

I'll add that within the app we make use of some internal rems functions to retrieve the data (and avoid requiring access to internet to check for updates):

ems_data <- function(){
  ret <- rems:::._remsCache_$get("2yr")[, rems:::wq_cols()]
  rems:::add_rems_type(ret, "2yr")
}
ateucher commented 5 years ago

Ok, that makes sense, thanks @sebdalgarno.

So ideally you would be able to do that with get_ems_data(ask = FALSE, dont_update = TRUE), but currently it checks for an update anyway... is that right?

sebdalgarno commented 5 years ago

yes that's correct. This line is still run file_meta <- get_file_metadata(which), in order to tell the user that there is indeed a newer version but they have chosen not to update.

ateucher commented 5 years ago

Yeah, so maybe that's a very good reason to change that behaviour. If someone explicitly doesn't want to update, there's not a good reason to check. I'll merge this, then make that change and push it up to master. I'm not super worried about you using non-exported functions, but it's better to not.

sebdalgarno commented 5 years ago

yes I agree would be better not to have to use internal functions. That's great! thanks @ateucher

ateucher commented 5 years ago

Ok, done and pushed to master, released v0.4.1. Please let me know if that successfully allows you to get rid of your ems_data() function in favour of get_ems_data(dont_update = TRUE)

sebdalgarno commented 5 years ago

yes I've tested with the new version and was able to replace my ems_data() function. Thanks again!

ateucher commented 5 years ago

That was fast! Great, thanks.