First draft of helper functions (& documentation) to access the stage 1/2/3 products. Currently, these only access the parquet-based versions.
The functions do not `collect()` by default, but they try to offer some helpful messaging, since creating a connection can be slow. I've also tried to add some advice about using dplyr to filter and summarize data before calling `collect()` to import it. I'm not sure whether that advice will be helpful, or whether it will be too cryptic for beginning users or too obvious for experienced ones.
I have more docs for stage 1 than for the other two; help fleshing out the docs there would be great. I'll try to add unit tests.
Stages 1 & 2 default to filtering to `cycle = "00"`, since I think users can get especially confused by getting multiple cycles in the same data frame. Other than that one case, I've tried to leave filtering up to the documentation & examples, since I think it's generally most powerful for users to have the remote object and the option to do any dplyr filter/summarise steps directly themselves.
All functions should handle the logistics of setting and restoring the user's env vars gracefully.
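One way to get that set-and-restore behaviour is `on.exit()`, which runs even if the wrapped code errors. This is a minimal base-R sketch of the pattern, not the code in this PR: `with_temp_envvars()` is a hypothetical helper name, and the variable names in the usage below are illustrative only. Variables the user had not set beforehand are unset again rather than left as empty strings.

```r
# Hypothetical helper (not part of neon4cast): set environment variables
# for the duration of `code`, then restore the user's previous state.
with_temp_envvars <- function(new, code) {
  # Record current values; NA marks variables that were not set at all.
  old <- Sys.getenv(names(new), unset = NA, names = TRUE)
  on.exit({
    was_set <- !is.na(old)
    # Put back values the user already had...
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
    # ...and remove variables that did not exist before.
    if (any(!was_set)) Sys.unsetenv(names(old)[!was_set])
  }, add = TRUE)
  do.call(Sys.setenv, as.list(new))
  force(code)  # evaluate the caller's code with the temporary settings
}
```

If adding a dependency is acceptable, `withr::with_envvar()` and `withr::local_envvar()` implement the same pattern.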
Here's an example run with messaging:
```r
library(neon4cast)
weather <- noaa_stage1()
#> establishing connection to stage1 at data.ecoforecast.org ...
#> connected! Use dplyr functions to filter and summarise.
#> Then, use collect() to read result into R

# 5.7M rows of data:
weather |>
  dplyr::filter(start_date == "2022-04-01") |>
  dplyr::collect()
#> # A tibble: 5,786,316 × 13
#>   site_id predicted variable height        horizon ensemble start_time
#>   <chr>       <dbl> <chr>    <chr>           <dbl>    <int> <dttm>
#> 1 LIRO    95463.    PRES     surface             0        1 2022-04-01 18:00:00
#> 2 LIRO        0.625 TMP      2 m above gr…       0        1 2022-04-01 18:00:00
#> 3 LIRO       76.7   RH       2 m above gr…       0        1 2022-04-01 18:00:00
```
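The filter/summarise-then-`collect()` advice can also be tried offline. This is a minimal sketch, assuming the `arrow` and `dplyr` packages are installed and assuming the object returned by `noaa_stage1()` behaves like an Arrow dataset query; a tiny in-memory Arrow table stands in for the remote data. The column names mirror the `noaa_stage1()` output, but the values are made up for illustration.

```r
library(dplyr)
library(arrow)

# Toy stand-in for the remote stage 1 data (values are illustrative only).
toy <- arrow_table(data.frame(
  site_id   = c("LIRO", "LIRO", "LIRO", "BART"),
  variable  = c("TMP", "TMP", "RH", "TMP"),
  horizon   = c(0, 6, 0, 0),
  ensemble  = c(1L, 1L, 1L, 1L),
  predicted = c(0.625, 2.5, 76.7, 1.0)
))

# Build the query lazily: nothing is pulled into R at this point.
query <- toy |>
  filter(site_id == "LIRO", variable == "TMP") |>
  group_by(horizon) |>
  summarise(mean_predicted = mean(predicted)) |>
  arrange(horizon)

# Only the small, already-summarised result is materialised as a tibble.
result <- collect(query)
```

Because filtering and summarising happen before `collect()`, only the rows needed for the final answer reach R, which matters when the unfiltered data is millions of rows.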