IMOS-AnimalTracking / remora

Rapid Extraction of Marine Observations for Roving Animals
https://imos-animaltracking.github.io/remora
GNU General Public License v3.0

extractEnv problem #26

Closed ecologistpablo closed 1 year ago

ecologistpablo commented 1 year ago

Describe the bug: The function extractEnv isn't downloading certain variables from the IMOS ATF, such as current, turbidity, chl-a and npp.

To Reproduce: I'm an honours student of Ross Dwyer & Kylie Scales at the University of the Sunshine Coast, attempting to use REMORA to link Grey Nurse Shark movement with environmental variables.

I used to be able to pull rs_current data using REMORA, but no other variables. Recently this has changed, and a new error is being thrown in my console. I am looking to download current, npp, chl, turbidity and others using this function, and have been trying for a few months without success.

What works:

What doesn't work:

I've used the sample data included in the package to ensure my data frame (downloaded from IMOS) wasn't the reason the function isn't working, but I'm getting the same error code with the TownsvilleReefQC dataframe when attempting to download rs_current data:

    library(remora)
    library(tidyverse)
    data("TownsvilleReefQC")

    # simplify & subset data for speed
    qc_data <-
      TownsvilleReefQC %>%
      unnest(cols = c(QC)) %>%
      ungroup() %>%
      filter(Detection_QC %in% c(1,2)) %>%
      filter(filename == unique(filename)[1]) %>%
      slice(1:20)

    # Extract daily interpolated sea surface temperature
    # cache_layers & fill_gaps args set to FALSE for speed
    qc_data1 <-
      extractEnv(df = qc_data,
                 X = "receiver_deployment_longitude",
                 Y = "receiver_deployment_latitude",
                 datetime = "detection_datetime",
                 env_var = "rs_current",
                 cache_layers = FALSE,
                 crop_layers = TRUE,
                 full_timeperiod = FALSE,
                 fill_gaps = T,
                 folder_name = "test",
                 .parallel = FALSE)

    Extracting environmental data only on days detections were present; between 2013-08-10 and 2013-09-15 (11 days)
    This may take a little while...
    Accessing and downloading IMOS environmental variable: rs_current
    Finding IMOS Ocean Current data...
    Error in map(.x, .f, ...) :
    ℹ In index: 1.
    ℹ With name: 2013.
    Caused by error in open.connection():
    ! HTTP error 404.
    Extracting and appending environmental data
    Filling gaps in environmental data by extracting median values from a 20km buffer around detections that fall on 'NA' values
    Error in h(simpleError(msg, call)) :
      error in evaluating the argument 'x' in selecting a method for function 'extract': object 'env_stack' not found

Expected behavior: Would love the function to download all the data I require to perform my analysis!

Screenshots image

Additional context: Would love any and all help

ecologistpablo commented 1 year ago

@ianjonsen @vinayudyawer

ecologistpablo commented 1 year ago

Here's my other error code that comes up when using Grey Nurse Shark data:

image

ianjonsen commented 1 year ago

Are you using {remora} v 0.7.1 - installed via remotes::install_github('IMOS-AnimalTracking/remora')?

ecologistpablo commented 1 year ago

> Are you using {remora} v 0.7.1 - installed via remotes::install_github('IMOS-AnimalTracking/remora')?

I am, yes.

ianjonsen commented 1 year ago

As there have been no updates to {remora}, my first thoughts (without checking) are that it's either an issue on the IMOS data server or some dependent packages have been updated and are now throwing errors. I will make time to dig into this as soon as I can.

ecologistpablo commented 1 year ago

Thank you! I had a dig into the back end of it and saw that the variables that work have different URL functions within .build_urls, so maybe the URL into IMOS has changed for some variables?
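One rough way to test that hypothesis outside of remora is to request a catalog page directly and look at the HTTP status it returns. The sketch below uses only base R; `url_status` and `is_not_found` are hypothetical helper names, not remora functions, and the URL is the DM catalog address quoted further down this thread, so treat it as an assumption about where the data lived at the time.

```r
# Hypothetical helper (not part of remora): return the HTTP status code for a
# URL, or NA if the request itself fails (no network, DNS error, ...).
url_status <- function(url) {
  tryCatch(
    attr(curlGetHeaders(url), "status"),
    error = function(e) NA_integer_
  )
}

# TRUE only when the server answered and reported "not found"
is_not_found <- function(status) {
  !is.na(status) && status == 404L
}

# Only runs in an interactive session, since it needs network access.
if (interactive()) {
  dm_catalog <- "https://thredds.aodn.org.au/thredds/catalog/IMOS/OceanCurrent/GSLA/DM/catalog.html"
  status <- url_status(dm_catalog)
  message("Status for DM catalog: ", status,
          if (is_not_found(status)) " (directory missing or renamed)" else "")
}
```

A 404 here would point to a moved or renamed directory on the server rather than a problem with the detection data.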

ianjonsen commented 1 year ago

Yes, the directory name for the ocean current data on the IMOS THREDDS server has changed subtly. I've updated the code and confirmed that your example using the TownsvilleReefQC data now works:

qc_data1 <- 
    qc_data %>% 
    extractEnv(X = "receiver_deployment_longitude", 
               Y = "receiver_deployment_latitude", 
               datetime = "detection_datetime", 
               env_var = "rs_npp", 
               cache_layers = TRUE,
               crop_layers = TRUE,
               full_timeperiod = FALSE,
               fill_gaps = TRUE,
               folder_name = "test",
               .parallel = FALSE)

The variables rs_chl, rs_turbidity and rs_npp work for me without any update to the code - their URLs have not changed.

Update now pushed through to master branch. Thanks for notifying with reproducible example 👍

Reply here if any issues persist, otherwise I will close this issue after 2 weeks. Thanks

ecologistpablo commented 1 year ago

Thanks for your support in this conundrum, I really appreciate it.

I updated remora to v 0.7.3 and tried rs_current again using the example data "TownsvilleReefQC", and a new error code relating to sapply and netCDFs is being produced now:

    data("TownsvilleReefQC")

    qc_data <-
      TownsvilleReefQC %>%
      unnest(cols = c(QC)) %>%
      ungroup() %>%
      filter(Detection_QC %in% c(1,2)) %>%
      filter(filename == unique(filename)[1]) %>%
      slice(1:20)

    qc_data_1 <-
      extractEnv(df = qc_data,
                 X = "receiver_deployment_longitude",
                 Y = "receiver_deployment_latitude",
                 datetime = "detection_datetime",
                 env_var = "rs_current",
                 cache_layers = FALSE,
                 crop_layers = TRUE,
                 full_timeperiod = TRUE,
                 fill_gaps = TRUE,
                 folder_name = "test",
                 .parallel = TRUE)

    Extracting environmental data for each day between 2013-08-10 and 2013-09-15 (36 days)
    This may take a little while...
    Accessing and downloading IMOS environmental variable: rs_current
    Finding IMOS Ocean Current data...
    | | 0%
    Error in R_nc4_open: NetCDF: Unknown file format
    Error in sapply(x, fromDisk) & sapply(x, inMemory) :
      operations are possible only for numeric, logical or complex types
    Extracting and appending environmental data
    Filling gaps in environmental data by extracting median values from a 20km buffer around detections that fall on 'NA' values
    Error in h(simpleError(msg, call)) :
      error in evaluating the argument 'x' in selecting a method for function 'extract': object 'env_stack' not found.

rs_npp is working for me in the example dataset but not in my own, which is weird, so I will revisit that problem to see what's different.

rs_chl & rs_turbidity are also both throwing sapply errors, without any mention of netCDFs...

    qc_data_1 <-
      extractEnv(df = qc_data[1, ],
                 X = "receiver_deployment_longitude",
                 Y = "receiver_deployment_latitude",
                 datetime = "detection_datetime",
                 env_var = "rs_turbidity",
                 cache_layers = FALSE,
                 crop_layers = TRUE,
                 full_timeperiod = TRUE,
                 fill_gaps = TRUE,
                 folder_name = "test",
                 .parallel = TRUE)

    Extracting environmental data for each day between 2013-08-10 and 2013-08-10 (0 days)
    This may take a little while...
    Accessing and downloading IMOS environmental variable: rs_turbidity
    Downloading environmental data in parallel across 8 cores...
    Error in sapply(x, fromDisk) & sapply(x, inMemory) :
      operations are possible only for numeric, logical or complex types
    Extracting and appending environmental data
    Filling gaps in environmental data by extracting median values from a 5km buffer around detections that fall on 'NA' values
    Error in h(simpleError(msg, call)) :
      error in evaluating the argument 'x' in selecting a method for function 'extract': object 'env_stack' not found

    qc_data_1 <-
      extractEnv(df = qc_data[1, ],
                 X = "receiver_deployment_longitude",
                 Y = "receiver_deployment_latitude",
                 datetime = "detection_datetime",
                 env_var = "rs_chl",
                 cache_layers = FALSE,
                 crop_layers = TRUE,
                 full_timeperiod = TRUE,
                 fill_gaps = TRUE,
                 folder_name = "test",
                 .parallel = TRUE)

    Extracting environmental data for each day between 2013-08-10 and 2013-08-10 (0 days)
    This may take a little while...
    Accessing and downloading IMOS environmental variable: rs_chl
    Downloading environmental data in parallel across 8 cores...
    Error in sapply(x, fromDisk) & sapply(x, inMemory) :
      operations are possible only for numeric, logical or complex types
    Extracting and appending environmental data
    Filling gaps in environmental data by extracting median values from a 5km buffer around detections that fall on 'NA' values
    Error in h(simpleError(msg, call)) :
      error in evaluating the argument 'x' in selecting a method for function 'extract': object 'env_stack' not found

ianjonsen commented 1 year ago

The errors you're getting now suggest an issue with your ncdf4 package version. Try updating to ncdf4_1.21 if you have an earlier version installed. There may be other package version issues, so please post results of sessionInfo() here. Failing that, there may be a Windows platform issue that I can't test for at the moment.
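A quick way to check the suggested minimum version before posting the full sessionInfo() might look like the sketch below. `has_min_version` is a hypothetical helper written for illustration, not a remora or ncdf4 function; the 1.21 threshold is the one mentioned above.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer,
# FALSE if it is older or not installed at all.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

if (!has_min_version("ncdf4", "1.21")) {
  message("ncdf4 is missing or older than 1.21; try install.packages(\"ncdf4\")")
}
```

The same helper can be pointed at other dependencies (for example raster or terra) if a different package turns out to be the culprit.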

ecologistpablo commented 1 year ago

So we have gotten rs_current to work, but on the MacBook of my supervisor, Dr. Ross Dwyer, not on my Windows laptop. This is my sessionInfo(); I have recently updated all the packages that I could:

sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8

time zone: Australia/Brisbane
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ncdf4_1.21 raster_3.6-20 sp_1.6-0 devtools_2.4.5 usethis_2.1.6
[6] data.table_1.14.8 readxl_1.4.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[11] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[16] ggplot2_3.4.2 tidyverse_2.0.0 VTrack_2.11 foreach_1.5.2 remora_0.7.3

loaded via a namespace (and not attached):
[1] later_1.3.1 R.oo_1.25.0 cellranger_1.1.0 xts_0.13.1
[5] XML_3.99-0.14 lifecycle_1.0.3 sf_1.0-12 doParallel_1.0.17
[9] vroom_1.6.3 globals_0.16.2 processx_3.8.1 lattice_0.21-8
[13] MASS_7.3-58.4 crosstalk_1.2.0 backports_1.4.1 magrittr_2.0.3
[17] plotly_4.10.1 rmarkdown_2.21 yaml_2.3.7 remotes_2.4.2
[21] httpuv_1.6.11 rgdal_1.6-6 zip_2.3.0 sessioninfo_1.2.2
[25] pkgbuild_1.4.0 rgeos_0.6-2 DBI_1.1.3 RColorBrewer_1.1-3
[29] ade4_1.7-22 pkgload_1.3.2 maps_3.4.1 abind_1.4-5
[33] rvest_1.0.3 aqp_1.42 R.utils_2.12.2 satellite_1.0.4
[37] listenv_0.9.0 terra_1.7-29 units_0.8-2 parallelly_1.35.0
[41] svglite_2.1.1 codetools_0.2-19 dismo_1.3-9 gstat_2.1-1
[45] xml2_1.3.4 tidyselect_1.2.0 farver_2.1.1 viridis_0.6.3
[49] stats4_4.3.0 base64enc_0.1-3 webshot_0.5.4 gmt_2.0.3
[53] jsonlite_1.8.4 e1071_1.7-13 ellipsis_0.3.2 progressr_0.13.0
[57] iterators_1.0.14 systemfonts_1.0.4 tools_4.3.0 Rcpp_1.0.10
[61] glue_1.6.2 gridExtra_2.3 leaflet.providers_1.9.0 xfun_0.39
[65] withr_2.5.0 fastmap_1.1.1 boot_1.3-28.1 fansi_1.0.4
[69] callr_3.7.3 digest_0.6.31 timechange_0.2.0 R6_2.5.1
[73] mime_0.12 colorspace_2.1-0 R.methodsS3_1.8.2 utf8_1.2.3
[77] generics_0.1.3 intervals_0.15.3 FNN_1.1.3.2 class_7.3-21
[81] prettyunits_1.1.1 httr_1.4.6 htmlwidgets_1.6.2 pkgconfig_2.0.3
[85] gtable_0.3.3 furrr_0.3.1 htmltools_0.5.5 pixmap_0.4-12
[89] profvis_0.3.8 scales_1.2.1 kableExtra_1.3.4 png_0.1-8
[93] CircStats_0.2-6 colorRamps_2.3.1 knitr_1.42 rstudioapi_0.14
[97] geosphere_1.5-18 tzdb_0.4.0 curl_5.0.0 spacetime_1.3-0
[101] checkmate_2.1.0 proxy_0.4-27 zoo_1.8-12 cachem_1.0.8
[105] KernSmooth_2.23-20 parallel_4.3.0 miniUI_0.1.1.1 foreign_0.8-84
[109] pillar_1.9.0 grid_4.3.0 vctrs_0.6.2 urlchecker_1.0.1
[113] promises_1.2.0.1 mapview_2.11.0 xtable_1.8-4 cluster_2.1.4
[117] evaluate_0.21 adehabitatHR_0.4.21 maptools_1.1-6 cli_3.6.1
[121] compiler_4.3.0 crayon_1.5.2 rlang_1.1.1 plotKML_0.8-3
[125] classInt_0.4-9 ps_1.7.5 plyr_1.8.8 fs_1.6.2
[129] stringi_1.7.12 viridisLite_0.4.2 stars_0.6-1 munsell_0.5.0
[133] fasttime_1.1-0 lazyeval_0.2.2 leaflet_2.1.2 Matrix_1.5-4
[137] hms_1.1.3 bit64_4.0.5 leafem_0.2.0 future_1.32.0
[141] gdistance_1.6.2 shiny_1.7.4 igraph_1.4.2 memoise_2.0.1
[145] adehabitatMA_0.3.16 lwgeom_0.2-11 bit_4.0.5 adehabitatLT_0.3.27

Another problem that has popped up is that rs_current only works for 1993 - 2020. Looking into it, the rs_current URL (https://thredds.aodn.org.au/thredds/catalog/IMOS/OceanCurrent/GSLA/DM/catalog.html) only holds data up to 2020, whereas GSLA NRT (another folder on the IMOS website) ranges from 2011 to the present day. So for rs_current to work for any data from 2020 onwards, I believe .build_urls needs updating. This is the error code we get when trying dates from 2021 - 2022:

image
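To make the suggestion concrete, here is a minimal sketch of how a product could be chosen per detection date. It is not how remora's .build_urls is implemented; `choose_gsla_product` is a hypothetical helper, and the coverage years (DM from 1993 to 2020, NRT afterwards) are simply the ones observed above and may change as the server is updated.

```r
# Hypothetical helper (not part of remora): pick a GSLA product for each date.
# Assumes DM (delayed mode) covers dm_first_year..dm_last_year and that NRT
# (near real time) should be used for anything later.
choose_gsla_product <- function(dates, dm_first_year = 1993, dm_last_year = 2020) {
  yr <- as.integer(format(as.Date(dates), "%Y"))
  ifelse(yr < dm_first_year, NA_character_,
         ifelse(yr <= dm_last_year, "DM", "NRT"))
}

choose_gsla_product(c("2013-08-10", "2021-03-01"))
# [1] "DM"  "NRT"
```

Preferring DM wherever it exists and falling back to NRT only for later dates is one possible design choice; the two products overlap from 2011, so a different rule could be equally valid.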

ianjonsen commented 1 year ago

Environmental data download on Windows should now be fixed on the staging branch. To download:

remotes::install_github("IMOS-AnimalTracking/remora@staging", dependencies = TRUE)

Closing this issue now, but you are free to add comments if you find additional issues.

ecologistpablo commented 1 year ago

Thanks @ianjonsen, I can now successfully download current data on Windows up to the latest available dates, while accessing NRT data using the new clause in extractEnv.