bio-oracle / biooracler

R package to access Bio-Oracle data via ERDDAP

How to download the future data layers? #11

Closed zhangzhixin1102 closed 5 months ago

zhangzhixin1102 commented 6 months ago

Dear @vinisalazar @salvafern, thank you for developing such a helpful R package. Using the biooracler R package, I successfully downloaded the present-day data layers (i.e., the decades 2000-2010 and 2010-2020). However, I encounter an error when downloading future data layers.

For instance, I used the following scripts to download the present o2 layers:

library(biooracler)
time.periods <- c(2000) ## 2000: decade 2000-2010
target.layer <- "o2_baseline_2000_2018_depthsurf"

latitude = c(-89.975, 89.975)
longitude = c(-179.975, 179.975)

time = c(paste0(time.periods, "-01-01T00:00:00Z"),
         paste0(time.periods, "-01-01T00:00:00Z"))

constraints = list(time, latitude, longitude)
names(constraints) = c("time", "latitude", "longitude")

r <- download_layers(dataset_id = target.layer,
                     # variables = variables,
                     constraints = constraints,
                     directory = ".",
                     fmt = "raster")

I tried a similar strategy to download future data layers (ssp119, decade 2020-2030, surface, o2), but it failed.

library(biooracler)
time.periods <- c(2020) ## 2020: decade 2020-2030
target.layer <- "o2_ssp119_2020_2100_depthsurf"

latitude = c(-89.975, 89.975)
longitude = c(-179.975, 179.975)

time = c(paste0(time.periods, "-01-01T00:00:00Z"),
         paste0(time.periods, "-01-01T00:00:00Z"))

constraints = list(time, latitude, longitude)
names(constraints) = c("time", "latitude", "longitude")

r <- download_layers(dataset_id = target.layer,
                     # variables = variables,
                     constraints = constraints,
                     directory = ".",
                     fmt = "raster")

The error message is as follows:

Selected dataset o2_ssp119_2020_2100_depthsurf. Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/o2_ssp119_2020_2100_depthsurf.html Error:

Could you please confirm this issue and guide me on how to download future data layers? Best regards, Zhixin

salvafern commented 6 months ago

Hi @zhangzhixin1102 , thanks for using Bio-Oracle!

I cannot reproduce the issue. See below (I used a smaller spatial extent to confirm that it works):

library(biooracler)
time.periods <- c(2020) ## 2020: decade 2020-2030
target.layer <- "o2_ssp119_2020_2100_depthsurf"

latitude = c(10, 20)
longitude = c(120, 130)

time = c(paste0(time.periods, "-01-01T00:00:00Z"),
         paste0(time.periods, "-01-01T00:00:00Z"))

constraints = list(time, latitude, longitude)
names(constraints) = c("time", "latitude", "longitude")

r <- download_layers(dataset_id = target.layer,
                     # variables = variables,
                     constraints = constraints,
                     directory = ".",
                     fmt = "raster")
#> Selected dataset o2_ssp119_2020_2100_depthsurf.
#> Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/o2_ssp119_2020_2100_depthsurf.html
r
#> class       : SpatRaster 
#> dimensions  : 201, 201, 7  (nrow, ncol, nlyr)
#> resolution  : 0.04999996, 0.05  (x, y)
#> extent      : 120, 130.05, 10, 20.05  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> sources     : 78c4ca5b0ac43356d9e187b3bf962ee1.nc:o2_ltmax  
#>               78c4ca5b0ac43356d9e187b3bf962ee1.nc:o2_ltmin  
#>               78c4ca5b0ac43356d9e187b3bf962ee1.nc:o2_max  
#>               ... and 4 more source(s)
#> varnames    : o2_ltmax (Long-term maximum DissolvedMolecularOxygen) 
#>               o2_ltmin (Long-term minimum DissolvedMolecularOxygen) 
#>               o2_max (Maximum DissolvedMolecularOxygen) 
#>               ...
#> names       :   o2_ltmax,   o2_ltmin,     o2_max,    o2_mean,     o2_min,   o2_range, ... 
#> unit        : MMol' 'M-3, MMol' 'M-3, MMol' 'M-3, MMol' 'M-3, MMol' 'M-3, MMol' 'M-3, ... 
#> time        : 2020-01-01 UTC

Created on 2024-05-16 with reprex v2.1.0

Can you first try removing any cached .nc files? If that doesn't work, please paste the output of sessionInfo() here. My environment looks like:
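If biooracler caches downloads through rerddap (an assumption on my side; verify the cache path on your machine first), the cached files can be listed and cleared with rerddap's cache helpers, a minimal sketch:

```r
library(rerddap)

# List any cached download files (returns file names in the cache directory)
cache_list()

# Remove all cached files so the next download_layers() call fetches fresh data
cache_delete_all()
```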

sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] styler_1.10.3     digest_0.6.35     fastmap_1.1.1     xfun_0.43        
#>  [5] magrittr_2.0.3    glue_1.7.0        R.utils_2.12.3    knitr_1.46       
#>  [9] htmltools_0.5.8.1 rmarkdown_2.26    lifecycle_1.0.4   cli_3.6.2        
#> [13] R.methodsS3_1.8.2 vctrs_0.6.5       reprex_2.1.0      withr_3.0.0      
#> [17] compiler_4.3.3    R.oo_1.26.0       R.cache_0.16.0    purrr_1.0.2      
#> [21] rstudioapi_0.16.0 tools_4.3.3       evaluate_0.23     yaml_2.3.8       
#> [25] rlang_1.1.3       fs_1.6.4
zhangzhixin1102 commented 6 months ago

Hi @salvafern , thank you very much for looking into my issue. On my PC, I tested the smaller spatial extent and it works. But when I use the global extent, the error occurs. Could you please confirm whether it succeeds for you when using the global extent (e.g., latitude = c(-89.975, 89.975); longitude = c(-179.975, 179.975))?

Below is my session information.

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 
[2] LC_CTYPE=Chinese (Simplified)_China.utf8   
[3] LC_MONETARY=Chinese (Simplified)_China.utf8
[4] LC_NUMERIC=C                               
[5] LC_TIME=Chinese (Simplified)_China.utf8    

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] terra_1.7-58          biooracler_0.0.0.9000

loaded via a namespace (and not attached):
 [1] vctrs_0.6.4       cli_3.6.1         rlang_1.1.2       ncdf4_1.22       
 [5] crul_1.4.0        generics_0.1.3    jsonlite_1.8.8    data.table_1.15.0
 [9] glue_1.6.2        backports_1.4.1   httpcode_0.3.0    triebeard_0.4.1  
[13] fansi_1.0.6       rappdirs_0.3.3    tibble_3.2.1      fastmap_1.1.1    
[17] hoardr_0.5.4      lifecycle_1.0.4   memoise_2.0.1     compiler_4.3.1   
[21] codetools_0.2-19  dplyr_1.1.4       Rcpp_1.0.11       pkgconfig_2.0.3  
[25] digest_0.6.33     R6_2.5.1          rerddap_1.1.0     tidyselect_1.2.0 
[29] utf8_1.2.4        pillar_1.9.0      curl_5.1.0        magrittr_2.0.3   
[33] urltools_1.7.3    checkmate_2.3.1   xml2_1.3.6        cachem_1.0.8     
salvafern commented 5 months ago

Hi @zhangzhixin1102

I can see that the request fails when requesting more than two variables at once

 r <- download_layers(dataset_id = target.layer,
                      variables = c("o2_mean", "o2_min", "o2_max"),
                      constraints = constraints,
                      directory = ".",
                      fmt = "nc")
#> Selected dataset o2_ssp119_2020_2100_depthsurf.
#> Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/o2_ssp119_2020_2100_depthsurf.html
#> Selected 3 variables: o2_mean, o2_min, o2_max
#> Error: 

 r <- download_layers(dataset_id = target.layer,
                      variables = c("o2_mean", "o2_min"),
                      constraints = constraints,
                      directory = ".",
                      fmt = "nc")
#> Selected dataset o2_ssp119_2020_2100_depthsurf.
#> Dataset info available at: http://erddap.bio-oracle.org/erddap/griddap/o2_ssp119_2020_2100_depthsurf.html
#> Selected 2 variables: o2_mean, o2_min

ERDDAP often struggles when large amounts of data are requested. It is still strange that the baseline layer works fine.

As a temporary workaround I suggest:

  1. Download variables one by one
  2. Download the native netcdf file and read locally
url <- "https://erddap.bio-oracle.org/erddap/files/o2_ssp119_2020_2100_depthsurf/climatologyDecadeDepthSurf.nc"
download.file(url, destfile = "./o2_ssp119_2020_2100_depthsurf.nc")
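Once downloaded, the native NetCDF can be read locally, for instance with terra (a sketch; the file name matches the destfile in the download.file() call above):

```r
library(terra)

# Read all variables from the local NetCDF as a multi-layer SpatRaster
r <- rast("./o2_ssp119_2020_2100_depthsurf.nc")
names(r)  # one layer per variable/time slice
```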

I will look further but I don't think there will be an obvious solution. Downloading smaller chunks will however work.
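For the chunked approach, one possible sketch is to split the global extent into latitude bands and merge the tiles afterwards (the six-band split is an arbitrary choice of mine; dataset_id and the time constraint are as in the examples above):

```r
library(biooracler)
library(terra)

target.layer <- "o2_ssp119_2020_2100_depthsurf"
time <- c("2020-01-01T00:00:00Z", "2020-01-01T00:00:00Z")
longitude <- c(-179.975, 179.975)

# Download in latitude bands, then merge into one global raster
breaks <- seq(-89.975, 89.975, length.out = 7)
tiles <- lapply(seq_len(length(breaks) - 1), function(i) {
  constraints <- list(time = time,
                      latitude = c(breaks[i], breaks[i + 1]),
                      longitude = longitude)
  download_layers(dataset_id = target.layer,
                  constraints = constraints,
                  fmt = "raster")
})
r <- merge(sprc(tiles))  # terra merges the band collection into one raster
```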

zhangzhixin1102 commented 5 months ago

Hi @salvafern , thank you for your effort on this issue. I'm glad you found the cause of this strange behaviour. Following your suggestion, I wrote a loop that downloads one layer at a time, and the data layers now download without error.

Could you please confirm whether the scripts are correct? Best regards, Zhixin

## change information in Line 17, 18, 23
## R 4.3.1
shell("cls")
rm(list = ls())
t1 <- Sys.time()

######################################
## 1 required R packages
######################################
library(terra)      ## packageVersion 1.4.11
library(biooracler) ## packageVersion 0.0.0.9000
## https://gis.stackexchange.com/questions/427923/preventing-terra-from-writing-auxiliary-files-when-writing-to-disc
setGDALconfig("GDAL_PAM_ENABLED", "FALSE")

## 2000: decade 2000-2010;
## 2010: decade 2010-2020
time.periods <- c(2040)  ## decade 2040-2050
target.ssp   <- "ssp585" ## ssp119, ssp126, ssp245, ssp370, ssp585

######################################
## 2 working directory
######################################
mypath0 <- "G:/SDMs_2024/bio-oracle-v3"
mypath1 <- paste0(mypath0, "/", time.periods, "-", target.ssp)
if(!file.exists(mypath1)) dir.create(mypath1)
setwd(mypath1)

######################################
## check available future surface data layers
######################################
layers1 <- as.data.frame(list_layers())[, c("dataset_id", "title")]
layers2 <- layers1[(grepl(target.ssp, layers1$dataset_id, fixed=TRUE) & grepl("depthsurf", layers1$dataset_id, fixed=TRUE)), ]
layers2

######################################
## layers decided to download
######################################
## o2: Dissolved Molecular Oxygen
## thetao: ocean temperature
## so: salinity
## sws: seawater speed
layers3 <- c(paste0("o2_", target.ssp, "_2020_2100_depthsurf"),
             paste0("thetao_", target.ssp, "_2020_2100_depthsurf"),
             paste0("ph_", target.ssp, "_2020_2100_depthsurf"),
             paste0("so_", target.ssp, "_2020_2100_depthsurf"),
             paste0("sws_", target.ssp, "_2020_2100_depthsurf"))

######################################
## download settings
######################################
latitude = c(-89.975, 89.975)
longitude = c(-179.975, 179.975)

# time = c("2010-01-01T00:00:00Z", "2010-01-01T00:00:00Z")
time = c(paste0(time.periods, "-01-01T00:00:00Z"),
         paste0(time.periods, "-01-01T00:00:00Z"))

constraints = list(time, latitude, longitude)
names(constraints) = c("time", "latitude", "longitude")

######################################
## 3 start the loop
######################################
for (m in layers3) {

  ######################################
  ## variable loop: six variables
  ######################################
  for (n in c("mean", "max", "min", "range", "ltmax", "ltmin")) {

    r <- download_layers(dataset_id  = m,
                         variables   = paste0(gsub("_ssp.*","",m), "_", n),
                         constraints = constraints,
                         directory   = ".",
                         fmt         = "raster")

    ######################################
    ## save
    ######################################
    writeRaster(r, paste0(names(r), ".tif"), overwrite = TRUE)

    ####################################
    ## delete nc files
    ####################################
    unlink(list.files(mypath1, pattern = "\\.nc$", full.names = T), recursive = FALSE, force = FALSE)

    ####################################
    ## make a marker
    ####################################
    cat(paste(replicate(80, "#"), collapse = ""), 
        '\n', '####', '\n',
        '#### download ', paste0(gsub("_ssp.*","",m), "_", n),
        '\n', '####', '\n',
        paste(replicate(80, "#"), collapse = ""), '\n', sep='')

  } ## loop n
} ## loop m

print(Sys.time() - t1) ##
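As a quick sanity check, the variable names the inner loop requests can be previewed without any download, e.g. for the o2 layer:

```r
# Preview the variable names the inner loop builds for one dataset id:
# gsub() strips everything from "_ssp" onward, leaving the variable prefix
m <- "o2_ssp585_2020_2100_depthsurf"
vars <- paste0(gsub("_ssp.*", "", m), "_",
               c("mean", "max", "min", "range", "ltmax", "ltmin"))
print(vars)
#> [1] "o2_mean"  "o2_max"   "o2_min"   "o2_range" "o2_ltmax" "o2_ltmin"
```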
salvafern commented 5 months ago

Seems ok, yes!