e-sensing / sits

Satellite image time series in R
https://e-sensing.github.io/sitsbook/
GNU General Public License v2.0

"Tiles ... are missing or malformed and will be reprocessed." in an endless loop #946

Closed felixg3 closed 1 year ago

felixg3 commented 1 year ago

Describe the bug

When I try to regularize a data cube with sits_regularize(), I receive the message: Tiles 144051 (BLUE, GREEN, NIR08, RED, SWIR16, SWIR22), 143051 (BLUE, GREEN, NIR08, RED, SWIR16, SWIR22) are missing or malformed and will be reprocessed. Oddly, the CLOUD band is not listed in this message. If I let R keep running, it writes no data to disk and the process continues for many hours.

To Reproduce

Edit: When running sits_regularize() on Microsoft Planetary Computer (MPC) JupyterLab, the first run takes quite some time. In my local environment, the first run finishes within a few seconds. The following code produces the error in my local environment. The issue is not reproducible in the other environment (MPC JupyterLab), whose session info I posted at the end of this bug report. Attachment: test-environment.zip

library(rstac)
library(sits)
library(sf)
library(tidyverse)

shape <- st_zm(read_sf("/home/felix/Dokumente/fernerkundung_hausarbeit_r/bangalore.shp"))

s2_L8_cube_MPC <- sits_cube(
  source = "MPC",
  collection = "LANDSAT-C2-L2",
  bands = c("BLUE", "GREEN", "RED", "NIR08", "SWIR16", "SWIR22", "CLOUD"),
  roi = shape,
  start_date = "1990-01-01",
  end_date = "2022-12-31",
  multicores = 8,
  output_dir = "./tempdir2/"
)

reg_cube <- sits_regularize(
  cube = s2_L8_cube_MPC,
  output_dir = "/home/felix/Dokumente/fernerkundung_hausarbeit_r/tempdir",
  res = 30,
  period = "P3M",
  multicores = 8
)


Additional context

Environment where the error comes up:

R version 4.2.3 (2023-03-15)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 37 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.3

locale:
 [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
 [1] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.1 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
 [8] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0 sits_1.3.0 terra_1.7-23 magrittr_2.0.3 rstac_0.9.2-2
[15] sf_1.0-12

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10 class_7.3-21 digest_0.6.31 utf8_1.2.3 R6_2.5.1
 [6] evaluate_0.20 e1071_1.7-13 httr_1.4.5 pillar_1.9.0 rlang_1.1.0
[11] curl_5.0.0 rstudioapi_0.14 rmarkdown_2.21 slider_0.3.0 munsell_0.5.0
[16] proxy_0.4-27 compiler_4.2.3 xfun_0.38 pkgconfig_2.0.3 CoprManager_0.5.1
[21] htmltools_0.5.5 tidyselect_1.2.0 codetools_0.2-19 randomForest_4.7-1.1 fansi_1.0.4
[26] crayon_1.5.2 tzdb_0.3.0 withr_2.5.0 gdalcubes_0.6.3 grid_4.2.3
[31] ulimit_0.0-3 jsonlite_1.8.4 gtable_0.3.3 lifecycle_1.0.3 DBI_1.1.3
[36] units_0.8-1 scales_1.2.1 warp_0.2.0 ncdf4_1.21 KernSmooth_2.23-20
[41] cli_3.6.1 stringi_1.7.12 generics_0.1.3 vctrs_0.6.1 geojsonsf_2.0.3
[46] tools_4.2.3 glue_1.6.2 hms_1.1.3 parallel_4.2.3 fastmap_1.1.1
[51] yaml_2.3.7 timechange_0.2.0 colorspace_2.1-0 classInt_0.4-9 knitr_1.42

Environment where the error is NOT reproducible

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS/LAPACK: /srv/conda/envs/notebook/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] sf_1.0-7 sits_1.2.0 terra_1.5-21 magrittr_2.0.3 rstac_0.9.2-1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10 pillar_1.8.1 compiler_4.2.2 class_7.3-21
 [5] base64enc_0.1-3 tools_4.2.2 ncdf4_1.21 digest_0.6.31
 [9] uuid_1.1-0 jsonlite_1.8.4 lubridate_1.9.1 evaluate_0.20
[13] lifecycle_1.0.3 tibble_3.1.8 timechange_0.2.0 pkgconfig_2.0.3
[17] rlang_1.0.6 DBI_1.1.3 IRdisplay_1.1 cli_3.6.0
[21] parallel_4.2.2 curl_4.3.3 yaml_2.3.7 IRkernel_1.3.2
[25] warp_0.2.0 fastmap_1.1.0 e1071_1.7-12 withr_2.5.0
[29] repr_1.1.6 httr_1.4.4 dplyr_1.1.0 generics_0.1.3
[33] vctrs_0.5.2 grid_4.2.2 classInt_0.4-8 tidyselect_1.2.0
[37] glue_1.6.2 data.table_1.14.6 geojsonsf_2.0.3 R6_2.5.1
[41] fansi_1.0.4 pbdZMQ_0.3-9 tidyr_1.3.0 slider_0.3.0
[45] purrr_1.0.1 gdalcubes_0.6.0 units_0.8-1 codetools_0.2-18
[49] htmltools_0.5.4 KernSmooth_2.23-20 utf8_1.2.2 proxy_0.4-27
[53] crayon_1.5.2
OldLipe commented 1 year ago

Hi @felixg3,

Regularization takes a long time, so a short-lived MPC token can expire before the operation completes, which leads to the problem you reported. Using a longer-lived token will in general solve the problem. We have added the option of supplying your MPC token in sits version 1.4.0 (currently under development). To install the current development version, please use: devtools::install_github("e-sensing/sits@dev").

To use your longer-lived token, set an environment variable with the token as follows: Sys.setenv("MPC_TOKEN" = "YOUR-TOKEN"). Supplying your longer-lived token should solve the problem of long delays in accessing data from MPC.
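A minimal sketch of that setup in base R, assuming you prefer to keep the token out of your scripts (for example as an `MPC_TOKEN=...` line in `~/.Renviron`) and only set it by hand as a fallback. `MPC_TOKEN` is the variable name given above; the helper `mpc_token_is_set()` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: TRUE when MPC_TOKEN holds a non-empty value.
mpc_token_is_set <- function() {
  nzchar(Sys.getenv("MPC_TOKEN"))
}

# Set the token for the current R session only if it is not already defined
# (e.g. through ~/.Renviron). "YOUR-TOKEN" is a placeholder, not a real token.
if (!mpc_token_is_set()) {
  Sys.setenv(MPC_TOKEN = "YOUR-TOKEN")
}

# Fail early if the variable is still empty.
stopifnot(mpc_token_is_set())
```

Checking the variable before a long run avoids discovering hours later that the session never had a token.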

In general, regularization is faster when the original data has first been downloaded from the MPC cloud to temporary storage in a local environment. This is true both when running on a virtual machine in the Azure Cloud and when running on a local computer.

felixg3 commented 1 year ago

> Hi @felixg3,
>
> Regularization takes a long time, so a short-lived MPC token can expire before the operation completes, which leads to the problem you reported. Using a longer-lived token will in general solve the problem. We have added the option of supplying your MPC token in sits version 1.4.0 (currently under development). To install the current development version, please use: devtools::install_github("e-sensing/sits@dev").

Thank you very much for your quick reply. To clarify: the regularization runs in an endless loop on my local machine and completes successfully on MPC. On my local device, nothing is written to the disk or the output directory. I checked the permissions and every other common Linux issue I could think of.
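For readers who want to repeat that check from within R, here is a minimal base-R sketch. It assumes that "nothing is written" could stem from a missing or read-only directory, which is only one possible cause; `check_output_dir()` is a hypothetical helper, not part of sits.

```r
# Hypothetical helper: report whether a directory exists, is writable,
# and actually accepts a new file.
check_output_dir <- function(dir) {
  exists <- dir.exists(dir)
  # file.access() returns 0 on success; mode = 2 tests write permission.
  writable <- exists && unname(file.access(dir, mode = 2) == 0)
  # Permission bits can be misleading on some filesystems, so also try to
  # create and remove a small probe file.
  can_create <- FALSE
  if (writable) {
    probe <- tempfile(pattern = "probe_", tmpdir = dir)
    can_create <- suppressWarnings(file.create(probe))
    if (can_create) unlink(probe)
  }
  c(exists = exists, writable = writable, can_create = can_create)
}

# Example on a directory that is known to exist in every R session.
check_output_dir(tempdir())
```

In the report above, the same call could be made on both `output_dir` values, "./tempdir2/" and the directory passed to sits_regularize().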

By the way, is it possible to save the temporary output (the output_dir parameter) to external blob storage, for example with AzureStor? Or does it only support local directories?

> To use your longer-lived token, set an environment variable with the token as follows: Sys.setenv("MPC_TOKEN" = "YOUR-TOKEN"). Supplying your longer-lived token should solve the problem of long delays in accessing data from MPC.

Thank you, I'll try it out. Maybe I can open a PR to add this to the documentation?

> In general, regularization is faster when the original data has first been downloaded from the MPC cloud to temporary storage in a local environment. This is true both when running on a virtual machine in the Azure Cloud and when running on a local computer.

Does sits_cube() download the data, or does it only create an index that sits_regularize() queries later?

gilbertocamara commented 1 year ago

Hi @felixg3, sits_cube() does not download the data; as you stated, it stores the addresses of the files (usually in the cloud), which sits_regularize() then uses to produce a regular data cube. To copy the contents of a data cube, please use sits_cube_copy().
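The copy-then-regularize workflow suggested in this thread can be sketched as follows. This is an illustration under assumptions, not the maintainers' recommended code: `copy_then_regularize()`, `local_dir`, and `reg_dir` are hypothetical names, and the `cube`, `roi`, and `output_dir` arguments of sits_cube_copy() should be checked against `?sits_cube_copy` for your installed sits version. The sits_regularize() arguments mirror the ones used in the original report.

```r
# Sketch only: nothing is downloaded until the function is called.
# Requires the sits package to be installed when it is actually run.
copy_then_regularize <- function(cube, roi, local_dir, reg_dir,
                                 res = 30, period = "P3M", multicores = 2) {
  # Create both directories up front (assumption: they must already exist).
  for (d in c(local_dir, reg_dir)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Step 1: copy the remote files referenced by the cube to local disk.
  local_cube <- sits::sits_cube_copy(
    cube = cube,
    roi = roi,
    output_dir = local_dir
  )
  # Step 2: regularize from the local copy instead of the cloud.
  sits::sits_regularize(
    cube = local_cube,
    output_dir = reg_dir,
    res = res,
    period = period,
    multicores = multicores
  )
}
```

With the objects from the report, a call might look like copy_then_regularize(s2_L8_cube_MPC, shape, "./local_copy", "./tempdir"). Narrowing the date range first keeps the local copy to a manageable size.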

felixg3 commented 1 year ago

Thank you so much for your assistance. I hope I did not bother you by framing my questions in a bug report.

However, I am still wondering why sits_regularize() works well in the cloud environment and not on my local machine. I assume that there are no rate limits within MPC because no egress traffic leaves Microsoft's network, while my local machine would need to use the MPC token. I will test it today and share the results.

gilbertocamara commented 1 year ago

No problem!! We are always pleased to help.