brazil-data-cube / rstac

R Client Library for SpatioTemporal Asset Catalog
https://brazil-data-cube.github.io/rstac

Protocol "s3" not supported or disabled in libcurl #158

Open robbibt opened 1 month ago

robbibt commented 1 month ago

Describe the bug

We are attempting to improve access to our Digital Earth Australia satellite data for R users, and rstac looks like a perfect option to replicate functionality we currently have via Python's pystac.client package.

I can successfully download STAC assets with the example code provided by rstac:

library(magrittr)
library(rstac)

stac("https://brazildatacube.dpi.inpe.br/stac/") %>%
  stac_search(collections = "CB4-16D-2",
              datetime = "2019-06-01/2019-08-01",
              limit = 1) %>%
  get_request() %>%
  assets_download(asset_names = "thumbnail", output_dir = tempdir())

However, when attempting to download similar data from the Digital Earth Australia STAC endpoint (https://explorer.sandbox.dea.ga.gov.au/stac), I get the following Protocol "s3" not supported or disabled in libcurl error:

stac("https://explorer.sandbox.dea.ga.gov.au/stac") %>%
  stac_search(collections = "ga_s2am_ard_3",
              datetime = "2019-06-01/2019-08-01",
              limit = 1) %>%
  get_request() %>%
  assets_download(asset_names = "nbart_red", output_dir = tempdir())


To Reproduce

Run the code example above.

Additional context

We are able to successfully find and download data without issues in Python using the workflow documented here: https://knowledge.dea.ga.gov.au/guides/setup/gis/stac/

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8    LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rstac_1.0.0    magrittr_2.0.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11        png_0.1-8          class_7.3-20       sf_1.0-16          crayon_1.5.2       grid_4.2.0         R6_2.5.1
 [8] jsonlite_1.8.7     DBI_1.1.3          units_0.8-4        e1071_1.7-13       KernSmooth_2.23-20 httr_1.4.7         curl_5.1.0
[15] tools_4.2.0        jpeg_0.1-10        proxy_0.4-27       compiler_4.2.0     classInt_0.4-10

kadyb commented 1 month ago

I also encountered this problem (but I wanted to download Landsat data from Earth Search).

library("rstac")
stac("https://earth-search.aws.element84.com/v1") |>
  stac_search(collections = "landsat-c2-l2", limit = 1) |>
  post_request() |>
  assets_download(asset_names = "thumbnail", output_dir = tempdir())
#> Error: Error while downloading 's3://usgs-landsat/collection02/level-2/standard/oli-tirs/2024/082/075/LC09_L2SR_082075_20240518_20240521_02_T1/LC09_L2SR_082075_20240518_20240521_02_T1_thumb_small.jpeg'. 
#> Protocol "s3" not supported or disabled in libcurl

I also tried to open this URL in {terra} via GDAL, but without success.

library("terra")
url = "s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2024/082/075/LC09_L2SR_082075_20240518_20240521_02_T1/LC09_L2SR_082075_20240518_20240521_02_T1_thumb_small.jpeg"
setGDALconfig("AWS_NO_SIGN_REQUEST=YES")
r = rast(paste0("/vsis3/", url))
#> Error: [rast] file does not exist: /vsis3/s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2024/082/075/LC09_L2SR_082075_20240518_20240521_02_T1/LC09_L2SR_082075_20240518_20240521_02_T1_thumb_small.jpeg
#> In addition: Warning message:
#>   CURL error: URL rejected: Port number was not a decimal number between 0 and 65535 (GDAL error 11) 

Maybe credentials and additional package (like {paws}) are required to connect to AWS S3?
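
For reference, GDAL's /vsis3/ handler expects a path of the form /vsis3/<bucket>/<key>, not a full s3:// URL appended after the prefix. That is likely why the path above was rejected with a "Port number" error. A minimal sketch of the conversion follows; the helper name s3_to_vsis3 is hypothetical and not part of rstac or terra, and the snippet only rewrites the string without touching the network.

```r
# Hypothetical helper: turn an "s3://bucket/key" href into a GDAL /vsis3/ path.
s3_to_vsis3 <- function(href) {
  stopifnot(is.character(href))
  # Replace only a leading "s3://" scheme; any other href is returned unchanged.
  sub("^s3://", "/vsis3/", href)
}

href <- "s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2024/082/075/LC09_L2SR_082075_20240518_20240521_02_T1/LC09_L2SR_082075_20240518_20240521_02_T1_thumb_small.jpeg"
vsi_path <- s3_to_vsis3(href)
```

The resulting vsi_path can then be passed to terra::rast(). Whether the read succeeds still depends on the bucket's access policy, so credentials may be needed in addition to a well-formed path.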

rolfsimoes commented 3 weeks ago

Dear @robbibt and @kadyb,

Thank you for bringing this issue to my attention.

The error occurs because rstac downloads assets with the httr package rather than GDAL, and httr relies on libcurl, which does not support the s3 protocol.

To resolve this, you can pass a custom download function to the download_fn parameter. Below is an example that should work for downloading assets stored on S3:

library(magrittr)
library(rstac)

stac("https://explorer.sandbox.dea.ga.gov.au/stac") %>%
  stac_search(collections = "ga_s2am_ard_3",
              datetime = "2019-06-01/2019-08-01",
              limit = 1) %>%
  get_request() %>%
  assets_download(
    asset_names = "nbart_red", 
    download_fn = \(asset) {
      # Mirror the path component of the asset href under the working directory
      out_file <- httr::parse_url(asset$href)$path
      out_file <- file.path(getwd(), out_file)
      out_dir <- dirname(out_file)
      if (!dir.exists(out_dir))
        dir.create(out_dir, recursive = TRUE)
      stopifnot(dir.exists(out_dir))
      # Copy the asset with GDAL (gdal_translate) unless a local copy already exists
      if (!file.exists(out_file))
        sf::gdal_utils(
          util = "translate",
          source = asset$href,
          destination = out_file,
          quiet = TRUE
        )
      # Point the asset at the local copy
      asset$href <- out_file
      return(asset)
    }
  ) -> items

This custom download_fn uses sf::gdal_utils() to download the asset using GDAL.

I recognize that this is not an ideal solution, as it is cumbersome for the user. A much better solution would be to make this the default behavior in the future. To maintain backwards compatibility, I have implemented a use_gdal parameter in the assets_download() function. It will be available on CRAN soon. In the meantime, you can install the development version from the dev branch using the remotes package.

remotes::install_github("brazil-data-cube/rstac@b-1.0.1")

Here is an example of how to use it:

stac("https://explorer.sandbox.dea.ga.gov.au/stac") %>%
  stac_search(collections = "ga_s2am_ard_3",
              datetime = "2019-06-01/2019-08-01",
              limit = 1) %>%
  get_request() %>%
  assets_download(
    asset_names = "nbart_red", 
    output_dir = getwd(),
    use_gdal = TRUE
  ) -> items

Best regards, Rolf

PS.: @kadyb, I couldn't reproduce your example. I got an access-denied error even when passing AWS_NO_SIGN_REQUEST=YES. It seems I need to set AWS_SECRET_ACCESS_KEY to be able to access it:

stac("https://earth-search.aws.element84.com/v1") |>
  stac_search(collections = "landsat-c2-l2", limit = 1) |>
  post_request() |>
  assets_download(asset_names = "thumbnail", output_dir = getwd(), use_gdal = TRUE, 
                  config_options = c(AWS_NO_SIGN_REQUEST = "YES"))
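
If credentials do turn out to be required, one possible approach is to build the config_options vector from environment variables instead of hard-coding secrets. The sketch below is an assumption, not a tested recipe for this bucket: the helper name aws_gdal_config is hypothetical, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REQUEST_PAYER are standard GDAL configuration options for /vsis3/, and whether requester-pays (or a region setting) is needed for usgs-landsat has not been verified here.

```r
# Hypothetical helper: collect AWS credentials from the environment as
# GDAL configuration options suitable for a `config_options` argument.
aws_gdal_config <- function(request_payer = TRUE) {
  key <- Sys.getenv("AWS_ACCESS_KEY_ID")
  secret <- Sys.getenv("AWS_SECRET_ACCESS_KEY")
  # Fail early with a clear message instead of a GDAL access-denied error.
  if (!nzchar(key) || !nzchar(secret))
    stop("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first.")
  opts <- c(AWS_ACCESS_KEY_ID = key, AWS_SECRET_ACCESS_KEY = secret)
  # Assumption: the bucket bills the requester for data transfer.
  if (request_payer)
    opts <- c(opts, AWS_REQUEST_PAYER = "requester")
  opts
}

# Usage sketch (requires network access and valid credentials):
# stac("https://earth-search.aws.element84.com/v1") |>
#   stac_search(collections = "landsat-c2-l2", limit = 1) |>
#   post_request() |>
#   assets_download(asset_names = "thumbnail", output_dir = getwd(),
#                   use_gdal = TRUE, config_options = aws_gdal_config())
```

Reading the keys from the environment keeps them out of scripts and issue reports; request_payer = FALSE drops the requester-pays option for buckets that do not need it.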