NIEHS / amadeus

https://niehs.github.io/amadeus/

MODIS download function pastes the base URL twice #126

Closed sigmafelix closed 1 month ago

sigmafelix commented 1 month ago

When I ran download_modis with the following code just now, no files were downloaded, and the message below was repeated for each file:

  1. Code

    amadeus::download_modis(
    product = "MOD13A2",
    nasa_earth_data_token = tk,
    date = c("2022-01-01", "2022-03-31"),
    directory_to_save = "/mnt/s/Projects/beethoven_input",
    acknowledgement = TRUE,
    download = TRUE
    )

  2. Error message

    --2024-09-23 12:31:55--  https://ladsweb.modaps.eosdis.nasa.gov/https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD13A2/2022/097/MOD13A2.A2022097.h09v03.061.2022121224455.hdf
    Resolving ladsweb.modaps.eosdis.nasa.gov (ladsweb.modaps.eosdis.nasa.gov)... xxx.xxx.xxx.xxx, xxxx:xxxx:xxxx:xxxx:xxxx

Apparently the base URL is included twice in the sink command. I will investigate this issue.

mitchellmanware commented 1 month ago

@sigmafelix I believe this happens because filelist already contains the base URL before it is passed to the subsequent sprintf() call.

    filelist <-
      rvest::read_html(filedir_url) |>
      rvest::html_elements("tr") |>
      rvest::html_attr("data-path")

    filelist_sub <-
      grep(
        paste0("(", paste(tiles_requested, collapse = "|"), ")"),
        filelist,
        value = TRUE
      )
    download_url <- sprintf("%s%s", ladsurl, filelist_sub)

Running a debug function that returns ladsurl, filelist_sub, and download_url makes it clear that filelist_sub already contains the base URL. I will fix this by removing the double paste and then re-run the tests.

> download_modis_debug(
+   product = "MOD09GA",
+   version = "61",
+   horizontal_tiles = c(7, 8),
+   vertical_tiles = c(3, 4),
+   date = "2018-01-01",
+   nasa_earth_data_token = readLines("~/nasa_token.txt"),
+   directory_to_save = path,
+   acknowledgement = TRUE,
+   download = FALSE,
+   remove_command = FALSE
+ )
1 / 1 days of data available in the queried dates.

ladsurl:
[[1]]
[1] "https://ladsweb.modaps.eosdis.nasa.gov/"

filelist_sub:
[[2]]
[1] "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h07v03.061.2021295010220.hdf"
[2] "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h08v03.061.2021295010420.hdf"
[3] "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h08v04.061.2021295010503.hdf"

download_url:
[[3]]
[1] "https://ladsweb.modaps.eosdis.nasa.gov/https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h07v03.061.2021295010220.hdf"
[2] "https://ladsweb.modaps.eosdis.nasa.gov/https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h08v03.061.2021295010420.hdf"
[3] "https://ladsweb.modaps.eosdis.nasa.gov/https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h08v04.061.2021295010503.hdf"
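One defensive way to avoid the duplicated prefix is to prepend the base URL only to entries that are not already absolute URLs. The sketch below is illustrative only: `build_download_url` is a hypothetical helper, not part of amadeus, and the second entry in `filelist_sub` is an assumed relative path added to show both cases. It is not necessarily how the actual fix was implemented.

```r
# Hypothetical helper (not part of amadeus): prepend the base URL only to
# entries that are not already absolute URLs.
build_download_url <- function(base_url, paths) {
  is_absolute <- grepl("^https?://", paths)
  # Join base and path with exactly one "/" between them.
  joined <- paste0(sub("/+$", "", base_url), "/", sub("^/+", "", paths))
  paths[!is_absolute] <- joined[!is_absolute]
  paths
}

ladsurl <- "https://ladsweb.modaps.eosdis.nasa.gov/"
filelist_sub <- c(
  # Already absolute, as in the debug output above.
  "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h07v03.061.2021295010220.hdf",
  # Assumed relative form, for illustration only.
  "/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h08v03.061.2021295010420.hdf"
)
download_url <- build_download_url(ladsurl, filelist_sub)
```

With this guard, the result is the same whether or not the scraped data-path attribute includes the host, so a later change in the page markup would not reintroduce the doubled URL.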

I also propose using directory/Year/Julian/*.hdf as the save path. This matches the structure of the /ddn/gs1/group/set/Projects/NRT-AP-Model/input/modis/.../ folder, which makes it easier to add new data to the pipeline.
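As a rough sketch of that layout, the year and Julian day can be read from the "A<YYYY><DDD>" token in each MODIS file name. `modis_save_path` and the `modis_input` directory are hypothetical names used for illustration; they are not part of amadeus.

```r
# Hypothetical helper (not part of amadeus): build
# <directory>/<year>/<julian>/<file> from the "A<YYYY><DDD>" token
# in a MODIS file name.
modis_save_path <- function(directory, url) {
  file_name <- basename(url)
  pattern <- ".*\\.A([0-9]{4})([0-9]{3})\\..*"
  year <- sub(pattern, "\\1", file_name)    # four-digit year
  julian <- sub(pattern, "\\2", file_name)  # three-digit day of year
  file.path(directory, year, julian, file_name)
}

url <- "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h07v03.061.2021295010220.hdf"
save_path <- modis_save_path("modis_input", url)
```

Before downloading, the target folder would need to exist, for example via `dir.create(dirname(save_path), recursive = TRUE, showWarnings = FALSE)`.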

mitchellmanware commented 1 month ago

Debug version runs as expected for MOD09GA - still need to check the MOD06_L2 versioning.

> download_modis_debug(
+   product = "MOD09GA",
+   version = "61",
+   horizontal_tiles = c(7, 8),
+   vertical_tiles = c(3, 4),
+   date = "2018-01-01",
+   nasa_earth_data_token = readLines("~/nasa_token.txt"),
+   directory_to_save = path,
+   acknowledgement = TRUE,
+   download = TRUE,
+   remove_command = TRUE
+ )
1 / 1 days of data available in the queried dates.

Downloading requested files...

--2024-10-07 12:59:14--  https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD09GA/2018/001/MOD09GA.A2018001.h07v03.061.2021295010220.hdf
Resolving ladsweb.modaps.eosdis.nasa.gov (ladsweb.modaps.eosdis.nasa.gov)... 198.118.194.40, 2001:4d0:241a:40c0::40
Connecting to ladsweb.modaps.eosdis.nasa.gov (ladsweb.modaps.eosdis.nasa.gov)|198.118.194.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 48202716 (46M) [application/octet-stream]
Saving to: ‘/ddn/gs1/home/manwareme/data/modis/MOD09GA/MOD09GA.A2018001.h07v03.061.2021295010220.hdf’

09GA/MOD09GA.A2018001.h07v03  88%[======================================>      ]  40.55M  3.21MB/s    eta 2s

mitchellmanware commented 1 month ago

https://github.com/NIEHS/amadeus/pull/129

sigmafelix commented 1 month ago

@mitchellmanware Thank you for the fix!