mitchellmanware closed this issue 7 months ago.
Data download functions have been updated to download only missing/nonexistent files (commits https://github.com/NIEHS/amadeus/commit/54395807994c57ff8a5fd969abc3b31629b8e232 and https://github.com/NIEHS/amadeus/commit/357f9f66e88c9e607491a71acb2bb1ea13a0f9ae).
Note: In `download_modis()`, the `-m` flag has been removed from the `wget` commands. To identify nonexistent files, I added an object, `download_name`, that stores the destination file name for each download. It is used with the `-O` flag to specify the output file name, instead of only the destination folder (which was previously set with the `-P` flag).
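As a minimal sketch of this approach, the hypothetical helper `build_wget_commands()` below derives one destination file per URL and builds one `wget -O` command per file, so each command writes exactly one file. The URLs, token, and folder are placeholders, and naming each file after the last component of its URL (via `basename()`) is an assumption, not necessarily how `download_modis()` builds `download_name`.

```r
# Hypothetical helper (not part of amadeus): build one wget command per URL.
# Each command writes to its own file via -O; no -m / -r, so wget fetches
# exactly the file named by the URL.
build_wget_commands <- function(download_url, token, directory_to_save) {
  # Assumption: the destination file name is the last path component of the URL
  download_name <- file.path(directory_to_save, basename(download_url))
  command <- paste0(
    "wget --header ", shQuote(paste("Authorization: Bearer", token)),
    " -O ", shQuote(download_name),
    " ", shQuote(download_url)
  )
  # Return destinations alongside commands so callers can check which exist
  data.frame(
    url = download_url,
    destination = download_name,
    command = command,
    stringsAsFactors = FALSE
  )
}

# Placeholder URLs, token, and output folder
urls <- c(
  "https://example.com/data/MOD09GA/2018/001/file_a.hdf",
  "https://example.com/data/MOD09GA/2018/001/file_b.hdf"
)
cmds <- build_wget_commands(urls, "MY_TOKEN", "modis_output")
cat(cmds$command, sep = "\n")
```

Keeping the destination path in its own column means the existence check and the `-O` argument are guaranteed to refer to the same file.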
Before:
```r
# Main wget run
download_command <- paste0(
  "wget -e robots=off -m -np -R .html,.tmp ",
  "-nH --cut-dirs=3 \"",
  download_url,
  "\" --header \"Authorization: Bearer ",
  nasa_earth_data_token,
  "\" -P ",
  directory_to_save,
  "\n"
)
#### 15. concatenate and print download commands to "..._wget_commands.txt"
cat(download_command)
}
```
Now:
```r
# Main wget run
download_command <- paste0(
  "wget -e robots=off -np -R .html,.tmp ",
  "-nH --cut-dirs=3 \"",
  download_url,
  "\" --header \"Authorization: Bearer ",
  nasa_earth_data_token,
  "\" -O ",
  directory_to_save,
  download_name,
  "\n"
)
#### filter commands to non-existing files
#### (check the same path that is passed to -O, not download_name alone)
download_command <- download_command[
  which(
    !file.exists(paste0(directory_to_save, download_name))
  )
]
#### 15. concatenate and print download commands to "..._wget_commands.txt"
#### cat command only if file does not already exist
cat(download_command)
```
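The filtering step can be tried in isolation with the self-contained sketch below. It creates a temporary folder, simulates a previous run that already fetched one file, and keeps only the commands whose destination is still missing. The file names and URLs are placeholders; the point being illustrated is that the existence check uses the same full path that is passed to `-O`.

```r
# Placeholder destinations in a fresh temporary folder (hypothetical file names)
directory_to_save <- file.path(tempdir(), "modis_demo")
dir.create(directory_to_save, showWarnings = FALSE, recursive = TRUE)
download_name <- file.path(
  directory_to_save,
  c("file_a.hdf", "file_b.hdf", "file_c.hdf")
)

# Simulate a previous run that already downloaded file_b.hdf
file.create(download_name[2])

# One command per destination (placeholder URLs)
download_command <- paste0(
  "wget -O ", shQuote(download_name), " ",
  shQuote(paste0("https://example.com/", basename(download_name)))
)

# Keep only commands whose destination file does not exist yet;
# a logical index works directly, so which() is optional here
download_command <- download_command[!file.exists(download_name)]
cat(download_command, sep = "\n")
```

Running it again in the same session prints the same two commands, since only `file_b.hdf` exists on disk.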
A clearer explanation, from ChatGPT:
"When you combine the `-O` option (which specifies the output file) with the `-r` (recursive retrieval) or `-p` (page requisites) options, `wget` writes all downloaded content into that single file rather than saving each file separately.
In your command, you're using the `-m` option, which enables mirroring and is equivalent to `-r -N -l inf --no-remove-listing`. So when `-m` is combined with `-O`, all downloaded content is placed into the single file specified by `-O`, which is likely not the behavior you want.
If you intend to download only the file specified by the URL, remove the `-m` option."
@mitchellmanware Thanks! This will help a lot in streamlining the download part of the `beethoven` pipeline.
@sigmafelix In reference to https://github.com/NIEHS/beethoven/blob/45e21d473642c5638273ec68272777c3f3b981cc/inst/targets/pipeline_base_functions.R#L175C1-L175C55: edits to the download functions so that only missing files are downloaded are incoming on the `amadeus` branch `mm-terraclimate-0325`.
Completed (https://github.com/NIEHS/amadeus/commit/54395807994c57ff8a5fd969abc3b31629b8e232).