Closed: peterdesmet closed this issue 5 months ago.
I'd like to work on this. Here are my thoughts:
Decide on controlled values for format: either csv/hdf5 (widely applicable) or monthly/daily/hdf5.
I prefer csv/hdf5, with the files returned in the same directory structure as the bucket.
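If format is restricted to csv/hdf5, base R's match.arg() is one way to enforce the controlled values. The helper name check_format() below is hypothetical and not part of bioRad; it is a minimal sketch of the validation only.

```r
# Hypothetical helper (not bioRad API): validate `format` against the
# controlled values "csv" and "hdf5".
check_format <- function(format = c("csv", "hdf5")) {
  # With the default, match.arg() returns the first choice ("csv");
  # any value outside the choices raises an informative error.
  match.arg(format)
}

check_format()        # returns "csv"
check_format("hdf5")  # returns "hdf5"
# check_format("daily") would raise an error
```

Note that match.arg() also accepts unambiguous partial matches (e.g. "h" for "hdf5"), which may or may not be desirable for a controlled vocabulary.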
Should we make use of an S3 dependency to download files?
Let's try to do it without the dependency at first and reevaluate.
How should we warn about unrecognized radars?
radars: {missing_radars} not found, found {number_of_radars} other radars
How should we warn about dates without data?
No radars found for all dates, radars found for {first_date_found} to {last_date_found}
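A minimal sketch of how the two proposed warning messages could be assembled, assuming the requested and found radars/dates are already available as plain vectors. The function names are hypothetical, and the "No radars found for any date" wording for the fully empty case is an assumption not taken from the proposal.

```r
# Hypothetical helpers (illustrative names, not bioRad functions) that emit
# the two proposed warnings from plain vectors, so no bucket access is needed.
warn_missing_radars <- function(requested, found) {
  missing_radars <- setdiff(requested, found)
  if (length(missing_radars) > 0) {
    warning(
      "radars: ", paste(missing_radars, collapse = ", "), " not found, ",
      "found ", length(intersect(requested, found)), " other radars",
      call. = FALSE
    )
  }
  invisible(missing_radars)
}

warn_missing_dates <- function(date_min, date_max, found_dates) {
  requested <- seq(as.Date(date_min), as.Date(date_max), by = "day")
  found_dates <- as.Date(found_dates)
  missing_dates <- requested[!requested %in% found_dates]
  if (length(found_dates) == 0) {
    # hypothetical wording for the case where nothing was found at all
    warning("No radars found for any date", call. = FALSE)
  } else if (length(missing_dates) > 0) {
    warning(
      "No radars found for all dates, radars found for ",
      format(min(found_dates)), " to ", format(max(found_dates)),
      call. = FALSE
    )
  }
  invisible(missing_dates)
}
```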
Regarding download_vpfiles(): has work already started on this? Is there a branch I can continue on?
If a radar is missing, stop.
A JSON file with a list of radars exists at https://github.com/enram/aloftdata.eu/blob/main/_data/OPERA_RADARS_DB.json:
jsonlite::fromJSON("https://raw.githubusercontent.com/enram/aloftdata.eu/main/_data/OPERA_RADARS_DB.json")$odimcode
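Stopping on a missing radar could look like the sketch below, assuming the list of known codes is the odimcode field read from that JSON file. The helper name check_radars() and its error wording are hypothetical; the known codes are passed in as an argument so the check runs offline.

```r
# Hypothetical helper (not bioRad API): stop if any requested radar code is
# not among the known ODIM codes. In practice `known_radars` could be:
#   jsonlite::fromJSON(
#     "https://raw.githubusercontent.com/enram/aloftdata.eu/main/_data/OPERA_RADARS_DB.json"
#   )$odimcode
check_radars <- function(radars, known_radars) {
  missing_radars <- setdiff(radars, known_radars)
  if (length(missing_radars) > 0) {
    stop(
      "Can't find radar(s): ", paste(missing_radars, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(radars)
}
```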
Allow downloading multiple radars, with download_vpfiles() as an example.
Error if a radar doesn't exist, based on the JSON file.
Try writing the function so it downloads whatever it can.
Use a progress bar to show how many files have already been downloaded. Silence with progress = FALSE.
Show a message for each file downloaded (cf. download_vpfiles()). Silence with verbose = FALSE.
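The download behaviour described in these points (keep going when a file fails, a progress bar controlled by progress, a message per file controlled by verbose) could be sketched as below. download_urls() and default_download() are hypothetical names; the default downloader wraps utils::download.file(), and the download function is injectable so the loop can be tried without network access.

```r
# Hypothetical sketch (not the bioRad implementation).
default_download <- function(url, destfile) {
  # download.file() returns 0 on success; treat anything else as a failure
  status <- utils::download.file(url, destfile, quiet = TRUE, mode = "wb")
  if (status != 0) stop("download failed: ", url, call. = FALSE)
}

download_urls <- function(urls, destfiles, progress = TRUE, verbose = TRUE,
                          download_fun = default_download) {
  stopifnot(length(urls) == length(destfiles))
  ok <- logical(length(urls))
  # txtProgressBar() needs max > min, so skip the bar for zero URLs
  show_bar <- progress && length(urls) > 0
  if (show_bar) {
    pb <- utils::txtProgressBar(min = 0, max = length(urls), style = 3)
    on.exit(close(pb), add = TRUE)
  }
  for (i in seq_along(urls)) {
    # record a failed download as FALSE and continue with the next URL
    ok[i] <- tryCatch({
      download_fun(urls[i], destfiles[i])
      TRUE
    }, error = function(e) FALSE)
    if (verbose) {
      message(if (ok[i]) "Downloaded " else "Could not download ", urls[i])
    }
    if (show_bar) utils::setTxtProgressBar(pb, i)
  }
  invisible(ok)
}
```

Returning a logical vector lets the caller report which files could not be downloaded without aborting the whole run.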
We have decided to build a function list_vpts_aloft() that returns a vector of URLs that are known to exist, given the filtering parameters originally envisioned for download_vpts_aloft():
list_vpts_aloft(
  date_min = NULL,
  date_max = NULL,
  radars = NULL,
  # directory = ".",   # this parameter is removed
  # overwrite = FALSE, # this parameter is removed
  format = "csv",      # also "hdf5"
  source = "baltrad"   # also "ecog-04003"
)
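Before checking which files exist, list_vpts_aloft() needs candidate object keys from date_min, date_max and radars. The sketch below assumes monthly CSV files stored as {source}/monthly/{radar}/{yyyy}/{radar}_vpts_{yyyymm}.csv.gz; that layout is an assumption to verify against the actual bucket, and candidate_vpts_keys() is a hypothetical name.

```r
# Hypothetical sketch. ASSUMPTION: monthly CSV files live at
#   {source}/monthly/{radar}/{yyyy}/{radar}_vpts_{yyyymm}.csv.gz
candidate_vpts_keys <- function(date_min, date_max, radars,
                                source = c("baltrad", "ecog-04003")) {
  source <- match.arg(source)
  # first day of every month overlapping [date_min, date_max]
  first_month <- as.Date(format(as.Date(date_min), "%Y-%m-01"))
  months <- seq(first_month, as.Date(date_max), by = "month")
  # every radar for every month
  radar <- rep(radars, times = length(months))
  month <- rep(months, each = length(radars))
  sprintf(
    "%s/monthly/%s/%s/%s_vpts_%s.csv.gz",
    source, radar, format(month, "%Y"), radar, format(month, "%Y%m")
  )
}

candidate_vpts_keys("2023-01-15", "2023-02-10", c("bejab", "nlhrw"))
```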
Checking whether a file exists can be done with the aws.s3 dependency via:
aws.s3::get_bucket_df(bucket = "s3://aloft", prefix = "baltrad/monthly", region = "eu-west-1", max = 2000)
or, much more slowly, with httr:
urls[!furrr::future_map_lgl(urls, ~httr::http_error(httr::HEAD(.x)))]
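With the aws.s3 approach, existence can be checked by matching candidate keys against the Key column of the bucket listing, which needs one listing call per prefix rather than one HEAD request per URL. The sketch below takes the listed keys and a base URL as arguments; existing_vpts_urls() is a hypothetical name and no public endpoint is assumed.

```r
# Hypothetical sketch: keep only candidate keys present in the bucket and
# turn them into URLs. `bucket_keys` could be, for example:
#   aws.s3::get_bucket_df(bucket = "s3://aloft", prefix = "baltrad/monthly",
#                         region = "eu-west-1", max = 2000)$Key
existing_vpts_urls <- function(candidate_keys, bucket_keys, base_url) {
  keys <- candidate_keys[candidate_keys %in% bucket_keys]
  # paste() would turn a zero-length `keys` into "base_url/", so guard it
  if (length(keys) == 0) return(character(0))
  paste(sub("/$", "", base_url), keys, sep = "/")
}
```

Note that max caps the number of keys get_bucket_df() returns, so a listing truncated at 2000 keys could make existing files look missing.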
Note: I think it might be better to create a generic download_files() function that takes a vector of URLs (e.g. generated by list_vpts_aloft()), see #648.
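A generic download_files() would also need to decide where each URL is saved. One option, matching the wish to return files in the same directory structure as the bucket, is to drop the scheme and host and reuse the rest of the path under the target directory. url_to_path() is a hypothetical helper sketching only that mapping.

```r
# Hypothetical helper: map URLs to local paths mirroring the bucket layout.
url_to_path <- function(urls, directory = ".") {
  # drop scheme and host/bucket, e.g. "s3://aloft/a/b.csv" -> "a/b.csv"
  rel_paths <- sub("^[a-z0-9]+://[^/]+/", "", urls)
  file.path(directory, rel_paths)
}

url_to_path("s3://aloft/baltrad/monthly/bejab/2023/bejab_vpts_202301.csv.gz", "data")
```

Before downloading, the parent folders would still have to be created, e.g. with dir.create(dirname(path), recursive = TRUE).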
Todo
https://lw-enram.s3-eu-west-1.amazonaws.com to s3://aloft
source = BALTRAD
radars parameter will remain the same (5-letter code: bejab)
directory parameter can remain the same
overwrite parameter can remain the same
download_vpfiles() to download_vpts_aloft(format = "hdf5")
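One way to handle the last item without breaking existing scripts is to keep download_vpfiles() as a deprecated wrapper around download_vpts_aloft(format = "hdf5"). In the sketch below, download_vpts_aloft() is a stub that only returns its arguments, and both signatures are assumptions based on the parameters listed above.

```r
# Stub standing in for the real download_vpts_aloft(); its signature is an
# assumption. It only returns its arguments so the wrapper can be checked.
download_vpts_aloft <- function(date_min = NULL, date_max = NULL, radars = NULL,
                                directory = ".", overwrite = FALSE,
                                format = "csv", source = "baltrad") {
  list(date_min = date_min, date_max = date_max, radars = radars,
       directory = directory, overwrite = overwrite,
       format = format, source = source)
}

# Deprecated wrapper: warn, then forward to the new function with hdf5 output
download_vpfiles <- function(date_min, date_max, radars, directory = ".",
                             overwrite = FALSE) {
  .Deprecated("download_vpts_aloft")
  download_vpts_aloft(
    date_min = date_min, date_max = date_max, radars = radars,
    directory = directory, overwrite = overwrite, format = "hdf5"
  )
}
```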