adokter / bioRad

R package for analysis and visualisation of biological signals in weather radar data
http://adokter.github.io/bioRad

Create `download_vpts_aloft()` function #553

Closed peterdesmet closed 5 months ago

peterdesmet commented 1 year ago
download_vpts_aloft(
  date_min = NULL,
  date_max = NULL,
  radars = NULL,
  directory = ".",
  overwrite = FALSE,
  format = "csv", # also hdf5
  source = "baltrad" # also ecog-04003
)

Todo

PietrH commented 1 year ago

I'd like to work on this, here are my thoughts:

Q/A

Decide on controlled values for format. Either csv/hdf5 (widely applicable) or monthly, daily, hdf5

I prefer csv/hdf5, and return in the same directory structure as the bucket

Should we make use of an S3 dependency to download files?

Let's try to do it without the dependency at first and reevaluate.

How should we warn for unrecognized radars?

radars: {missing_radars} not found, found {number_of_radars} other radars

How should we warn for dates without data?

No radars found for all dates, radars found for {first_date_found} to {last_date_found}

Tests

Branch

Has work already started on this? Is there a branch I can continue on?

PietrH commented 1 year ago

If a radar is missing, stop.
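A minimal sketch of that check, reusing the message proposed above. The helper name check_radars() and the known_radars argument are hypothetical, the radar codes in the usage line are only example values, and "{number_of_radars}" is assumed to mean the number of requested radars that were found.

```r
# Hypothetical helper: stop when any requested radar is not in the known set.
# ASSUMPTION: known_radars is a character vector of radar codes available
# for the chosen source; how it is obtained is not shown here.
check_radars <- function(radars, known_radars) {
  missing_radars <- setdiff(radars, known_radars)
  if (length(missing_radars) > 0) {
    stop(
      "radars: ", paste(missing_radars, collapse = ", "),
      " not found, found ", length(intersect(radars, known_radars)),
      " other radars",
      call. = FALSE
    )
  }
  # All requested radars are known: return them invisibly for further use
  invisible(radars)
}
```

For example, check_radars(c("bejab", "xxxxx"), c("bejab", "bewid")) stops with an error that names xxxxx, while check_radars("bejab", c("bejab", "bewid")) passes silently.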

PietrH commented 1 year ago

We have decided to build a function list_vpts_aloft() that returns a vector of URLs that are known to exist, given the filtering parameters originally envisioned for download_vpts_aloft()

list_vpts_aloft(
  date_min = NULL,
  date_max = NULL,
  radars = NULL,
  # directory = ".", This parameter is removed
  # overwrite = FALSE, This parameter is removed
  format = "csv", # also hdf5
  source = "baltrad" # also ecog-04003
)

Checking whether a file exists can be done with the aws.s3 dependency, which lists the bucket contents in one call: aws.s3::get_bucket_df(bucket = "s3://aloft", prefix="baltrad/monthly", region = "eu-west-1", max = 2000). Alternatively, and much slower, each candidate URL can be tested with a HEAD request using httr: urls[!furrr::future_map_lgl(urls, ~httr::http_error(httr::HEAD(.x)))]
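Once the bucket keys are listed (for instance from the Key column of the data frame returned by aws.s3::get_bucket_df()), the filtering parameters could be applied to them locally. The sketch below is only an illustration: the helper name filter_keys() is hypothetical, and the key layout "<source>/monthly/<radar>/<yyyy>/<radar>_vpts_<yyyymm>.csv.gz" is an assumption that should be checked against the actual bucket.

```r
# Hypothetical helper: keep only keys matching the requested radars and months.
# ASSUMPTION: keys look like
#   "<source>/monthly/<radar>/<yyyy>/<radar>_vpts_<yyyymm>.csv.gz"
filter_keys <- function(keys, radars = NULL, date_min = NULL, date_max = NULL) {
  # Extract the radar code and the yyyymm stamp from each file name
  file_names <- basename(keys)
  key_radar <- sub("_vpts_.*$", "", file_names)
  key_month <- sub("^.*_vpts_([0-9]{6}).*$", "\\1", file_names)

  keep <- rep(TRUE, length(keys))
  if (!is.null(radars)) {
    keep <- keep & key_radar %in% radars
  }
  # yyyymm strings sort chronologically, so string comparison is sufficient
  if (!is.null(date_min)) {
    keep <- keep & key_month >= format(as.Date(date_min), "%Y%m")
  }
  if (!is.null(date_max)) {
    keep <- keep & key_month <= format(as.Date(date_max), "%Y%m")
  }
  keys[keep]
}
```

Filtering on the listed keys avoids one HEAD request per candidate URL, which is the slow part of the httr approach.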

peterdesmet commented 11 months ago

Note: I think it might be better to create a generic download_files() function that is provided a vector of URLs (e.g. generated by list_vpts_aloft()), see #648
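One possible shape for such a function, as a rough sketch only; the signature and behaviour below are assumptions, not the design discussed in #648. It reuses the directory and overwrite parameters from the original proposal and relies on utils::download.file() from base R. As a simplification, files are written flat into directory rather than mirroring the bucket's directory structure.

```r
# Hypothetical sketch of a generic downloader for a vector of URLs.
# ASSUMPTION: each file is saved under its base name inside `directory`.
download_files <- function(urls, directory = ".", overwrite = FALSE) {
  dir.create(directory, recursive = TRUE, showWarnings = FALSE)
  destinations <- file.path(directory, basename(urls))
  for (i in seq_along(urls)) {
    # Skip files that are already present unless overwrite is requested
    if (file.exists(destinations[i]) && !overwrite) next
    utils::download.file(urls[i], destinations[i], mode = "wb", quiet = TRUE)
  }
  # Return the local paths invisibly so they can be passed on to a reader
  invisible(destinations)
}
```

With this split, a call could look like download_files(list_vpts_aloft(radars = "bejab"), directory = "data"), where the radar code and directory are example values.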