ecohealthalliance / open-rvfcast

Wellcome Open RVFCast project repository

Error in aws_s3_upload of ECMWF transformed data #58

Closed emmamendelsohn closed 1 year ago

emmamendelsohn commented 1 year ago

This could be due to the size of the data: each parquet file is 2.8 GB. @noamross is working on alternatives to paws-based workflows

 out <- unname(mapply(
   aws_s3_upload_single,
   path = files,
   key = keys,
   MoreArgs = list(bucket = bucket, check = check, svc = svc, file_type = file_type),
   SIMPLIFY = FALSE
 ))

Error in curl::handle_setopt(handle, .list = req$options) : 
  A libcurl function was given a bad argument
In addition: Warning message:
In curl::handle_setopt(handle, .list = req$options) :
  NAs introduced by coercion to integer range
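
For reference, the coercion warning is consistent with a value larger than R's 32-bit integer maximum (such as the byte size of a 2.8 GB file) being coerced to an integer somewhere in the request setup; this is a plausible reading of the error, not something confirmed in this thread:

 file_size <- 2.8e9        # bytes, roughly the size of one parquet file
 .Machine$integer.max      # 2147483647, the largest value R can store as an integer
 as.integer(file_size)     # NA, with a "NAs introduced by coercion to integer range" warning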
noamross commented 1 year ago

OK, so a question for both @emmamendelsohn @collinschwantes on usage for aws_s3_up/download. All the utilities that wrap the AWS S3 SDK (mc, rclone, aws) are of the form command SOURCE DEST, where there can be only one argument each for SOURCE and DEST, but either can be a directory that is copied recursively. I believe things would be more efficient if we named a directory and let the utility figure out the files (it will do hash-checking as well), but in general we are providing a long vector of filenames, correct?

collinschwantes commented 1 year ago

While containerTemplateUtils functions can take directories as key/path arguments, they just map/apply the single_aws_X functions to download or upload folders. There is probably a more efficient way to do this.

Refactoring the code and creating "aws_dir_upload/download" functions might make the package easier to use.
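
A minimal sketch of what a directory-level helper could look like, assuming the AWS CLI is installed and on the PATH; the function name aws_s3_dir_upload and the example bucket and paths are placeholders, not part of containerTemplateUtils:

 aws_s3_dir_upload <- function(dir, bucket, prefix = "") {
   stopifnot(dir.exists(dir))
   dest <- sprintf("s3://%s/%s", bucket, prefix)
   # `aws s3 sync` walks the directory itself and skips unchanged files via hash checks
   status <- system2("aws", c("s3", "sync", shQuote(dir), shQuote(dest)))
   invisible(status == 0)
 }

 # Example usage (placeholder bucket and directory):
 # aws_s3_dir_upload("data/ecmwf_transformed", bucket = "my-project-bucket", prefix = "ecmwf")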


emmamendelsohn commented 1 year ago

Switched to aws.s3::put_object(..., multipart = TRUE) for the large ECMWF files
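
A minimal sketch of that call, with placeholder file, key, and bucket names; with multipart = TRUE, aws.s3::put_object splits the file into parts and uploads them separately rather than in a single request:

 library(aws.s3)

 put_object(
   file      = "data/ecmwf_transformed_2023.parquet",  # placeholder: ~2.8 GB local file
   object    = "ecmwf/ecmwf_transformed_2023.parquet", # placeholder destination key
   bucket    = "my-project-bucket",                    # placeholder bucket name
   multipart = TRUE                                    # upload in parts rather than one request
 )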