OK, so a question for both @emmamendelsohn and @collinschwantes on usage for aws_s3_up/download. The utilities that wrap the AWS S3 SDK (mc, rclone, aws) are all of the form command SOURCE DEST, where SOURCE and DEST are each a single argument but can be a directory that is copied recursively. I believe things would be more efficient if we named a directory and let the utility figure out the files (it will do hash-checking as well), but in general we are providing a long vector of filenames, correct?
While containerTemplateUtils functions can take directories as key/path arguments, they just map/apply the single_aws_X functions to download or upload folders. There is probably a more efficient way to do this.
Refactoring the code and creating aws_dir_upload/download functions might make the package easier to use; see the sketch below.
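One possible shape for such a helper, sketched here with aws.s3::s3sync(); the function name and arguments are hypothetical, not an existing containerTemplateUtils API.

```r
library(aws.s3)

# Hypothetical directory-level wrapper; not existing package code.
aws_dir_upload <- function(dir, bucket, prefix = basename(dir)) {
  # s3sync() walks the directory and only transfers files that differ,
  # instead of map()-ing a single-object upload over every path.
  aws.s3::s3sync(
    path      = dir,
    bucket    = bucket,
    prefix    = prefix,
    direction = "upload"
  )
}

# Usage sketch with placeholder names:
# aws_dir_upload("data/ecmwf", bucket = "my-bucket")
```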
Switched to aws.s3::put_object(..., multipart = TRUE) for the large ECMWF files.
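For reference, roughly what that call looks like; the file, key, and bucket names below are placeholders, not the project's actual values.

```r
library(aws.s3)

# multipart = TRUE splits the upload into parts, which large multi-GB
# files need; the values here are illustrative only.
aws.s3::put_object(
  file      = "data/ecmwf_forecast.parquet",
  object    = "ecmwf/ecmwf_forecast.parquet",
  bucket    = "my-bucket",
  multipart = TRUE
)
```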
This could be due to the size of the data; each parquet file is 2.8 GB. @noamross is working on alternatives to paws-based workflows.