awslabs / aws-s3-transfer-manager-rs

Apache License 2.0

Transfer manager upload support #7

Open aajtodd opened 2 months ago

aajtodd commented 2 months ago

Add support for uploading a single file or a directory to the TransferManager abstraction.

aajtodd commented 2 months ago

Optimize traversal

This is a well-trodden path in Rust, with many projects and blog posts written about it (a few are listed below):

  1. https://github.com/BurntSushi/walkdir - used in ripgrep internally
  2. https://github.com/byron/jwalk
  3. http://blog.vmchale.com/article/directory-traversals
  4. https://blog.burntsushi.net/ripgrep/
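
For reference, a minimal std-only baseline that the crates above improve on might look like the sketch below. The function name `list_files` is hypothetical, and the meaning given to `recurse` and `follow_symlinks` is an assumption; nothing in this thread has settled either. It does no cycle detection when following symlinks and returns an error on a dangling link, which is part of why a crate such as walkdir is likely the better starting point.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects every regular file under `root`, sorted by path.
/// `recurse` and `follow_symlinks` are assumed semantics, not a settled API.
fn list_files(root: &Path, recurse: bool, follow_symlinks: bool) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    // An explicit stack instead of recursion, so deep trees cannot overflow the call stack.
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // DirEntry::file_type reports the link itself, not its target.
            let mut file_type = entry.file_type()?;
            if file_type.is_symlink() {
                if !follow_symlinks {
                    continue;
                }
                // fs::metadata resolves the link and describes the target.
                file_type = fs::metadata(&path)?.file_type();
            }
            if file_type.is_dir() {
                if recurse {
                    pending.push(path);
                }
            } else if file_type.is_file() {
                files.push(path);
            }
        }
    }
    files.sort();
    Ok(files)
}
```

This walks one directory at a time on a single thread; the parallel traversal in jwalk and the open-file-descriptor limits in walkdir are the kinds of optimization this baseline leaves out.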

aajtodd commented 2 months ago

Strawman proposal for the look and feel of the API (it generally follows the conventions of the AWS SDK for Rust):

use aws_s3_transfer_manager::Client;
use aws_s3_transfer_manager::from_env;
use aws_s3_transfer_manager::config::{Throughput, MemoryLimit};
use aws_s3_transfer_manager::operation::upload_dir::UploadDirectoryFailurePolicy;
use aws_s3_transfer_manager::types::S3Object;
use aws_s3_transfer_manager::operation::upload::UploadStream;

// simple, load from environment/auto detect
let tm = aws_s3_transfer_manager::from_env().load().await;

// load from env with overrides
let tm = aws_s3_transfer_manager::from_env()    // -> ConfigLoader
    .throughput(Throughput::Gbps(20.))
    .load()
    .await;

// explicit
// NOTE: the specific config settings aren't important here; this just gives an idea of the feel of the API.
//       We can bikeshed naming and whatnot later.
let tm = aws_s3_transfer_manager::Client::builder()
    .client(aws_sdk_s3::Client)
    .target_throughput(Throughput::Gbps(20.))
    .memory_limit(MemoryLimit::MaxMiB(1024))
    //.memory_limit(MemoryLimit::Percentage(20)) // 20% system memory
    ...
    .build();

// upload a file
let handle = tm.upload()
    .bucket()
    .key()
    ...
    .body(UploadStream::from_file())
    .send()
    .await?;

// handle.pause(self) -> Result<ResumeHandle, ...>
// handle.abort(self) -> Result<AbortedUpload, ...>
// handle.progress(&self) -> UploadProgress
let resp = handle.join().await?;

// upload a directory
let handle = tm.upload_dir()
    .bucket()
    ...
    .source(impl AsRef<Path>)
    .recurse(bool)
    .key_prefix(String)
    .delimiter(String)
    .failure_policy(UploadDirectoryFailurePolicy)
    .filter(Fn<...>)
    .follow_symlinks(bool)
    .request_transformer(Fn<>)
    .send()
    .await?;

// handle.pause(self) -> Result<ResumeHandle, ...>
// handle.abort(self) -> Result<AbortedDirUpload, ...>
// handle.progress(&self) -> UploadDirProgress

let resp = handle.join().await?;
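
One way `key_prefix` and `delimiter` could combine with a file's path under `source` is sketched below. The helper `object_key` is hypothetical, and the rules it applies (reject files outside the root, skip the extra delimiter when the prefix already ends with one) are assumptions for illustration, not part of the proposal.

```rust
use std::path::{Component, Path};

/// Builds an object key from a file's path relative to the upload root.
/// Returns None if `file` is not strictly under `root`, or if any path
/// component is not valid UTF-8 or is not a plain name.
/// The prefix/delimiter handling here is an assumption, not a settled design.
fn object_key(root: &Path, file: &Path, key_prefix: &str, delimiter: &str) -> Option<String> {
    let relative = file.strip_prefix(root).ok()?;
    let mut parts: Vec<&str> = Vec::new();
    for component in relative.components() {
        match component {
            Component::Normal(name) => parts.push(name.to_str()?),
            // Anything else (e.g. `..`) is rejected rather than guessed at.
            _ => return None,
        }
    }
    if parts.is_empty() {
        return None;
    }
    // Join with the S3 delimiter rather than the platform path separator.
    let joined = parts.join(delimiter);
    if key_prefix.is_empty() || key_prefix.ends_with(delimiter) {
        Some(format!("{}{}", key_prefix, joined))
    } else {
        Some(format!("{}{}{}", key_prefix, delimiter, joined))
    }
}
```

With root `photos`, file `photos/2024/trip/a.jpg`, prefix `backup` and delimiter `/`, this yields `backup/2024/trip/a.jpg`. Working on path components instead of string replacement keeps the result the same on platforms whose separator is not `/`.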

Velfi commented 2 months ago

Thinking about abstractions: S3 TM Abstractions