facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License
8.4k stars 609 forks

[Feature Request] Support for s3fs #733

Open samuelstanton opened 4 years ago

samuelstanton commented 4 years ago

🚀 Feature Request

Hi, I've been using Hydra for about six months, so this is more of a question than a feature request, since I don't know for sure that this isn't already supported. I'd like to set hydra.run.dir to an S3 path (e.g. s3://<bucket_name>/<output_dir>) and then have Hydra write all outputs to that S3 bucket. It seems like if a user already has the AWS CLI configured, then (maybe?) this functionality wouldn't be too hard to support using something like the s3fs package.

Motivation

I'd like to run my program in AWS Batch, so I need a way to pipe any output of the program to S3, similar to how Hydra already captures all output in a single directory.

Pitch

What would be ideal for me as a user would be for Hydra to automatically determine whether to use the local filesystem or s3fs, based on whether I override hydra.run.dir with a directory with an s3:// prefix.
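To make the pitch concrete, here is a minimal sketch of the prefix check, using only the standard library. The names `OutputTarget` and `resolve_output_target` are hypothetical and not part of Hydra; this only shows how a run dir string could be classified, not how Hydra would write to it.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class OutputTarget(NamedTuple):
    """Hypothetical description of where outputs should go."""
    backend: str            # "local" or "s3"
    bucket: Optional[str]   # bucket name for S3, None for local paths
    path: str               # key prefix for S3, or the untouched local path


def resolve_output_target(run_dir: str) -> OutputTarget:
    """Classify a run dir by its URL scheme; anything that is not s3:// is local."""
    parsed = urlparse(run_dir)
    if parsed.scheme == "s3":
        # s3://<bucket_name>/<output_dir> -> bucket in netloc, prefix in path
        return OutputTarget("s3", parsed.netloc, parsed.path.lstrip("/"))
    return OutputTarget("local", None, run_dir)
```

A caller could then branch on `backend` to pick either the local filesystem or an S3 client such as s3fs.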

The immediate alternative would be to periodically copy the contents of the output directory to S3 within my program.
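A rough sketch of that alternative, assuming the program calls a sync function at checkpoints or on a timer. The helper names (`iter_uploads`, `sync_once`, `make_boto3_uploader`) are hypothetical; the only third-party call is boto3's `upload_file`, and it is kept behind a lazy import so the rest runs with the standard library alone.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Tuple


def iter_uploads(output_dir: str, prefix: str) -> Iterator[Tuple[str, str]]:
    """Yield (local_path, s3_key) for every file under output_dir."""
    root = Path(output_dir)
    clean_prefix = prefix.strip("/")
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            yield str(path), f"{clean_prefix}/{rel}" if clean_prefix else rel


def sync_once(output_dir: str, prefix: str,
              upload: Callable[[str, str], None],
              seen: Dict[str, float]) -> List[str]:
    """Upload files that are new or modified since the previous call."""
    uploaded = []
    for local_path, key in iter_uploads(output_dir, prefix):
        mtime = os.path.getmtime(local_path)
        if seen.get(local_path) != mtime:
            upload(local_path, key)
            seen[local_path] = mtime  # remember what was already copied
            uploaded.append(key)
    return uploaded


def make_boto3_uploader(bucket: str) -> Callable[[str, str], None]:
    """Build an uploader backed by boto3 (assumes AWS credentials are configured)."""
    import boto3  # third-party; imported lazily
    client = boto3.client("s3")
    return lambda local_path, key: client.upload_file(local_path, bucket, key)
```

Inside the program this might be called as `sync_once(os.getcwd(), "<output_dir>", make_boto3_uploader("<bucket_name>"), seen)` with a shared `seen` dict, so that only changed files are copied on each pass.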

I'd be willing to open a pull request but I would probably need some guidance on how to get started.

omry commented 4 years ago

Hi @samuelstanton! I have plans to abstract the working directory in Hydra, probably in 1.1. This will open up the path for a plugin that will support s3 as the working directory backend.

I am keeping this open as a reminder of the demand for S3 support. In the short term you are on your own. One thing you can consider is mounting your S3 bucket locally using something like s3fs-fuse and configuring the Hydra sweep/run dir to point at that mount point.
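As a sketch of the mount-point workaround: the helper below (hypothetical, not part of Hydra) builds a `hydra.run.dir` command-line override and refuses to do so if the bucket is not actually mounted, since otherwise outputs would silently land on the local disk. It assumes a POSIX system and that the bucket has already been mounted with a tool such as s3fs-fuse.

```python
import os


def run_dir_override(mount_point: str, subdir: str) -> str:
    """Return a Hydra override that places the run dir under a mounted bucket."""
    if not os.path.ismount(mount_point):
        # Guard against writing to an empty local directory by mistake.
        raise RuntimeError(f"{mount_point} is not a mount point")
    return "hydra.run.dir=" + os.path.join(mount_point, subdir)
```

The returned string can be passed on the command line of the Hydra app; the same pattern would apply to `hydra.sweep.dir` for multirun.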

omry commented 3 years ago

Blocked on #993

datacubeR commented 2 years ago

I would like to upvote this feature, since I think it is very powerful and would add a lot of versatility. In the meantime, do you have any suggestions on how to move the output folder to S3? I'm guessing that callbacks could be an option, since at that step the output folders should already have been created. Can you confirm that, please?

Thanks in advance,

Alfonso

omry commented 2 years ago

Maybe check the current behavior? I don't think this is well defined.
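One way to check the behavior is a small probe callback that lists what exists when a job ends. This is a sketch under assumptions: it relies on Hydra's `Callback` base class from `hydra.experimental.callback` (Hydra 1.1 and later), and it assumes `job_return.working_dir` points at the job's output directory, which may depend on the Hydra version and settings. `OutputDirProbe` and `list_outputs` are hypothetical names.

```python
from pathlib import Path
from typing import Any, List

try:
    from hydra.experimental.callback import Callback  # Hydra 1.1+
except ImportError:
    Callback = object  # fallback so the sketch can be exercised without Hydra


def list_outputs(out_dir: str) -> List[str]:
    """Return relative paths of all files under out_dir, or [] if it is missing."""
    root = Path(out_dir)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_file())


class OutputDirProbe(Callback):
    """Prints which output files are present when a job finishes."""

    def on_job_end(self, config: Any, job_return: Any, **kwargs: Any) -> None:
        # Assumption: working_dir is the directory Hydra wrote outputs to.
        out_dir = str(job_return.working_dir)
        files = list_outputs(out_dir)
        print(f"[OutputDirProbe] {out_dir}: {len(files)} file(s) at job end")
        for name in files:
            print(f"  {name}")
```

The callback would be registered under the `hydra.callbacks` config node with a `_target_` pointing at the class. If the probe shows the expected files, an upload step could replace the `print` calls.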

oliversssf2 commented 2 years ago

Maybe consider adopting fsspec, so that not only S3 but also Azure, GCS, and many more file systems can be used?