bcbio / bcbio-nextgen-vm

Run bcbio-nextgen genomic sequencing analyses using isolated containers and virtual machines
MIT License

remote data on S3 #172

Open matthdsm opened 5 years ago

matthdsm commented 5 years ago

Hi Brad,

Quick question. The commit history shows "improved support for data on AWS". Could you elaborate a bit on this?

We're looking into decentralizing all of our data to (self-hosted) S3 repos powered by MinIO and the Ceph RADOS Gateway.

This means all FASTQ data and all reference data (e.g. the complete genomes directory) would be hosted at S3 URLs. What's the best way to configure bcbio to leverage this? How do we configure S3 FASTQ input and S3-hosted reference data (if possible)?

Thanks for the help! Cheers, M

chapmanb commented 5 years ago

Matthias; Thanks for looking into this. This is still a work in progress: we're adding support for CWL runs on AWS Batch using Cromwell. It's not yet functional, but here is the work-in-progress documentation so you can see what we've got in place:

https://bcbio-nextgen.readthedocs.io/en/latest/contents/cloud.html#amazon-web-services-aws-batch

Practically, it sounds like you don't need AWS Batch and would instead just want to pull inputs from S3-compatible buckets and run them on your own infrastructure. This should work with the current CWL and Cromwell support. You'd create an s3: configuration block in your input bcbio_system.yaml as described in the docs, and bcbio should then stage files down from there for runs on your local cluster and shared filesystem.
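For illustration only, an s3: block in bcbio_system.yaml might look something like the sketch below. The key names and values here are assumptions, not the confirmed schema; check the linked docs for the exact format. For a self-hosted S3-compatible store like MinIO or a RADOS Gateway, you'd likely need a custom endpoint in addition to credentials:

```yaml
# Illustrative sketch only -- consult the bcbio cloud docs for the exact schema.
# All key names below are hypothetical placeholders.
s3:
  access_key_id: YOUR_ACCESS_KEY        # credentials for the S3-compatible store
  secret_access_key: YOUR_SECRET_KEY
  endpoint: https://minio.example.org   # hypothetical: self-hosted endpoint
                                        # instead of the default AWS endpoint
  region: us-east-1
```

Inputs would then be referenced with s3:// URLs (e.g. in the sample sheet), and bcbio would stage them to the local filesystem before running.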

I'd definitely welcome feedback and reports if you test this out. Thanks again.