sjfleming opened 2 years ago
If there is a need for this on AWS or Azure, I would be interested in contributing to this work.
Interesting that you should say that @lynnlangit! We recently had a request to help make this work on AWS (from an AWS solutions architect working on Amazon Omics). We don't have many internal Broad users wanting this at the moment, but much of the Imaging Platform does its work on AWS (not using Terra or Cromwell). And, institutionally, there is currently a push within Broad's Data Sciences Platform to get workflows up and running on Azure due to a collaboration with Microsoft.
So we would welcome any contribution you'd be interested in making!
I will mention, though, that we actively use the current Google backend to analyze data, so we want to make sure that part doesn't break or change too much. I think the best path forward is probably to have separate sorts of "cloud file copying" commands for separate backends, even though this is not the way WDL is supposed to work. But we are open to other opinions! (If we could write one set of WDLs that is truly agnostic to the backend, that would be fantastic. The reason we didn't do that at the outset is that there are just so many individual input files (images) involved. There are several ways we could get around this, though...)
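One way to picture "separate cloud file copying commands for separate backends" is a small dispatcher that looks at an input URI and picks the matching copy CLI. The sketch below is only an illustration under stated assumptions: `copy_argv` is a hypothetical helper name, the destination is assumed to be a local path inside the task, Azure inputs are assumed to be Blob Storage URLs of the form `https://<account>.blob.core.windows.net/...`, and authentication (for example SAS tokens or credentials) is not handled.

```python
from urllib.parse import urlparse


def copy_argv(src: str, dst: str) -> list[str]:
    """Hypothetical helper: build the argv that copies src to the local path dst
    using the CLI that matches src's cloud backend."""
    parsed = urlparse(src)
    if parsed.scheme == "gs":
        # Google Cloud Storage: the tool the current workflows already rely on.
        return ["gsutil", "cp", src, dst]
    if parsed.scheme == "s3":
        # Amazon S3 via the AWS CLI.
        return ["aws", "s3", "cp", src, dst]
    if parsed.scheme == "https" and parsed.netloc.endswith(".blob.core.windows.net"):
        # Azure Blob Storage via AzCopy (assumed URL form; auth not handled here).
        return ["azcopy", "copy", src, dst]
    raise ValueError(f"unsupported storage URI: {src}")
```

Inside a task this could be run with `subprocess.run(copy_argv(uri, "local.tiff"), check=True)`. Returning an argv list rather than a shell string avoids quoting problems with unusual file names, and keeps the backend choice in one place instead of hard-coding `gsutil` in every task.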
I also don't think I personally have a way to test workflows on AWS. It would be easier for us to test workflows on Azure (using Terra), since "Terra on Azure" is now live. I don't really know how I'd review PRs for something running on AWS until I can figure out how to test it...
@carmendv @deflaux
@sjfleming, thanks for the info. FYI...
Given this - what is the next step on this project?
It's great to hear that you would like to contribute @lynnlangit !
Regarding next steps:
- ... multicloud ... and contributors can send pull requests from their forks to that branch.
- ... multicloud branch to test on the various clouds.
- ... multicloud ... some cloud-specific inputs.json files for that plate and document where to find the expected outputs for validation.

@lynnlangit we've completed:
- ... inputs.json files for AWS and GCP.

Is there any other information we can provide to you at this time? Thank you!
(Only if people actually want / need this. But I assume some people might. I think the Imaging Platform stores a lot of data on AWS.)
Supposedly Terra will be supporting multiple backends (GCP, AWS, Azure) in the near future. All of our "gsutil" commands (which kind of break the usual WDL logic) only work on GCP.
We should think about whether we can do everything strictly in WDL, without any gsutil commands. Or whether we can have separate sorts of "cloud file copying" commands for separate backends, calling the right ones where appropriate.
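"Calling the right ones where appropriate" implies that a task first has to know which backend its inputs live on. A minimal sketch of that check follows, assuming each task should read all of its inputs from a single cloud. The names `backend_of` and `single_backend` are hypothetical, and the Azure URL pattern (`https://<account>.blob.core.windows.net/...`) is an assumption about how Azure inputs would be addressed.

```python
from urllib.parse import urlparse


def backend_of(uri: str) -> str:
    """Hypothetical helper: classify a storage URI as 'gcp', 'aws' or 'azure'."""
    parsed = urlparse(uri)
    if parsed.scheme == "gs":
        return "gcp"
    if parsed.scheme == "s3":
        return "aws"
    if parsed.scheme == "https" and parsed.netloc.endswith(".blob.core.windows.net"):
        return "azure"
    raise ValueError(f"unsupported storage URI: {uri}")


def single_backend(uris: list[str]) -> str:
    """Return the one backend shared by all inputs, failing fast if the inputs
    are empty or spread across several clouds (assumed to be unsupported)."""
    backends = {backend_of(u) for u in uris}
    if len(backends) != 1:
        raise ValueError(f"expected inputs on exactly one backend, found {sorted(backends)}")
    return backends.pop()
```

A task could call `single_backend(image_uris)` once up front and then choose its copy command from the result, so a mixed or unrecognized set of paths fails before any copying starts rather than partway through a plate.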