Barski-lab / cwl-airflow

Python package to extend Airflow functionality with CWL 1.1 support
https://barski-lab.github.io/cwl-airflow
Apache License 2.0

AWS batch extension #17

Closed: Raphtor closed this issue 5 years ago

Raphtor commented 6 years ago

As far as I can tell, at the moment only local execution via cwltool is supported. I noticed in the supporting paper that AWS support is one of the listed features. Are there any plans to extend this tool to allow CWL workflows to run through Airflow on AWS Batch?

portah commented 6 years ago

Can you give more details about the kind of scheme you'd like to use with Airflow and CWL? CWL-Airflow uses cwltool to parse the CWL workflow, converts it into an Airflow DAG, and then each Airflow task executes its corresponding workflow step with cwltool (a rough sketch of this pattern is below).
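
A minimal sketch of that pattern, assuming Airflow 1.x with cwltool on the PATH. This is not CWL-Airflow's actual code: the step names and file paths are made-up placeholders, and the DAG is hard-coded here rather than generated from the parsed CWL.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

dag = DAG(
    dag_id="cwl_example",
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,  # trigger manually
)

# CWL-Airflow derives the steps by parsing the CWL file with cwltool;
# here they are hard-coded for illustration.
previous = None
for name in ["step_1", "step_2"]:
    task = BashOperator(
        task_id=name,
        # Each task shells out to cwltool for one step; a real implementation
        # must also pass the outputs of upstream steps into the job file.
        bash_command=(
            "cwltool --outdir /tmp/{0} "
            "/path/to/{0}.cwl /path/to/{0}_job.json"
        ).format(name),
        dag=dag,
    )
    if previous is not None:
        previous >> task  # linear dependency chain for this sketch
    previous = task
```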

  1. There is no out-of-the-box transformation of or access to input files referenced by s3:// URLs, and no setup in which Airflow orchestrates work with AWS instances/nodes.

  2. You can install Airflow with CWL-Airflow support on every node in AWS and set up Celery, and you will have a cluster-like system; that is supported (a configuration sketch follows this list).
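
On point 2, the Celery setup is plain Airflow configuration rather than anything CWL-Airflow-specific. A sketch of the relevant airflow.cfg fragments, using Airflow 1.10-era key names (these vary between versions; the broker and database addresses are hypothetical):

```ini
[core]
executor = CeleryExecutor

[celery]
# Hypothetical addresses: point these at your own Redis/RabbitMQ broker
# and metadata database.
broker_url = redis://redis-host:6379/0
result_backend = db+postgresql://airflow:airflow@db-host/airflow
```

Each node then runs `airflow worker` to join the pool, and the scheduler distributes task execution through the broker.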

Google Cloud Composer uses Airflow to orchestrate work with their systems.

I think it shouldn't be that difficult to orchestrate work with AWS.

Raphtor commented 6 years ago

I would like to run AWS Batch workflows, defined by a CWL workflow, from a local system. Airflow has a few operators that allow operations on AWS resources. I imagine it would take some finagling of the CWLStepOperator to execute steps remotely using boto3, or to separate the internal parsing of the inputs from the step tool during DAG generation and pass those along to the AWSBatchOperator. Either way, it does not seem like an easy task.
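
For what it's worth, a hedged sketch of the boto3 side of that idea: submitting a single step to AWS Batch to run cwltool inside a container. The job queue, job definition, and staged paths are hypothetical, and staging files between S3 and the container is the part that would still need to be solved.

```python
import boto3

batch = boto3.client("batch")  # region/credentials from the usual AWS config

response = batch.submit_job(
    jobName="cwl-step-1",
    jobQueue="my-cwl-queue",           # hypothetical Batch job queue
    jobDefinition="cwltool-runner:1",  # hypothetical job definition whose image has cwltool
    containerOverrides={
        "command": [
            "cwltool",
            "--outdir", "/scratch/out",
            "/staged/step_1.cwl",       # assumes the step's CWL file was staged from S3
            "/staged/step_1_job.json",  # assumes upstream outputs were staged as well
        ],
    },
)

# A remote-execution operator would then have to poll until the job finishes
# before letting downstream tasks run, e.g. with
# batch.describe_jobs(jobs=[response["jobId"]]).
print("Submitted Batch job:", response["jobId"])
```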