Barski-lab / cwl-airflow

Python package to extend Airflow functionality with CWL1.1 support
https://barski-lab.github.io/cwl-airflow
Apache License 2.0

No such file or directory: '…/cwltool-venv3/lib/Resources/app_packages/bin #32

Closed mr-c closed 4 years ago

mr-c commented 5 years ago

This is the current failure we get running the CWL conformance tests on ci.commonwl.org against cwl-airflow:

https://ci.commonwl.org/job/airflow-conformance/288/console

Successfully installed cwl-airflow-1.1.0
+ export AIRFLOW_HOME=/var/lib/jenkins/jobs/airflow-conformance/workspace/airflow
+ rm -Rf /var/lib/jenkins/jobs/airflow-conformance/workspace/airflow
+ cwl-airflow init -l 1 -p 1
/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/python3.6/site-packages/airflow/configuration.py:627: DeprecationWarning: You have two airflow.cfg files: /var/lib/jenkins/airflow/airflow.cfg and /var/lib/jenkins/jobs/airflow-conformance/workspace/airflow/airflow.cfg. Airflow used to look at ~/airflow/airflow.cfg, even when AIRFLOW_HOME was set to a different value. Airflow will now only read /var/lib/jenkins/jobs/airflow-conformance/workspace/airflow/airflow.cfg, and you should remove the other file
  category=DeprecationWarning,
Traceback (most recent call last):
  File "/var/lib/jenkins/.pyenv/versions/cwltool-venv3/bin/cwl-airflow", line 40, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/var/lib/jenkins/.pyenv/versions/cwltool-venv3/bin/cwl-airflow", line 36, in main
    args.func(args)
  File "/var/lib/jenkins/.pyenv/versions/cwltool-venv3/bin/cwl-airflow", line 11, in run_init
    launcher.init()
  File "/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/python3.6/site-packages/cwl_airflow/app/launch.py", line 51, in init
    self.update_shebang(os.path.join(self.contents_dir, "Resources/app_packages/bin"))
  File "/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/python3.6/site-packages/cwl_airflow/app/launch.py", line 134, in update_shebang
    for filename in os.listdir(lookup_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/Resources/app_packages/bin'

pdblood commented 5 years ago

I am getting the same error when I attempt to do cwl-airflow init after installing via pip install cwl-airflow on both CentOS 7.6 and Ubuntu 18.04 systems with python 3.6 and python 3.7 respectively.

portah commented 5 years ago

> I am getting the same error when I attempt to do cwl-airflow init after installing via pip install cwl-airflow on both CentOS 7.6 and Ubuntu 18.04 systems with python 3.6 and python 3.7 respectively.

What version of cwl-airflow (1.1.9 or 1.0.16) are you installing? cwl-airflow init is not required for cwl-airflow to work properly. It is a helper tool for creating the Mac app. You might just need to follow the Airflow install instructions, such as running airflow initdb.

P.S. We are still going to fix the problem :)
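
For reference, the traceback above ends in os.listdir being called on a Resources/app_packages/bin directory that does not exist in a plain pip install. Below is a minimal sketch of a guard for that kind of lookup. The helper name list_scripts_if_present is hypothetical and not part of cwl-airflow, and this is not the project's actual fix; it only illustrates checking for the directory before listing it.

```python
import os


def list_scripts_if_present(lookup_dir):
    """Return the sorted file names in lookup_dir, or [] if it is missing.

    Hypothetical helper for illustration only; not cwl-airflow code.
    """
    # os.listdir raises FileNotFoundError on a missing directory,
    # so check first and treat "missing" as "nothing to update".
    if not os.path.isdir(lookup_dir):
        return []
    return sorted(os.listdir(lookup_dir))


if __name__ == "__main__":
    # A path that should not exist on a regular (non app bundle) install.
    print(list_scripts_if_present("/nonexistent/Resources/app_packages/bin"))
```

With a guard like this, a missing directory is skipped instead of aborting init.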

pdblood commented 5 years ago

I'm using the latest version that comes from doing pip install cwl-airflow, 1.1.9.

Thanks for clarifying that cwl-airflow init is an optional tool for use on Macs. This is not conveyed in the documentation I was following, which presents cwl-airflow init as a required step, with no indication that it is platform specific.

Is there a more up-to-date guide I should be following?

portah commented 5 years ago

We are working on a new version and new documentation (for version 1.1.9). The 1.1 version is actually a replacement for https://github.com/datirium/cwl-airflow-parser

pdblood commented 5 years ago

Thanks for the clarification. For now, what would be the best way for me to create an Airflow DAG from a CWL workflow and run it in Airflow? I notice that the current version 1.1.9 does not support the documented command cwl-airflow submit:

cwl-airflow submit
usage: cwl-airflow [-h] {apiserver,init} ...
cwl-airflow: error: invalid choice: 'submit' (choose from 'apiserver', 'init')

I'd like to run DAGs generated from CWL workflows using Airflow's regular mechanisms for scheduling and running DAGs, so I think I just need a way to generate the cwl_dag.py file that is referenced in the documentation.

portah commented 5 years ago

We are still working to simplify CWL input to Airflow (like submit).

After installing CWL-Airflow, you have to run airflow initdb and follow the Airflow manual to tune any settings. After that you'll have your DAG directory (the default is $AIRFLOW_HOME/dags). There you have to put a Python script that opens a CWL file and passes it to the CWLDAG class.

When preparation is complete and you can see the CWL DAG in the Airflow web interface, you can trigger the DAG with parameters for a job, including the extra key output_folder, like this: -d DAG_ID -r RUN_ID -c "{\"job\":{\"output_folder\":\"/your/output/folder\"}}"
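
The hand-escaped JSON passed to -c above is easy to get wrong. As a small sketch, the same value can be generated with the standard library instead. The function name build_trigger_conf is hypothetical; only the job and output_folder keys come from the comment above, and the sketch does not call Airflow itself.

```python
import json
import shlex


def build_trigger_conf(output_folder):
    """Build the JSON string for the -c option (hypothetical helper)."""
    # Same structure as the hand-written example: {"job": {"output_folder": ...}}
    return json.dumps({"job": {"output_folder": output_folder}})


if __name__ == "__main__":
    conf = build_trigger_conf("/your/output/folder")
    # shlex.quote makes the JSON safe to paste into a POSIX shell command.
    print("-c " + shlex.quote(conf))
```

Letting json.dumps produce the string avoids manual backslash escaping, and shlex.quote takes care of the shell quoting.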

pdblood commented 5 years ago

Thanks, I do have airflow up and running. Is this the form of the python script that I need to put in the dags folder (from https://github.com/datirium/cwl-airflow-parser):

from cwl_airflow_parser import CWLDAG, CWLJobDispatcher, CWLJobGatherer
from datetime import timedelta

def cwl_workflow(workflow_file):
    dag = CWLDAG(default_args={
        'owner': 'airflow',
        'email': ['my@email.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 20,
        'retry_exponential_backoff': True,
        'retry_delay': timedelta(minutes=30),
        'max_retry_delay': timedelta(minutes=60 * 4)
    },
        cwl_workflow=workflow_file)
    dag.create()
    dag.add(CWLJobDispatcher(dag=dag), to='top')
    dag.add(CWLJobGatherer(dag=dag), to='bottom')

    return dag

cwl_workflow("/path/to/my/workflow.cwl")

I tried using the code above (filling in relevant placeholder values with my specific details) but the CWL DAG is not appearing in the Airflow UI, whereas other new Airflow DAGs I created are appearing just fine.

portah commented 5 years ago

We use the following snippet:

#!/usr/bin/env python3
from cwl_airflow import CWLDAG, CWLJobDispatcher, CWLJobGatherer
dag = CWLDAG(cwl_workflow="/home/airflow/airflow/dags/4tpj5NDvYv2fuSvdz-248491a93305040db9efb28bec01f23b0e835cf3.cwl", dag_id="4tpj5NDvYv2fuSvdz-248491a93305040db9efb28bec01f23b0e835cf3")
dag.create()
dag.add(CWLJobDispatcher(dag=dag), to='top')
dag.add(CWLJobGatherer(dag=dag), to='bottom')
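
One difference between the two scripts in this thread: the snippet above binds the DAG to the module-level name dag, while the earlier script calls cwl_workflow(...) and discards the return value. Airflow collects DAG objects from the top-level namespace of each file in the dags folder, so an unbound DAG may be why nothing showed up in the UI. The sketch below imitates that scan with a stand-in class. FakeDAG, discover_dags and load_module are hypothetical names for illustration, not Airflow or cwl-airflow APIs.

```python
import types


class FakeDAG:
    """Stand-in for a DAG class; not the real Airflow DAG."""

    def __init__(self, dag_id):
        self.dag_id = dag_id


def make_dag():
    # Plays the role of cwl_workflow(...) in the earlier script.
    return FakeDAG("example")


def discover_dags(module):
    """Collect FakeDAG instances bound to top-level names of a module."""
    return [obj for obj in vars(module).values() if isinstance(obj, FakeDAG)]


def load_module(name, source):
    """Execute source as the body of a fresh module and return the module."""
    module = types.ModuleType(name)
    module.__dict__.update({"FakeDAG": FakeDAG, "make_dag": make_dag})
    exec(source, module.__dict__)
    return module


unbound = load_module("unbound", "make_dag()")      # result is discarded
bound = load_module("bound", "dag = make_dag()")    # result kept at top level

print([d.dag_id for d in discover_dags(unbound)])   # []
print([d.dag_id for d in discover_dags(bound)])     # ['example']
```

If this is the cause, changing the last line of the earlier script to dag = cwl_workflow("/path/to/my/workflow.cwl") would be the corresponding fix to try.
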

portah commented 5 years ago

Have you received my email?

pdblood commented 5 years ago

Yes, thanks!