apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

PythonVirtualenvOperator fails silently when virtualenv is not installed. #40420

Open thisiswhereitype opened 3 days ago

thisiswhereitype commented 3 days ago

Apache Airflow version

2.9.2

If "Other Airflow 2 version" selected, which one?

No response

What happened?

When using PythonVirtualenvOperator, Airflow creates the venv as specified in the DAG: a standard python -m venv /tmp/.... is run in a tmp directory. Later, pip install -r ... is also run.

However, when virtualenv is not installed, the failure of that command is not properly detected.

Later, an uncaught OSError is raised in subprocess.Popen (https://github.com/apache/airflow/blob/c310159bc2363c12110b11febd5febaab8670210/airflow/utils/process_utils.py#L184) because .venv/bin/pip is absent. This causes the exit logic not to fire, and TempDirectory.__exit__ then deletes the evidence.

See: subprocess Exceptions

Additionally, stderr is not logged.
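The failure mode can be reproduced outside Airflow. The following is a minimal sketch, not Airflow's code: run_missing_pip is a hypothetical helper, and the pip path is simply never created. Under those assumptions it shows that subprocess.Popen raises FileNotFoundError (a subclass of OSError) before any child process or exit code exists, and that the temporary directory is already gone by the time the caller sees the error.

```python
import subprocess
import tempfile
from pathlib import Path


def run_missing_pip():
    """Try to start a pip binary that does not exist inside a temp dir.

    Hypothetical helper for illustration only; not part of Airflow.
    Returns (tmp_path, captured_exception).
    """
    error = None
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = Path(tmp_dir)
        pip_path = tmp_path / ".venv" / "bin" / "pip"  # never created
        try:
            subprocess.Popen(
                [str(pip_path), "install", "-r", "requirements.txt"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
            )
        except OSError as exc:
            # Raised before the child starts, so there is no return code
            # and no stderr to read; the exception is the only evidence.
            error = exc
    # Leaving the with block has removed tmp_dir and everything in it.
    return tmp_path, error


tmp_path, error = run_missing_pip()
print(type(error).__name__, "| temp dir still exists:", tmp_path.exists())
```

Catching OSError around the Popen call, and logging it before the temporary directory is cleaned up, is one way the missing executable could be reported instead of lost.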

What you think should happen instead?

I see in the project file there is a core-all listing including virtualenv.

Shouldn't virtualenv be a dependency of pip install apache-airflow? I haven't used hatch.

If it shouldn't, I think there should be better error handling to explain the issue.

How to reproduce

Run a PythonVirtualenvOperator with Airflow installed in a venv that does not have virtualenv installed.

Operating System

WSL2, Ubuntu 24.04

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else?

No response

boring-cyborg[bot] commented 3 days ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

potiuk commented 3 days ago

Marked it as good first issue for someone to handle.

eladkal commented 3 days ago

But we do check

https://github.com/apache/airflow/blob/5f2da71bf86f1f73acd802b66b979e77f53c8715/airflow/operators/python.py#L699-L700

potiuk commented 3 days ago

I think @thisiswhereitype needs to check why it did not work in their case.

thisiswhereitype commented 3 days ago

Thanks. The reason is that is_venv_installed finds /usr/bin/virtualenv and proceeds, but it doesn't try to import the module.

A module import check would more closely match what happens later.
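The difference between the two kinds of check can be sketched as follows. This is an illustration under stated assumptions, not Airflow's implementation: executable_on_path and module_importable are hypothetical helper names, built only on the standard library calls shutil.which and importlib.util.find_spec.

```python
import importlib.util
import shutil


def executable_on_path(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def module_importable(name):
    """Return True if the running interpreter can locate module `name`."""
    return importlib.util.find_spec(name) is not None


# The two answers can disagree: a system-wide /usr/bin/virtualenv satisfies
# the first check even when the active venv cannot import the module.
print("executable on PATH:", executable_on_path("virtualenv"))
print("module importable: ", module_importable("virtualenv"))
```

If the later step goes through the interpreter rather than the executable on PATH, the second check is the one that predicts whether that step can succeed.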

(.venv_airflow) $ python  -c 'import importlib.util; importlib.util.find_spec("virtualenv")'
(.venv_airflow) $ which virtualenv
/usr/bin/virtualenv
(.venv_airflow) $ pip list | grep env
<no output>
(.venv_airflow) $

potiuk commented 2 days ago

Feel free to fix it, otherwise it will have to wait for someone else to pick it up.