aws / aws-mwaa-local-runner

This repository provides a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally.
MIT No Attribution

Request for './mwaa-local-env package-requirements' to provide .whl files only (no .tar.gz sources) #315

Closed: maslick closed this issue 11 months ago

maslick commented 1 year ago

Running ./mwaa-local-env package-requirements first places *.whl files in the ./plugins directory and then compresses them into ./requirements/plugins.zip for later use in production.

According to the documentation, using plugins.zip at runtime removes the need to fetch libraries dynamically at Fargate container startup. The command also creates a ./requirements/packaged_requirements.txt file, which can be used to start aws-mwaa-local-runner in a way that closely emulates MWAA production environments.

However, not all dependencies end up as *.whl files. Some come as source archives instead, such as mysqlclient-2.2.0.tar.gz, so these libraries have to be built at install time, when the environment starts.
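To see which packages will still be built from source, you can list the source archives left in the packaged directory. The shell function below is a minimal sketch; the name list_sdists is hypothetical and not part of aws-mwaa-local-runner. It assumes that .tar.gz and .zip files at the top level of the directory are source distributions, which may also flag unrelated archives if you keep other files in ./plugins.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of aws-mwaa-local-runner): print the source
# distributions (sdists) left in a packaged plugins directory. Every file it
# prints is something pip will have to build when the environment starts.
list_sdists() {
  local dir="${1:-plugins}"
  # Wheels end in .whl; sdists are usually .tar.gz (some older ones are .zip).
  find "$dir" -maxdepth 1 -type f \( -name '*.tar.gz' -o -name '*.zip' \) \
    -exec basename {} \; | sort
}

# Usage, from the root of the aws-mwaa-local-runner checkout:
#   list_sdists ./plugins
```

Unlike grepping the output of ls, this prints bare file names, which are easy to pass to other commands.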

This behavior does not seem to be mentioned explicitly in the documentation, which only says:

To package the necessary WHL files for your requirements.txt without running Apache Airflow, use the following script:

./mwaa-local-env package-requirements

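However, when you start mwaa-local-env with the generated packages, pip tries to build mysqlclient-2.2.0.tar.gz from source and fails, apparently because the wheel package it needs as a build dependency is not available (note "No matching distribution found for wheel" in the log below).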
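This failure can be checked for before starting the environment. Because the packaged requirements combine --no-index with --find-links, pip's isolated build environment can only install build tools from the same directory (the build subprocess in the log below reports "Looking in links: /usr/local/airflow/plugins"). The sketch below reports which build-backend packages have no wheel there. The function name is hypothetical, and the default list of setuptools and wheel is an assumption about what setuptools-based sdists such as mysqlclient request; the real list depends on each package's build backend.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each build-backend package that has no wheel in
# a --find-links directory. With --no-index, pip can only install build
# dependencies from that directory, so a missing one breaks every sdist build.
missing_build_deps() {
  local dir="$1"
  shift
  # Assumed default: the packages setuptools-based sdists usually request.
  if [ "$#" -eq 0 ]; then
    set -- setuptools wheel
  fi
  local dep
  local -a matches
  for dep in "$@"; do
    # Wheel file names start with "<name>-<version>-". Without nullglob an
    # unmatched pattern stays literal, so -e on it is false.
    matches=("$dir/$dep"-*.whl)
    if [ ! -e "${matches[0]}" ]; then
      echo "$dep"
    fi
  done
}

# Usage:
#   missing_build_deps ./plugins          # checks setuptools and wheel
#   missing_build_deps ./plugins wheel    # checks only wheel
```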

So the question is: can all dependencies be packaged as .whl files, so that none of them have to be built at startup?
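The .tar.gz files in ./plugins suggest that the packaging step keeps distributions as they were downloaded. One possible alternative, sketched here under assumptions rather than taken from the local runner's scripts, is to build the directory with pip wheel: it resolves the whole dependency tree and writes a .whl file for every package, compiling sdists such as mysqlclient-2.2.0.tar.gz during packaging instead of at startup. This assumes it runs in an environment matching the MWAA runtime (for example inside the local-runner image, with Python 3.10), that the system libraries needed to compile packages like mysqlclient are installed, and that network access is available. The function name build_wheelhouse is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper, not part of aws-mwaa-local-runner):
# build a wheel for every requirement and every transitive dependency.
# Assumes: an interpreter/platform matching the target MWAA runtime,
# network access, and the headers needed to compile packages like mysqlclient.
build_wheelhouse() {
  local requirements="$1"
  local constraints="$2"
  local wheel_dir="${3:-plugins}"
  # "pip wheel" writes only .whl files to --wheel-dir: wheels published on
  # PyPI are fetched as-is, and sdists are built into wheels here.
  pip wheel \
    --wheel-dir "$wheel_dir" \
    --constraint "$constraints" \
    --requirement "$requirements"
}

# Usage (constraints URL as in the packaged requirements file):
#   build_wheelhouse requirements/requirements.txt \
#     "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt" \
#     plugins
```

The trade-off is that wheels compiled this way are tied to the Python version and platform they were built on, so the build environment has to match the one MWAA uses.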

Steps to reproduce:

$ git clone --branch v2.6.3 --depth 1 https://github.com/aws/aws-mwaa-local-runner.git
$ cd aws-mwaa-local-runner
$ ./mwaa-local-env build-image
$ ./mwaa-local-env package-requirements

$ ls -la requirements/
total 70M
-rw-r--r-- 1 ec2-user ec2-user 235 Sep 19 18:28 packaged_requirements.txt
-rw-r--r-- 1 ec2-user ec2-user 70M Sep 19 18:28 plugins.zip
-rw-rw-r-- 1 ec2-user ec2-user 184 Sep 19 18:22 requirements.txt

$ ls -la plugins | grep tar.gz
-rw-r--r-- 1 ec2-user ec2-user    29922 Sep 19 18:28 cron_descriptor-1.4.0.tar.gz
-rw-r--r-- 1 ec2-user ec2-user   151986 Sep 19 18:28 dill-0.3.1.1.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    89543 Sep 19 18:28 mysqlclient-2.2.0.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    81167 Sep 19 18:28 pendulum-2.1.2.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    31954 Sep 19 18:28 python-nvd3-0.15.0.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    10267 Sep 19 18:28 unicodecsv-0.14.1.tar.gz

$ mv requirements/requirements.txt requirements/requirements-original.txt
$ mv requirements/packaged_requirements.txt requirements/requirements.txt
$ ls -la requirements/
total 71616
drwxrwxr-x 2 ec2-user ec2-user     4096 Sep 19 18:32 .
drwxrwxr-x 9 ec2-user ec2-user     4096 Sep 19 18:22 ..
-rw-r--r-- 1 ec2-user ec2-user 73315149 Sep 19 18:28 plugins.zip
-rw-rw-r-- 1 ec2-user ec2-user      184 Sep 19 18:22 requirements-original.txt
-rw-r--r-- 1 ec2-user ec2-user      235 Sep 19 18:28 requirements.txt

$ cat requirements/requirements.txt
--find-links /usr/local/airflow/plugins
--no-index
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt"

apache-airflow-providers-snowflake==4.2.0
apache-airflow-providers-mysql==5.1.1

$ ./mwaa-local-env start
local-runner_1  | Verification completed
local-runner_1  | --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt"
local-runner_1  | Installing requirements.txt
local-runner_1  | Looking in links: /usr/local/airflow/plugins
local-runner_1  | Processing ./plugins/apache_airflow_providers_snowflake-4.2.0-py3-none-any.whl
local-runner_1  | Processing ./plugins/apache_airflow_providers_mysql-5.1.1-py3-none-any.whl
local-runner_1  | Processing ./plugins/snowflake_sqlalchemy-1.4.7-py2.py3-none-any.whl
local-runner_1  | Requirement already satisfied: apache-airflow-providers-common-sql>=1.3.1 in ./.local/lib/python3.10/site-packages (from apache-airflow-providers-snowflake==4.2.0->-r /usr/local/airflow/requirements/requirements.txt (line 5)) (1.5.2)
local-runner_1  | Requirement already satisfied: apache-airflow>=2.4.0 in ./.local/lib/python3.10/site-packages (from apache-airflow-providers-snowflake==4.2.0->-r /usr/local/airflow/requirements/requirements.txt (line 5)) (2.6.3)
local-runner_1  | Processing ./plugins/snowflake_connector_python-3.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
local-runner_1  | Processing ./plugins/mysqlclient-2.2.0.tar.gz
local-runner_1  |   Installing build dependencies: started
local-runner_1  |   Installing build dependencies: finished with status 'done'
local-runner_1  |   Getting requirements to build wheel: started
local-runner_1  |   Getting requirements to build wheel: finished with status 'done'
local-runner_1  |   Installing backend dependencies: started
local-runner_1  |   Installing backend dependencies: finished with status 'error'
local-runner_1  |   error: subprocess-exited-with-error
local-runner_1  |   
local-runner_1  |   × pip subprocess to install backend dependencies did not run successfully.
local-runner_1  |   │ exit code: 1
local-runner_1  |   ╰─> [3 lines of output]
local-runner_1  |       Looking in links: /usr/local/airflow/plugins
local-runner_1  |       ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
local-runner_1  |       ERROR: No matching distribution found for wheel
local-runner_1  |       [end of output]
local-runner_1  |   
local-runner_1  |   note: This error originates from a subprocess, and is likely not a problem with pip.
local-runner_1  | error: subprocess-exited-with-error
local-runner_1  | 
local-runner_1  | × pip subprocess to install backend dependencies did not run successfully.
local-runner_1  | │ exit code: 1
local-runner_1  | ╰─> See above for output.
local-runner_1  | 
local-runner_1  | note: This error originates from a subprocess, and is likely not a problem with pip.
maslick commented 1 year ago

I also tried to spin up a real MWAA environment with the resulting requirements.txt and plugins.zip. The same error appears in the CloudWatch logs, and as a result the environment could not start:

Looking in links: /usr/local/airflow/plugins
Requirement already satisfied: apache-airflow==2.6.3 in ./.local/lib/python3.10/site-packages (2.6.3)
Processing ./plugins/apache_airflow_providers_snowflake-4.2.0-py3-none-any.whl (from -r /usr/local/airflow/requirements/requirements.txt (line 5))
Processing ./plugins/apache_airflow_providers_mysql-5.1.1-py3-none-any.whl (from -r /usr/local/airflow/requirements/requirements.txt (line 6))
Requirement already satisfied: alembic<2.0,>=1.6.3 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.11.1)
Requirement already satisfied: argcomplete>=1.10 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.1.1)
Requirement already satisfied: asgiref in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.7.2)
Requirement already satisfied: attrs>=22.1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (23.1.0)
Requirement already satisfied: blinker in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.6.2)
Requirement already satisfied: cattrs>=22.1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (23.1.2)
Requirement already satisfied: colorlog<5.0,>=4.0.2 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (4.8.0)
Requirement already satisfied: configupdater>=3.1.1 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.1.1)
Requirement already satisfied: connexion[flask]>=2.10.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.14.2)
Requirement already satisfied: cron-descriptor>=1.2.24 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.4.0)
Requirement already satisfied: croniter>=0.3.17 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.4.1)
Requirement already satisfied: cryptography>=0.9.3 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (40.0.2)
Requirement already satisfied: deprecated>=1.2.13 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.2.14)
Requirement already satisfied: dill>=0.2.2 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.3.1.1)
Requirement already satisfied: flask<2.3,>=2.2 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.2.5)
Requirement already satisfied: flask-appbuilder==4.3.1 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (4.3.1)
Requirement already satisfied: flask-caching>=1.5.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.0.2)
Requirement already satisfied: flask-login>=0.6.2 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.6.2)
Requirement already satisfied: flask-session>=0.4.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.5.0)
Requirement already satisfied: flask-wtf>=0.15 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.1.1)
Requirement already satisfied: google-re2>=1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.0)
Requirement already satisfied: graphviz>=0.12 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.20.1)
Requirement already satisfied: gunicorn>=20.1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (20.1.0)
Requirement already satisfied: httpx in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.23.3)
Requirement already satisfied: itsdangerous>=2.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.1.2)
Requirement already satisfied: jinja2>=3.0.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.1.2)
Requirement already satisfied: jsonschema>=4.0.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (4.18.0)
Requirement already satisfied: lazy-object-proxy in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.9.0)
Requirement already satisfied: linkify-it-py>=2.0.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.0.2)
Requirement already satisfied: lockfile>=0.12.2 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.12.2)
Requirement already satisfied: markdown>=3.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.4.3)
Requirement already satisfied: markdown-it-py>=2.1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.0.0)
Requirement already satisfied: markupsafe>=1.1.1 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.1.3)
Requirement already satisfied: marshmallow-oneofschema>=2.0.1 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.0.1)
Requirement already satisfied: mdit-py-plugins>=0.3.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.4.0)
Requirement already satisfied: packaging>=14.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (21.3)
Requirement already satisfied: pathspec~=0.9.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.9.0)
Requirement already satisfied: pendulum>=2.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.1.2)
Requirement already satisfied: pluggy>=1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.2.0)
Requirement already satisfied: psutil>=4.2.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (5.9.5)
Requirement already satisfied: pydantic<2.0.0,>=1.10.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.10.11)
Requirement already satisfied: pygments>=2.0.1 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.15.1)
Requirement already satisfied: pyjwt>=2.0.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.7.0)
Requirement already satisfied: python-daemon>=3.0.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.0.1)
Requirement already satisfied: python-dateutil>=2.3 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.8.2)
Requirement already satisfied: python-nvd3>=0.15.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.15.0)
Requirement already satisfied: python-slugify>=5.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (8.0.1)
Requirement already satisfied: rfc3339-validator>=0.1.4 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.1.4)
Requirement already satisfied: rich>=12.4.4 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (13.4.2)
Requirement already satisfied: rich-argparse>=1.0.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.2.0)
Requirement already satisfied: setproctitle>=1.1.8 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.3.2)
Requirement already satisfied: sqlalchemy<2.0,>=1.4 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.4.49)
Requirement already satisfied: sqlalchemy-jsonfield>=1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.0.1.post0)
Requirement already satisfied: tabulate>=0.7.5 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.9.0)
Requirement already satisfied: tenacity!=8.2.0,>=6.2.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (8.2.2)
Requirement already satisfied: termcolor>=1.1.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.3.0)
Requirement already satisfied: typing-extensions>=4.0.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (4.7.1)
Requirement already satisfied: unicodecsv>=0.14.1 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (0.14.1)
Requirement already satisfied: werkzeug>=2.0 in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (2.2.3)
Requirement already satisfied: apache-airflow-providers-common-sql in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (1.5.2)
Requirement already satisfied: apache-airflow-providers-ftp in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.4.2)
Requirement already satisfied: apache-airflow-providers-http in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (4.4.2)
Requirement already satisfied: apache-airflow-providers-imap in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.2.2)
Requirement already satisfied: apache-airflow-providers-sqlite in ./.local/lib/python3.10/site-packages (from apache-airflow==2.6.3) (3.4.2)
Processing ./plugins/snowflake_connector_python-3.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (from apache-airflow-providers-snowflake==4.2.0->-r /usr/local/airflow/requirements/requirements.txt (line 5))
Processing ./plugins/snowflake_sqlalchemy-1.4.7-py2.py3-none-any.whl (from apache-airflow-providers-snowflake==4.2.0->-r /usr/local/airflow/requirements/requirements.txt (line 5))
Processing ./plugins/mysqlclient-2.2.0.tar.gz (from apache-airflow-providers-mysql==5.1.1->-r /usr/local/airflow/requirements/requirements.txt (line 6))
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Installing backend dependencies: started
Installing backend dependencies: finished with status 'error'
error: subprocess-exited-with-error
 
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
Looking in links: /usr/local/airflow/plugins
ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
ERROR: No matching distribution found for wheel
[end of output]
 
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
 
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
 
note: This error originates from a subprocess, and is likely not a problem with pip.
julio-tl commented 1 year ago

@maslick I ran into the same issue while deploying to MWAA 2.6.3 (but not locally). I modified my constraints.txt (a copy of this) and downgraded to mysqlclient==2.1.1. The issue went away. I am currently battling other dependency installation failures but just wanted to share what worked for me so far.
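The downgrade described above can be scripted against a local copy of the constraints file. This is a minimal sketch: pin_constraint is a hypothetical name, it assumes one package==version pin per line as in the Airflow constraints files, and it relies on GNU sed's in-place editing. The requirements file would then need to point --constraint at the local copy instead of the raw.githubusercontent.com URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pin a package to a given version in a local copy of a
# constraints file, replacing an existing pin or appending a new one.
pin_constraint() {
  local file="$1" package="$2" version="$3"
  if grep -q "^${package}==" "$file"; then
    # Rewrite the existing "package==..." line in place (GNU sed syntax).
    sed -i "s/^${package}==.*/${package}==${version}/" "$file"
  else
    printf '%s==%s\n' "$package" "$version" >> "$file"
  fi
}

# Usage:
#   curl -fsSL -o constraints.txt \
#     "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt"
#   pin_constraint constraints.txt mysqlclient 2.1.1
```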

mayushko26 commented 11 months ago

In cases like these, you'll need to add wheel to your requirements manually.
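Following that suggestion, wheel can be added to requirements.txt before running ./mwaa-local-env package-requirements. This assumes, as the logs suggest, that the packaging step fetches every listed requirement into ./plugins, where the build subprocess can then find it under --no-index. The helper below is a hypothetical sketch that appends a requirement only if the file does not already list that package, so it is safe to run repeatedly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "<name><spec>" to a requirements file unless a
# line for <name> already exists (case-insensitive), so re-running is harmless.
ensure_requirement() {
  local file="$1" name="$2" spec="${3:-}"
  # A line matches if it starts with the name followed by end of line or any
  # character that cannot be part of a package name (==, >=, [, ;, space, ...).
  if grep -qsiE "^${name}([^A-Za-z0-9._-]|$)" "$file"; then
    return 0
  fi
  # Make sure the new entry starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  printf '%s%s\n' "$name" "$spec" >> "$file"
}

# Usage, before packaging:
#   ensure_requirement requirements/requirements.txt wheel
#   ./mwaa-local-env package-requirements
```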