Closed MrThomasWagner closed 2 years ago
Facing the same issue. Is there a workaround for this?
I'm not quite sure why this started happening - will have to do some tests later this week.
Do you need the --constraint in there?
Yea it does work ok without the constraints flag for my proof of concept - I have some more dependencies I'm going to want to add in the future and would like to be able to include it.
Awesome plugin btw
Unfortunately EMR Serverless requires a newer version of boto3 than what's in that constraints file. I don't know if there's a way to override that...
I noticed it doesn't conflict with Airflow 2.4.2 which is out - MWAA is just a little behind on that. I.e.
https://raw.githubusercontent.com/apache/airflow/constraints-2.4.2/constraints-3.7.txt
Yup, MWAA is still on 2.2.2. I'm curious, can you help me understand why you're including the constraints line? I know you kind of mentioned it, but I'm still not sure what it's used for / why it's needed?
I was following this best practices guide in the MWAA docs: https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-dependencies.html
There is an Option 2 there using wheel fwiw.. maybe I'll look into that if 2.2.2 is SOL
Ahhh got it thank you. Yea, the boto3 will be an issue just because of when EMR Serverless support was added to it.
@dacort thank you for the response. Curious, what features of boto3>=1.23.9 and ~=1.23 are in use by the EMR Serverless operators and sensors that are not present in boto3==1.18.65? We, for example, are using MWAA 2.2.2 on our project and EMR Serverless 6.7.0, and cannot use the library because of this boto issue.
@marknorkin EMR Serverless was made generally available this year, and boto3 1.23.9 is when support for EMR Serverless was added. You can still use the Operator on MWAA 2.2.2, you just need to upgrade boto3 (which will happen automatically if you use the Operator from this repo).
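A minimal sketch of the version check implied here, assuming the plugin's requirement is `boto3>=1.23.9,~=1.23` as quoted in this thread (that is, at least 1.23.9 and still in the 1.x series). The helper names are hypothetical, and the parser only handles plain dotted release numbers:

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.18.65' into (1, 18, 65)."""
    return tuple(int(part) for part in version.split("."))


def meets_plugin_requirement(boto3_version: str) -> bool:
    """Check 'boto3>=1.23.9,~=1.23': at least 1.23.9 and below 2.0."""
    release = parse_release(boto3_version)
    return (1, 23, 9) <= release < (2,)


if __name__ == "__main__":
    # Versions mentioned in this thread: the boto3 pinned for MWAA 2.2.2
    # and a newer release that pip resolved for the plugin.
    for candidate in ("1.18.65", "1.26.15"):
        print(candidate, meets_plugin_requirement(candidate))
```

This only compares version numbers; it does not confirm that a given boto3 build actually ships the emr-serverless service model.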
I wasn't aware of the recommendation in our docs to add the constraints line to the requirements.txt - that said, I've tried this operator with the upgraded boto3 with MWAA and haven't seen any issues.
Going to close this for now as EMR Serverless requires a newer version of boto3. If you're willing to forego the constraints, you can still use the operator on MWAA, but I don't think there's a workaround. The Operator is in use in MWAA environments.
For reference, this is the dependency tree of the EMR Serverless operator. You could potentially update the constraints file with the relevant versions...or I do see now that there is a constraints-no-providers file as well. Maybe that'll help if the concern is preventing upgrades of Airflow's core libraries?
https://raw.githubusercontent.com/apache/airflow/constraints-2.4.2/constraints-no-providers-3.7.txt
emr-serverless==1.0.1
  - boto3 [required: ~=1.23,>=1.23.9, installed: 1.26.10]
    - botocore [required: >=1.29.10,<1.30.0, installed: 1.29.10]
      - jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
      - python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
        - six [required: >=1.5, installed: 1.16.0]
      - urllib3 [required: >=1.25.4,<1.27, installed: 1.26.12]
    - jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
    - s3transfer [required: >=0.6.0,<0.7.0, installed: 0.6.0]
      - botocore [required: >=1.12.36,<2.0a.0, installed: 1.29.10]
        - jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
        - python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - urllib3 [required: >=1.25.4,<1.27, installed: 1.26.12]
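The constraints URLs quoted in this thread follow a common pattern, so a small helper can build either variant. This is only a sketch: the function name is hypothetical, the pattern is inferred from the URLs posted here, and it does not check that a file actually exists for a given Airflow release.

```python
def constraints_url(airflow_version: str, python_version: str,
                    no_providers: bool = False) -> str:
    """Build the raw GitHub URL of an Airflow constraints file.

    The URL layout is inferred from the links in this thread, not from
    any official API.
    """
    variant = "constraints-no-providers" if no_providers else "constraints"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/{variant}-{python_version}.txt"
    )


if __name__ == "__main__":
    print(constraints_url("2.2.2", "3.7"))
    print(constraints_url("2.4.2", "3.7", no_providers=True))
```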
Hello, even without constraints files, we are having this issue on a new MWAA 2.2.2 environment. Our only peculiarity is that we are hosting your released .zip file in our nexus repository (the file is unmodified):
adding trusted host: 'nexus.REDACTED' (from line 1 of /usr/local/airflow/requirements/requirements.txt)
adding trusted host: 'nexusmaster.REDACTED' (from line 2 of /usr/local/airflow/requirements/requirements.txt)
Looking in indexes: https://nexus.REDACTED/repository/pypi-public/simple/
Collecting emr_serverless@ https://nexusmaster.REDACTED/repository/REDACTED/REDACTED/mwaa_plugin.zip
Downloading https://nexusmaster.REDACTED/repository/REDACTED/REDACTED/mwaa_plugin.zip (6.7 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting boto3>=1.23.9,~=1.23
Downloading https://nexus.REDACTED/repository/pypi-public/packages/boto3/1.26.15/boto3-1.26.15-py3-none-any.whl (132 kB)
Collecting s3transfer<0.7.0,>=0.6.0
Downloading https://nexus.REDACTED/repository/pypi-public/packages/s3transfer/0.6.0/s3transfer-0.6.0-py3-none-any.whl (79 kB)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in ./.local/lib/python3.7/site-packages (from boto3>=1.23.9,~=1.23->emr_serverless@ https://nexusmaster.REDACTED/repository/REDACTED/REDACTED/mwaa_plugin.zip->-r /usr/local/airflow/requirements/requirements.txt (line 5)) (0.10.0)
Collecting botocore<1.30.0,>=1.29.15
Downloading https://nexus.REDACTED/repository/pypi-public/packages/botocore/1.29.15/botocore-1.29.15-py3-none-any.whl (9.9 MB)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in ./.local/lib/python3.7/site-packages (from botocore<1.30.0,>=1.29.15->boto3>=1.23.9,~=1.23->emr_serverless@ https://nexusmaster.REDACTED/repository/REDACTED/REDACTED/mwaa_plugin.zip->-r /usr/local/airflow/requirements/requirements.txt (line 5)) (2.8.2)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in ./.local/lib/python3.7/site-packages (from botocore<1.30.0,>=1.29.15->boto3>=1.23.9,~=1.23->emr_serverless@ https://nexusmaster.REDACTED/repository/REDACTED/REDACTED/mwaa_plugin.zip->-r /usr/local/airflow/requirements/requirements.txt (line 5)) (1.26.7)
Requirement already satisfied: six>=1.5 in ./.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.30.0,>=1.29.15->boto3>=1.23.9,~=1.23->emr_serverless@ https://nexusmaster.REDACTED/repository/REDACTED/REDACTED/mwaa_plugin.zip->-r /usr/local/airflow/requirements/requirements.txt (line 5)) (1.16.0)
Building wheels for collected packages: emr-serverless
Building wheel for emr-serverless (setup.py): started
Building wheel for emr-serverless (setup.py): finished with status 'done'
Created wheel for emr-serverless: filename=emr_serverless-1.0.1-py3-none-any.whl size=7414 sha256=da8ce9ab8a2ff91d9a3b883ddaafbc3c9e892133a4ffb499e420236b70068f0f
Stored in directory: /tmp/pip-ephem-wheel-cache-lpa7pkzp/wheels/13/92/50/475b17c65c8d67d0c9ecba04a3df4e16188d880c57c8d90d8f
Successfully built emr-serverless
Installing collected packages: botocore, s3transfer, boto3, emr-serverless
Attempting uninstall: botocore
Found existing installation: botocore 1.21.65
Uninstalling botocore-1.21.65:
Successfully uninstalled botocore-1.21.65
Attempting uninstall: s3transfer
Found existing installation: s3transfer 0.5.0
Uninstalling s3transfer-0.5.0:
Successfully uninstalled s3transfer-0.5.0
Attempting uninstall: boto3
Found existing installation: boto3 1.18.65
Uninstalling boto3-1.18.65:
Successfully uninstalled boto3-1.18.65
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-airflow-providers-amazon 2.4.0 requires boto3<1.19.0,>=1.15.0, but you have boto3 1.26.15 which is incompatible.
apache-airflow-providers-amazon 2.4.0 requires watchtower~=1.0.6, but you have watchtower 2.0.1 which is incompatible.
Successfully installed boto3-1.26.15 botocore-1.29.15 emr-serverless-1.0.1 s3transfer-0.6.0
Our requirements file is as follows:
--trusted-host nexus.REDACTED
--trusted-host nexusmaster.REDACTED
--index https://nexus.REDACTED/repository/pypi-public/
--index-url https://nexus.REDACTED/repository/pypi-public/simple/
emr_serverless @ https://nexusmaster.REDACTED/repository/REDACTED/REDACTED/mwaa_plugin.zip
As a bit of an aside, we have tried getting around this by setting this:
apache-airflow==2.2.2
apache-airflow-providers-amazon>=v5.1.0
This solves the version issue and install works correctly everywhere except on WebServer (as in https://repost.aws/questions/QUmgPhWhgmTFGMc18d7De40A/airflow-webserver-not-installing-python-requirements). However, if we set this and then try to use the operator in a DAG, the DAG gets processed correctly, but we never get a Task to actually run. We have also tried this with different versions of apache-airflow-providers-amazon (3.1.1, 5.1.0, 6.0.0). In the latter case we removed mwaa_plugin.zip as the library itself should already be providing the operator. We are unsure of the reason why this is not working (it may be our fault), hence why we are not opening a new issue yet.
In any case, we just wanted to let you know that just setting the emr_serverless requirement is not working for us, even without constraints.
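For anyone comparing install logs, the resolver conflicts in the log above can be pulled out with a short script. This is a sketch that assumes the message wording matches the two "requires ..., but you have ..." lines shown in that log; the function and field names are hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as:
#   pkg 1.0 requires dep<2,>=1, but you have dep 3.0 which is incompatible.
CONFLICT = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>\S+), "
    r"but you have (?P<installed>\S+) (?P<installed_version>\S+) "
    r"which is incompatible\.$"
)


def find_conflicts(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per dependency-conflict line found in a pip log."""
    conflicts = []
    for line in log_text.splitlines():
        match = CONFLICT.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


# Lines copied from the install log in this thread.
SAMPLE_LOG = """\
apache-airflow-providers-amazon 2.4.0 requires boto3<1.19.0,>=1.15.0, but you have boto3 1.26.15 which is incompatible.
apache-airflow-providers-amazon 2.4.0 requires watchtower~=1.0.6, but you have watchtower 2.0.1 which is incompatible.
Successfully installed boto3-1.26.15 botocore-1.29.15 emr-serverless-1.0.1 s3transfer-0.6.0
"""

if __name__ == "__main__":
    for conflict in find_conflicts(SAMPLE_LOG):
        print(conflict["package"], "needs", conflict["requirement"],
              "but found", conflict["installed_version"])
```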
@dlecina Interesting, thank you for all the detail. I know the MWAA team has been doing some work on Python requirements lately so I wonder if something changed here.
I will try to reproduce this and reopen this if I run into the same. Between the US holiday this week and re:Invent next week it may take me a bit, but I'll try to take a look ASAP.
Thanks @dacort! Yes, I expect there have been some changes in the background that explain the different behavior.
In case it's helpful to anyone, in the end the following combination seemed to work for us; we were able to reach EMR Serverless with this:
--trusted-host nexus.REDACTED
--trusted-host nexusmaster.REDACTED
--index https://nexus.REDACTED/repository/pypi-public/
--index-url https://nexus.REDACTED/repository/pypi-public/simple/
apache-airflow==2.2.2
apache-airflow-providers-amazon==6.0.0
boto3>=1.23.9
Context:
Setting apache-airflow-providers-amazon==6.1.0 would be ideal, as it has the correct boto3 requirement, but it demands apache-airflow>=2.3.0, which does not work with MWAA 2.2.2. Instead we set boto explicitly, and that seemed to work as it does not conflict with either library. Not setting boto explicitly does not work in this configuration because, although 6.0.0 includes the EMR Serverless Operator, its boto3 requirement allows an older version that lacks the emr-serverless API, so the task fails at run time.
In short:
apache-airflow-providers-amazon==6.1.0 -> apache-airflow>=2.3.0 ❌ boto3>=1.24.0 ✔️
apache-airflow-providers-amazon==6.0.0 -> apache-airflow>=2.2.0 ✔️ boto3>=1.15.0 ❌
apache-airflow-providers-amazon==6.0.0 + boto3>=1.23.9 -> apache-airflow>=2.2.0 ✔️ boto3>=1.23.9 ✔️
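The summary above can be sketched as a small check. The lower bounds are the ones quoted in this thread; the helper names are hypothetical, and comparing only lower bounds is a simplification of what pip's resolver really does.

```python
from typing import Dict, Optional, Tuple

Release = Tuple[int, ...]


def parse_release(version: str) -> Release:
    """Turn a plain dotted version such as '2.2.2' into (2, 2, 2)."""
    return tuple(int(part) for part in version.split("."))


# Lower bounds as quoted in this thread for each provider release.
PROVIDER_MINIMUMS: Dict[str, Dict[str, str]] = {
    "6.1.0": {"apache-airflow": "2.3.0", "boto3": "1.24.0"},
    "6.0.0": {"apache-airflow": "2.2.0", "boto3": "1.15.0"},
}

# First boto3 release with EMR Serverless support, per this thread.
EMR_SERVERLESS_MIN_BOTO3 = "1.23.9"


def check(provider: str, airflow_version: str,
          boto3_pin: Optional[str] = None) -> Tuple[bool, bool]:
    """Return (airflow_ok, boto3_ok) for one provider release.

    boto3_ok means the effective minimum boto3 version (the provider's
    own floor or an explicit pin, whichever is higher) is new enough
    for EMR Serverless.
    """
    minimums = PROVIDER_MINIMUMS[provider]
    airflow_ok = (parse_release(airflow_version)
                  >= parse_release(minimums["apache-airflow"]))
    floor = parse_release(minimums["boto3"])
    if boto3_pin is not None:
        floor = max(floor, parse_release(boto3_pin))
    boto3_ok = floor >= parse_release(EMR_SERVERLESS_MIN_BOTO3)
    return airflow_ok, boto3_ok


if __name__ == "__main__":
    print(check("6.1.0", "2.2.2"))
    print(check("6.0.0", "2.2.2"))
    print(check("6.0.0", "2.2.2", boto3_pin="1.23.9"))
```

Only the third combination passes both checks for Airflow 2.2.2, which matches the rows in the summary.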
Just to confirm, I was still able to use MWAA 2.2.2 with the release from this repository without a problem.
My requirements file is just this plugin, though.
emr_serverless @ https://github.com/aws-samples/emr-serverless-samples/releases/download/v1.0.1/mwaa_plugin.zip
I'm using the CDK stack from this repository.
I'll also try with the constraints-no-providers file and see if that works.
Hi all,
I'm trying to use the latest release of the serverless plugin on MWAA with Airflow version 2.2.2: https://github.com/aws-samples/emr-serverless-samples/releases/tag/v1.0.1
The install is in conflict with the airflow v2.2.2 constraints file found here: https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.7.txt
Steps to reproduce
Requirements.txt contents:
Run:
Output: