astronomer / astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
https://astro-sdk-python.rtfd.io/
Apache License 2.0

Dependency issue reported by @guohui-gao #607

Closed (tatiana closed this issue 2 years ago)

tatiana commented 2 years ago

Describe the bug

Dependency issue when installing several external packages in conjunction with astro-sdk-python==1.0.0b1. This problem was reported by @guohui-gao.

Version

To Reproduce

Guohui is using the following Docker base image:

FROM quay.io/astronomer/astro-runtime:5.0.6

And this is his requirements.txt file:

airflow-provider-fivetran==1.1.1
apache-airflow-providers-salesforce==4.0.0
apache-airflow-providers-snowflake==3.0.0
apache-airflow-providers-http==3.0.0
apache-airflow-providers-google==8.1.0
astro-sdk-python==1.0.0b1
astronomer-providers==1.6.0
datapackage==1.15.2
flake8==4.0.1
flake8-docstrings==1.6.0
google-analytics-data==0.12.1
gusty==0.11.2
mail-parser==3.15.0
mock==4.0.3
pytest==7.1.2
requests-mock==1.9.3
scipy==1.8.1
splunk-sdk==1.7.0

He received this error:

+ grep -Eqx 'apache-airflow\s*[=~>]{1,2}.*' requirements.txt
+ pip install --no-cache-dir -q -r requirements.txt
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    scipy==1.8.1 from https://files.pythonhosted.org/packages/25/82/da07cc3bb40554f1f82d7e24bfa7ffbfb05b50c16eb8d738ebb74b68af8f/scipy-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=70de2f11bf64ca9921fda018864c78af7147025e467ce9f4a11bc877266900a6 (from -r requirements.txt (line 17)):
        Expected sha256 70de2f11bf64ca9921fda018864c78af7147025e467ce9f4a11bc877266900a6
             Got        abb83d0d652fd221fd25037a58b3ec18160a509af2c4255d825d2fa43b027641
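
The error above means the SHA-256 of the wheel that pip received did not match the digest it expected (the same value that appears in the `#sha256=` fragment of the download URL). One way to check a wheel by hand is to hash it locally and compare. A minimal sketch, assuming the wheel has already been downloaded to a local path; the function names are hypothetical and not part of pip or the Astro SDK:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha256):
    """Return True if the file's digest equals the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For the file in the log, `matches_expected` called with the expected digest `70de2f11bf64ca9921fda018864c78af7147025e467ce9f4a11bc877266900a6` would return False, since pip reports that the file it received hashed to `abb83d0d...`. A mismatch like this on a freshly downloaded file would be consistent with a corrupted or altered download rather than a version conflict, though the log alone does not show the cause.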

Acceptance criteria

sunank200 commented 2 years ago

I tried the same order as mentioned, as well as the following order in requirements.txt. Both worked fine for me.

scipy==1.8.1
astro-sdk-python==1.0.0b1
astronomer-providers==1.6.0
apache-airflow-providers-google==8.1.0
apache-airflow-providers-http==3.0.0
apache-airflow-providers-snowflake==3.0.0
apache-airflow-providers-salesforce==4.0.0
airflow-provider-fivetran==1.1.1
datapackage==1.15.2
flake8==4.0.1
flake8-docstrings==1.6.0
google-analytics-data==0.12.1
gusty==0.11.2
mail-parser==3.15.0
mock==4.0.3
pytest==7.1.2
requests-mock==1.9.3
splunk-sdk==1.7.0

Following is the Dockerfile I used:

FROM quay.io/astronomer/astro-runtime:5.0.6

ENV AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True

I used the following commands: astro dev init and astro dev start

WDYT @kaxil @tatiana @utkarsharma2 @dimberman @pankajastro @pankajkoti?

sunank200 commented 2 years ago

After installing the libraries, the following is the output of pip freeze:

ABSQL==0.2.0
adal==1.2.7
aiobotocore==2.3.4
aiofiles==0.8.0
aiohttp==3.8.1
aioitertools==0.10.0
aiosignal==1.2.0
airflow-provider-fivetran==1.1.1
alembic==1.8.0
amqp==5.1.1
anyio==3.6.1
apache-airflow @ file:///tmp/airflow/dist/apache_airflow-2.3.3%2Bastro.1-py3-none-any.whl
apache-airflow-providers-amazon==4.0.0
apache-airflow-providers-apache-hive==3.0.0
apache-airflow-providers-apache-livy==3.0.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.1.0
apache-airflow-providers-databricks==3.0.0
apache-airflow-providers-elasticsearch==4.0.0
apache-airflow-providers-ftp==3.0.0
apache-airflow-providers-google==8.1.0
apache-airflow-providers-http==3.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-microsoft-azure==4.0.0
apache-airflow-providers-postgres==5.0.0
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-salesforce==4.0.0
apache-airflow-providers-snowflake==3.0.0
apache-airflow-providers-sqlite==3.0.0
apispec==3.3.2
argcomplete==2.0.0
asgiref==3.5.2
asn1crypto==1.5.1
astro-sdk-python==1.0.0b1
astronomer-airflow-scripts @ https://github.com/astronomer/astronomer-airflow-scripts/releases/download/v0.0.5/astronomer_airflow_scripts-0.0.5-py3-none-any.whl
astronomer-fab-security-manager==1.9.3
astronomer-providers==1.6.0
astronomer-runtime-extensions @ file:///tmp/wheels/astronomer_runtime_extensions-1.0.0-py3-none-any.whl
async-timeout==4.0.2
attrs==20.3.0
Authlib==1.0.1
azure-batch==12.0.0
azure-common==1.1.28
azure-core==1.24.2
azure-cosmos==4.3.0
azure-datalake-store==0.0.52
azure-identity==1.10.0
azure-keyvault-secrets==4.4.0
azure-kusto-data==0.0.45
azure-mgmt-containerinstance==1.5.0
azure-mgmt-core==1.3.1
azure-mgmt-datafactory==1.1.0
azure-mgmt-datalake-nspkg==3.0.1
azure-mgmt-datalake-store==0.5.0
azure-mgmt-nspkg==3.0.2
azure-mgmt-resource==21.1.0
azure-nspkg==3.0.2
azure-storage-blob==12.8.1
azure-storage-common==2.1.0
azure-storage-file==2.1.0
Babel==2.10.3
backoff==2.1.2
bcrypt==3.2.2
beautifulsoup4==4.11.1
billiard==3.6.4.0
bitarray==2.5.1
blinker==1.4
boto3==1.21.21
botocore==1.24.21
cached-property==1.5.2
cachelib==0.9.0
cachetools==4.2.2
cattrs==1.10.0
celery==5.2.7
certifi==2020.12.5
cffi==1.15.1
chardet==4.0.0
charset-normalizer==2.0.12
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
clickclick==20.10.2
colorama==0.4.5
colorlog==4.8.0
commonmark==0.9.1
connexion==2.14.0
cron-descriptor==1.2.30
croniter==1.3.5
cryptography==36.0.2
databricks-sql-connector==2.0.2
datapackage==1.15.2
db-dtypes==1.0.2
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.13
dill==0.3.1.1
distlib==0.3.4
dnspython==2.2.1
docutils==0.19
elasticsearch==7.13.4
elasticsearch-dbapi==0.2.9
elasticsearch-dsl==7.4.0
email-validator==1.2.1
et-xmlfile==1.1.0
exchange-calendars==4.1.1
fastjsonschema==2.16.1
filelock==3.7.1
flake8==4.0.1
flake8-docstrings==1.6.0
Flask==2.1.2
Flask-AppBuilder==4.1.2
Flask-Babel==2.0.0
Flask-Bcrypt==1.0.1
Flask-Caching==2.0.0
Flask-JWT-Extended==4.4.2
Flask-Login==0.6.1
Flask-Session==0.4.0
Flask-SQLAlchemy==2.5.1
Flask-WTF==0.15.1
flatdict==4.0.1
flower==1.1.0
frozenlist==1.3.0
future==0.18.2
gcloud-aio-auth==4.0.1
gcloud-aio-bigquery==6.0.0
gcloud-aio-storage==7.0.1
google-ads==17.0.0
google-analytics-data==0.12.1
google-api-core==2.8.2
google-api-python-client==1.12.11
google-auth==2.9.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.5.2
google-cloud-aiplatform==1.15.0
google-cloud-appengine-logging==1.1.2
google-cloud-audit-log==0.2.2
google-cloud-automl==2.7.3
google-cloud-bigquery==2.34.4
google-cloud-bigquery-datatransfer==3.6.2
google-cloud-bigquery-storage==2.13.2
google-cloud-bigtable==1.7.2
google-cloud-build==3.8.3
google-cloud-container==2.10.8
google-cloud-core==2.3.1
google-cloud-datacatalog==3.8.1
google-cloud-dataplex==1.0.1
google-cloud-dataproc==4.0.3
google-cloud-dataproc-metastore==1.5.1
google-cloud-dlp==1.0.2
google-cloud-kms==2.11.2
google-cloud-language==1.3.2
google-cloud-logging==3.1.2
google-cloud-memcache==1.3.2
google-cloud-monitoring==2.9.2
google-cloud-orchestration-airflow==1.3.2
google-cloud-os-login==2.6.2
google-cloud-pubsub==2.13.0
google-cloud-redis==2.8.1
google-cloud-resource-manager==1.5.1
google-cloud-secret-manager==1.0.2
google-cloud-spanner==1.19.3
google-cloud-speech==1.3.4
google-cloud-storage==1.44.0
google-cloud-tasks==2.9.1
google-cloud-texttospeech==1.0.3
google-cloud-translate==1.7.2
google-cloud-videointelligence==1.16.3
google-cloud-vision==1.0.2
google-cloud-workflows==1.6.3
google-crc32c==1.3.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.3
graphviz==0.20
greenlet==1.1.2
grpc-google-iam-v1==0.12.4
grpcio==1.47.0
grpcio-gcp==0.2.2
grpcio-status==1.47.0
gunicorn==20.1.0
gusty==0.11.2
h11==0.12.0
hmsclient==0.1.1
httpcore==0.15.0
httplib2==0.20.4
httpx==0.23.0
humanize==4.2.3
idna==3.3
ijson==3.1.4
importlib-metadata==4.12.0
impyla==0.16.3
inflection==0.5.1
iniconfig==1.1.1
isodate==0.6.1
itsdangerous==2.1.2
Jinja2==3.1.2
jmespath==0.10.0
json-merge-patch==0.2
jsonlines==3.1.0
jsonpath-ng==1.5.3
jsonpointer==2.3
jsonschema==4.6.1
jupyter-core==4.11.1
jupytext==1.14.1
jwcrypto==1.3.1
kombu==5.2.4
korean-lunar-calendar==0.2.1
kubernetes==23.6.0
kubernetes-asyncio==23.6.0
lazy-object-proxy==1.7.1
linear-tsv==1.1.0
linkify-it-py==2.0.0
lockfile==0.12.2
looker-sdk==22.4.0
lxml==4.9.1
mail-parser==3.15.0
Mako==1.2.1
Markdown==3.3.7
markdown-it-py==2.1.0
MarkupSafe==2.0.1
marshmallow==3.17.0
marshmallow-enum==1.5.1
marshmallow-oneofschema==3.0.1
marshmallow-sqlalchemy==0.26.1
mccabe==0.6.1
mdit-py-plugins==0.3.0
mdurl==0.1.1
mock==4.0.3
msal==1.18.0
msal-extensions==1.0.0
msrest==0.7.1
msrestazure==0.6.4
multidict==6.0.2
mypy-boto3-rds==1.24.23
mypy-boto3-redshift-data==1.24.11.post3
nbformat==5.4.0
numpy==1.22.4
oauthlib==3.2.0
openlineage-airflow==0.10.0
openlineage-integration-common==0.10.0
openlineage-python==0.10.0
openlineage_sql==0.10.0
openpyxl==3.0.10
oscrypto==1.3.0
packaging==21.3
pandas==1.3.5
pandas-gbq==0.17.6
pandas-market-calendars==3.5
paramiko==2.11.0
pathspec==0.9.0
pendulum==2.1.2
platformdirs==2.5.2
pluggy==1.0.0
ply==3.11
portalocker==2.5.1
prison==0.2.1
prometheus-client==0.14.1
prompt-toolkit==3.0.30
proto-plus==1.19.6
protobuf==3.20.0
psutil==5.9.1
psycopg2-binary==2.9.3
pure-sasl==0.6.2
py==1.11.0
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.8.0
pycparser==2.21
pycryptodomex==3.15.0
pydata-google-auth==1.4.0
pydocstyle==6.1.1
pyflakes==2.4.0
Pygments==2.12.0
PyHive==0.6.5
PyJWT==2.4.0
pyluach==2.0.0
PyNaCl==1.5.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.18.1
pytest==7.1.2
python-daemon==2.3.0
python-dateutil==2.8.2
python-frontmatter==1.0.0
python-nvd3==0.15.0
python-slugify==6.1.2
pytz==2022.1
pytzdata==2020.1
PyYAML==6.0
redis==3.5.3
redshift-connector==2.0.908
requests==2.28.0
requests-file==1.5.1
requests-mock==1.9.3
requests-oauthlib==1.3.1
requests-toolbelt==0.9.1
rfc3986==1.5.0
rich==12.4.4
rsa==4.8
s3transfer==0.5.2
sasl==0.3.1
scipy==1.8.1
scramp==1.4.1
setproctitle==1.2.3
simple-salesforce==1.12.1
simplejson==3.17.6
six==1.16.0
smart-open==6.0.0
sniffio==1.2.0
snowballstemmer==2.2.0
snowflake-connector-python==2.7.9
snowflake-sqlalchemy==1.3.4
soupsieve==2.3.2.post1
splunk-sdk==1.7.0
sql-metadata==2.6.0
SQLAlchemy==1.4.27
sqlalchemy-bigquery==1.4.4
SQLAlchemy-JSONField==1.0.0
sqlalchemy-redshift==0.8.9
SQLAlchemy-Utils==0.38.2
sqlparse==0.4.2
statsd==3.3.0
swagger-ui-bundle==0.0.9
tableauserverclient==0.19.0
tableschema==1.20.2
tabulate==0.8.10
tabulator==1.53.5
tenacity==8.0.1
termcolor==1.1.0
text-unidecode==1.3
thrift==0.16.0
thrift-sasl==0.4.3
thriftpy2==0.4.14
toml==0.10.2
tomli==2.0.1
toolz==0.12.0
tornado==6.2
traitlets==5.3.0
typing_extensions==4.3.0
uc-micro-py==1.0.1
unicodecsv==0.14.1
uritemplate==3.0.1
urllib3==1.26.9
vine==5.0.0
virtualenv==20.15.1
watchtower==2.0.1
wcwidth==0.2.5
websocket-client==1.3.3
Werkzeug==2.1.2
wrapt==1.14.1
WTForms==2.3.3
xlrd==2.0.1
yarl==1.7.2
zeep==4.1.0
zipp==3.8.0

@guohui-gao is it possible to share the pip freeze output from your setup?
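
Once both outputs are available, the two environments could be compared mechanically instead of by eye. A minimal sketch, assuming each `pip freeze` output has been saved as text; `parse_freeze` and `diff_freeze` are hypothetical helper names, and the name normalization here is a simplified version of what pip does:

```python
def parse_freeze(text):
    """Map package name -> version (or direct-reference URL) from `pip freeze` text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if " @ " in line:
            # Direct references, e.g. "apache-airflow @ file:///..."
            name, ref = line.split(" @ ", 1)
        elif "==" in line:
            name, ref = line.split("==", 1)
        else:
            continue  # Skip lines this sketch does not understand.
        # Simplified normalization: lowercase, underscores to hyphens.
        packages[name.strip().lower().replace("_", "-")] = ref.strip()
    return packages


def diff_freeze(ours, theirs):
    """Return {name: (our_version, their_version)} for every package that differs."""
    a, b = parse_freeze(ours), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

A package missing on one side shows up with `None` in that position, so both version drift and extra packages are visible in one result.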

sunank200 commented 2 years ago

@feluelle and I tried this separately on our systems, running astro dev init and astro dev start with the Dockerfile and requirements.txt mentioned above. It worked fine.

tatiana commented 2 years ago

Thanks a lot for investigating this, @sunank200! I believe @guohui-gao is on holiday and will be back on Monday, 15 August.

kaxil commented 2 years ago

I couldn't reproduce it with the deps in the issue either; the installation went ahead fine. Maybe it was a temporary issue, since the error you saw was about hashes:

+ pip install --no-cache-dir -q -r requirements.txt
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    scipy==1.8.1 from https://files.pythonhosted.org/packages/25/82/da07cc3bb40554f1f82d7e24bfa7ffbfb05b50c16eb8d738ebb74b68af8f/scipy-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=70de2f11bf64ca9921fda018864c78af7147025e467ce9f4a11bc877266900a6 (from -r requirements.txt (line 17)):
        Expected sha256 70de2f11bf64ca9921fda018864c78af7147025e467ce9f4a11bc877266900a6
             Got        abb83d0d652fd221fd25037a58b3ec18160a509af2c4255d825d2fa43b027641

I am going to close it, but do let us know if you can reproduce it when you are back, @guohui-gao.