coderxio / sagerx

Open drug data pipelines curated by pharmacists.
https://coderx.io/sagerx

PIP additional requirements throws Docker error #239

Open jrlegrand opened 8 months ago

jrlegrand commented 8 months ago

Problem Statement

Right now, we have the CMS Part D DAG in a "hidden_dags" folder because it was causing an error for me and at least one other person when running docker-compose up airflow-init.

Also - I have another branch with a MEPS DAG (jrlegrand/meps) that I haven't merged because it also has a dependency on a PIP package.

So figuring out the way to handle this error would unlock two very useful DAGs.

Assuming the answer is to build a custom image like the error message says... https://airflow.apache.org/docs/docker-stack/build.html

airflow-init  | !!!!!  Installing additional requirements: 'zipfile-deflate64' !!!!!!!!!!!!
airflow-init  |
airflow-init  | WARNING: This is a development/test feature only. NEVER use it in production!
airflow-init  |          Instead, build a custom image as described in
airflow-init  |
airflow-init  |          https://airflow.apache.org/docs/docker-stack/build.html

Criteria for Success

Figure out a way to safely and correctly handle PIP dependencies.
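
A minimal sketch of the custom-image approach the warning points to, assuming the stack runs on the official apache/airflow image. The base tag below is a placeholder, not taken from this repo, and should match whatever Airflow version the project already uses. The USER line reflects the log's advice to run pip as the 'airflow' user rather than root.

```dockerfile
# Hypothetical base tag: replace with the Airflow version the project already uses.
ARG AIRFLOW_IMAGE=apache/airflow:2.5.3
FROM ${AIRFLOW_IMAGE}

# The official image expects pip to be run as the 'airflow' user, not root.
USER airflow

# Install the DAG dependency at build time instead of at container startup.
RUN pip install --no-cache-dir zipfile-deflate64
```

If the compose file sets _PIP_ADDITIONAL_REQUIREMENTS (the variable that normally triggers this warning), it could presumably be left empty once the package is part of the image.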

Additional Information

This is what I had to do to fix the error (basically crippling the DAG that depended on this PIP package). image

This is the Slack convo with Adam G. https://coderx.slack.com/archives/C05S27E52N8/p1703102061867119

Full error log

```
C:\Dev\sagerx>docker-compose up airflow-init
time="2023-12-20T15:21:08-05:00" level=warning msg="The \"UMLS_API\" variable is not set. Defaulting to a blank string."
time="2023-12-20T15:21:08-05:00" level=warning msg="The \"UMLS_API\" variable is not set. Defaulting to a blank string."
time="2023-12-20T15:21:08-05:00" level=warning msg="The \"UMLS_API\" variable is not set. Defaulting to a blank string."
time="2023-12-20T15:21:08-05:00" level=warning msg="The \"UMLS_API\" variable is not set. Defaulting to a blank string."
[+] Running 2/0
 ✔ Container postgres Running 0.0s
 ✔ Container airflow-init Created 0.0s
Attaching to airflow-init, postgres
airflow-init | The container is run as root user. For security, consider using a regular user account.
airflow-init |
airflow-init |
airflow-init | /home/airflow/.local/lib/python3.7/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
airflow-init | DB: postgresql+psycopg2://airflow:@postgres:5432/airflow
airflow-init | Performing upgrade with database postgresql+psycopg2://airflow:@postgres:5432/airflow
airflow-init | [2023-12-20 20:21:10,566] {migration.py:205} INFO - Context impl PostgresqlImpl.
airflow-init | [2023-12-20 20:21:10,567] {migration.py:212} INFO - Will assume transactional DDL.
airflow-init | [2023-12-20 20:21:10,572] {db.py:1571} INFO - Creating tables
airflow-init | INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
airflow-init | INFO [alembic.runtime.migration] Will assume transactional DDL.
airflow-init | Upgrades done
airflow-init | [2023-12-20 20:21:13,342] {providers_manager.py:238} INFO - Optional provider feature disabled when importing 'airflow.providers.google.leveldb.hooks.leveldb.LevelDBHook' from 'apache-airflow-providers-google' package
airflow-init | [2023-12-20 20:21:13,666] {providers_manager.py:238} INFO - Optional provider feature disabled when importing 'airflow.providers.google.leveldb.hooks.leveldb.LevelDBHook' from 'apache-airflow-providers-google' package
airflow-init | airflow already exist in the db
airflow-init |
airflow-init | !!!!! Installing additional requirements: 'zipfile-deflate64' !!!!!!!!!!!!
airflow-init |
airflow-init | WARNING: This is a development/test feature only. NEVER use it in production!
airflow-init | Instead, build a custom image as described in
airflow-init |
airflow-init | https://airflow.apache.org/docs/docker-stack/build.html
airflow-init |
airflow-init | Adding requirements at container startup is fragile and is done every time
airflow-init | the container starts, so it is onlny useful for testing and trying out
airflow-init | of adding dependencies.
airflow-init |
airflow-init |
airflow-init |
airflow-init | You are running pip as root. Please use 'airflow' user to run pip!
airflow-init |
airflow-init | See: https://airflow.apache.org/docs/docker-stack/build.html#adding-a-new-pypi-package
airflow-init |
airflow-init |
airflow-init exited with code 1
```

RiversPharmD commented 8 months ago

@jrlegrand you already have a custom Docker image that adds the dbt-postgres adapter. I'll open a branch and see if I can get it running for you.