astronomer / airflow-provider-kafka

A provider package for kafka
Apache License 2.0

How to install the provider on a docker image #10

Closed: ginwakeup closed this issue 2 years ago

ginwakeup commented 2 years ago

Hi!

I am really interested in the project and I was trying to give it a go, but I can't get past an installation issue.

My Airflow instance is currently executed in Docker, and I have a custom Dockerfile in which I simply pip install the dependencies:

FROM apache/airflow:2.3.3 as base
USER root
COPY requirements /opt/requirements

FROM base as dev
# Later on we might need to COPY dag and tasks folders. For now we are mounting them to speed up dev workflow.

############ Build Dependencies       #################

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
       build-essential libopenmpi-dev libssl-dev python3-dev libkrb5-dev \
 && apt-get autoremove -yqq --purge \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

USER airflow

RUN pip install --user -r /opt/requirements/dev.txt

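# Inherit from base so /opt/requirements (copied there) also exists in this stage.
FROM base as prod

USER airflow

# No sudo: --user installs into the airflow user's home directory, which needs no root.
RUN pip install --user -r /opt/requirements/prod.txt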

In my dev.txt requirements file:

airflow-provider-kafka

Unfortunately, because Airflow does not run as root, and Python packages are therefore not installed as the root user, I am getting the following error:

#8 5.929   Running setup.py install for confluent-kafka: started
#8 6.357   Running setup.py install for confluent-kafka: finished with status 'error'
#8 6.361   error: subprocess-exited-with-error
#8 6.361   
#8 6.361   × Running setup.py install for confluent-kafka did not run successfully.
#8 6.361   │ exit code: 1
#8 6.361   ╰─> [53 lines of output]
#8 6.361       running install
#8 6.361       running build
#8 6.361       running build_py
#8 6.361       creating build
#8 6.361       creating build/lib.linux-aarch64-3.7
#8 6.361       creating build/lib.linux-aarch64-3.7/confluent_kafka
#8 6.361       copying src/confluent_kafka/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#8 6.361       copying src/confluent_kafka/error.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#8 6.361       copying src/confluent_kafka/serializing_producer.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#8 6.361       copying src/confluent_kafka/deserializing_consumer.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#8 6.361       creating build/lib.linux-aarch64-3.7/confluent_kafka/serialization
#8 6.361       copying src/confluent_kafka/serialization/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/serialization
#8 6.361       creating build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#8 6.361       copying src/confluent_kafka/kafkatest/verifiable_consumer.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#8 6.361       copying src/confluent_kafka/kafkatest/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#8 6.361       copying src/confluent_kafka/kafkatest/verifiable_client.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#8 6.361       copying src/confluent_kafka/kafkatest/verifiable_producer.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#8 6.361       creating build/lib.linux-aarch64-3.7/confluent_kafka/avro
#8 6.361       copying src/confluent_kafka/avro/load.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#8 6.361       copying src/confluent_kafka/avro/cached_schema_registry_client.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#8 6.361       copying src/confluent_kafka/avro/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#8 6.361       copying src/confluent_kafka/avro/error.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#8 6.361       creating build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#8 6.361       copying src/confluent_kafka/schema_registry/json_schema.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#8 6.361       copying src/confluent_kafka/schema_registry/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#8 6.361       copying src/confluent_kafka/schema_registry/error.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#8 6.361       copying src/confluent_kafka/schema_registry/schema_registry_client.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#8 6.361       copying src/confluent_kafka/schema_registry/avro.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#8 6.361       copying src/confluent_kafka/schema_registry/protobuf.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#8 6.361       creating build/lib.linux-aarch64-3.7/confluent_kafka/admin
#8 6.361       copying src/confluent_kafka/admin/_resource.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#8 6.361       copying src/confluent_kafka/admin/_config.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#8 6.361       copying src/confluent_kafka/admin/_acl.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#8 6.361       copying src/confluent_kafka/admin/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#8 6.361       creating build/lib.linux-aarch64-3.7/confluent_kafka/avro/serializer
#8 6.361       copying src/confluent_kafka/avro/serializer/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro/serializer
#8 6.361       copying src/confluent_kafka/avro/serializer/message_serializer.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro/serializer
#8 6.361       running build_ext
#8 6.361       building 'confluent_kafka.cimpl' extension
#8 6.361       creating build/temp.linux-aarch64-3.7
#8 6.361       creating build/temp.linux-aarch64-3.7/tmp
#8 6.361       creating build/temp.linux-aarch64-3.7/tmp/pip-install-bo__6h37
#8 6.361       creating build/temp.linux-aarch64-3.7/tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8
#8 6.361       creating build/temp.linux-aarch64-3.7/tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8/src
#8 6.361       creating build/temp.linux-aarch64-3.7/tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8/src/confluent_kafka
#8 6.361       creating build/temp.linux-aarch64-3.7/tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8/src/confluent_kafka/src
#8 6.361       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.7m -c /tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8/src/confluent_kafka/src/confluent_kafka.c -o build/temp.linux-aarch64-3.7/tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8/src/confluent_kafka/src/confluent_kafka.o
#8 6.361       In file included from /tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8/src/confluent_kafka/src/confluent_kafka.c:17:
#8 6.361       /tmp/pip-install-bo__6h37/confluent-kafka_20b5df8e0d5c4e71a8593c527fe8aeb8/src/confluent_kafka/src/confluent_kafka.h:23:10: fatal error: librdkafka/rdkafka.h: No such file or directory
#8 6.361          23 | #include <librdkafka/rdkafka.h>
#8 6.361             |          ^~~~~~~~~~~~~~~~~~~~~~
#8 6.361       compilation terminated.
#8 6.361       error: command 'gcc' failed with exit status 1
#8 6.361       [end of output]
#8 6.361   
#8 6.361   note: This error originates from a subprocess, and is likely not a problem with pip.
#8 6.362 error: legacy-install-failure
#8 6.362 
#8 6.362 × Encountered error while trying to install package.
#8 6.362 ╰─> confluent-kafka
#8 6.362 
#8 6.362 note: This is an issue with the package mentioned above, not pip.
#8 6.362 hint: See above for output from the failure.
#8 6.552 
#8 6.552 [notice] A new release of pip available: 22.1.2 -> 22.2.2
#8 6.552 [notice] To update, run: python -m pip install --upgrade pip
------
executor failed running [/bin/bash -o pipefail -o errexit -o nounset -o nolog -c pip install --user -r /opt/requirements/dev.txt]: exit code: 1
ERROR: Service 'airflow-init' failed to build : Build failed

Any idea how I could get past this? Have you tried the Kafka provider in a Docker Airflow container?

Thanks!
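In the first log, gcc itself runs (see the `gcc -pthread ...` command) and then stops at `#include <librdkafka/rdkafka.h>`, so the missing piece is the librdkafka C headers rather than the compiler. Below is a minimal sketch for checking, from inside the image, whether those headers are present. The search directories are an assumption (common default include roots on Debian-based images); gcc may also search directories added via `-I` or `CFLAGS`. The function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed default include roots; gcc may search additional directories.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_rdkafka_header(include_dirs: Iterable[str] = DEFAULT_INCLUDE_DIRS) -> Optional[Path]:
    """Return the first librdkafka/rdkafka.h found under include_dirs, or None."""
    for directory in include_dirs:
        candidate = Path(directory) / "librdkafka" / "rdkafka.h"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    header = find_rdkafka_header()
    if header is None:
        print("librdkafka/rdkafka.h not found: install the librdkafka C headers first")
    else:
        print(f"found {header}")
```

If this prints "not found", adding gcc or build-essential will not help; the headers have to come from a librdkafka development package (or a source build of librdkafka).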

dylanbstorey commented 2 years ago

You need to install gcc into the container; I believe `apt install build-essential` should do it.

ginwakeup commented 2 years ago

> you need to install gcc into the container I believe apt install build-essential should do it.

Hi @dylanbstorey, thanks for the answer. Unfortunately, I already have build-essential and gcc installed.

The error is caused by confluent-kafka not being able to find librdkafka. This is the command I am running to install librdkafka in the Docker image:


RUN apt-get update \
  && apt-get install -y --no-install-recommends \
        librdkafka-dev build-essential libopenmpi-dev libssl-dev python3-dev libkrb5-dev \
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

but I still get the following error:

#7 8.055   Running setup.py clean for confluent-kafka
#7 8.425 Failed to build confluent-kafka
#7 8.960 Installing collected packages: confluent-kafka, asgiref, airflow-provider-kafka
#7 8.962   Running setup.py install for confluent-kafka: started
#7 9.423   Running setup.py install for confluent-kafka: finished with status 'error'
#7 9.427   error: subprocess-exited-with-error
#7 9.427   
#7 9.427   × Running setup.py install for confluent-kafka did not run successfully.
#7 9.427   │ exit code: 1
#7 9.427   ╰─> [52 lines of output]
#7 9.427       running install
#7 9.427       running build
#7 9.427       running build_py
#7 9.427       creating build
#7 9.427       creating build/lib.linux-aarch64-3.7
#7 9.427       creating build/lib.linux-aarch64-3.7/confluent_kafka
#7 9.427       copying src/confluent_kafka/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#7 9.427       copying src/confluent_kafka/error.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#7 9.427       copying src/confluent_kafka/serializing_producer.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#7 9.427       copying src/confluent_kafka/deserializing_consumer.py -> build/lib.linux-aarch64-3.7/confluent_kafka
#7 9.427       creating build/lib.linux-aarch64-3.7/confluent_kafka/serialization
#7 9.427       copying src/confluent_kafka/serialization/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/serialization
#7 9.427       creating build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#7 9.427       copying src/confluent_kafka/kafkatest/verifiable_consumer.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#7 9.427       copying src/confluent_kafka/kafkatest/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#7 9.427       copying src/confluent_kafka/kafkatest/verifiable_client.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#7 9.427       copying src/confluent_kafka/kafkatest/verifiable_producer.py -> build/lib.linux-aarch64-3.7/confluent_kafka/kafkatest
#7 9.427       creating build/lib.linux-aarch64-3.7/confluent_kafka/avro
#7 9.427       copying src/confluent_kafka/avro/load.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#7 9.427       copying src/confluent_kafka/avro/cached_schema_registry_client.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#7 9.427       copying src/confluent_kafka/avro/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#7 9.427       copying src/confluent_kafka/avro/error.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro
#7 9.427       creating build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#7 9.427       copying src/confluent_kafka/schema_registry/json_schema.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#7 9.427       copying src/confluent_kafka/schema_registry/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#7 9.427       copying src/confluent_kafka/schema_registry/error.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#7 9.427       copying src/confluent_kafka/schema_registry/schema_registry_client.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#7 9.427       copying src/confluent_kafka/schema_registry/avro.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#7 9.427       copying src/confluent_kafka/schema_registry/protobuf.py -> build/lib.linux-aarch64-3.7/confluent_kafka/schema_registry
#7 9.427       creating build/lib.linux-aarch64-3.7/confluent_kafka/admin
#7 9.427       copying src/confluent_kafka/admin/_resource.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#7 9.427       copying src/confluent_kafka/admin/_config.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#7 9.427       copying src/confluent_kafka/admin/_acl.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#7 9.427       copying src/confluent_kafka/admin/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/admin
#7 9.427       creating build/lib.linux-aarch64-3.7/confluent_kafka/avro/serializer
#7 9.427       copying src/confluent_kafka/avro/serializer/__init__.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro/serializer
#7 9.427       copying src/confluent_kafka/avro/serializer/message_serializer.py -> build/lib.linux-aarch64-3.7/confluent_kafka/avro/serializer
#7 9.427       running build_ext
#7 9.427       building 'confluent_kafka.cimpl' extension
#7 9.427       creating build/temp.linux-aarch64-3.7
#7 9.427       creating build/temp.linux-aarch64-3.7/tmp
#7 9.427       creating build/temp.linux-aarch64-3.7/tmp/pip-install-ggcgw7lz
#7 9.427       creating build/temp.linux-aarch64-3.7/tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657
#7 9.427       creating build/temp.linux-aarch64-3.7/tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657/src
#7 9.427       creating build/temp.linux-aarch64-3.7/tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657/src/confluent_kafka
#7 9.427       creating build/temp.linux-aarch64-3.7/tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657/src/confluent_kafka/src
#7 9.427       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.7m -c /tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657/src/confluent_kafka/src/confluent_kafka.c -o build/temp.linux-aarch64-3.7/tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657/src/confluent_kafka/src/confluent_kafka.o
#7 9.427       In file included from /tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657/src/confluent_kafka/src/confluent_kafka.c:17:
#7 9.427       /tmp/pip-install-ggcgw7lz/confluent-kafka_00fcd73d925244dbbeaef64c302d8657/src/confluent_kafka/src/confluent_kafka.h:66:2: error: #error "confluent-kafka-python requires librdkafka v1.9.0 or later. Install the latest version of librdkafka from the Confluent repositories, see http://docs.confluent.io/current/installation.html"
#7 9.427          66 | #error "confluent-kafka-python requires librdkafka v1.9.0 or later. Install the latest version of librdkafka from the Confluent repositories, see http://docs.confluent.io/current/installation.html"
#7 9.427             |  ^~~~~
#7 9.427       error: command 'gcc' failed with exit status 1
#7 9.427       [end of output]

dylanbstorey commented 2 years ago

> confluent-kafka-python requires librdkafka v1.9.0 or later. Install the latest version of librdkafka from the Confluent repositories, see http://docs.confluent.io/current/installation.html

This line from your logs gives some clues about your next steps.
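As far as we can tell, the `#error` in the second log fires when the `RD_KAFKA_VERSION` macro in the installed `rdkafka.h` is older than 1.9.0, so the headers are now found but are too old. The sketch below reads that macro and compares it with the minimum quoted in the error message. It assumes the macro uses librdkafka's `0xMMmmrrpp` hex encoding (major, minor, revision, pre-release) and that the header lives at `/usr/include/librdkafka/rdkafka.h`; the helper names are hypothetical. If it reports an older version, the distribution's librdkafka-dev package is likely the culprit (we believe Debian bullseye shipped a 1.6.x release; `apt-cache policy librdkafka-dev` shows what the image would actually install).

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Minimum version quoted in the compiler error above.
MIN_RDKAFKA: Version = (1, 9, 0)

# Matches e.g. "#define RD_KAFKA_VERSION  0x010600ff" (assumed header format).
_VERSION_RE = re.compile(r"^\s*#define\s+RD_KAFKA_VERSION\s+(0x[0-9a-fA-F]+)", re.MULTILINE)


def decode_version(hex_version: int) -> Version:
    """Split a 0xMMmmrrpp version integer into (major, minor, revision)."""
    return ((hex_version >> 24) & 0xFF, (hex_version >> 16) & 0xFF, (hex_version >> 8) & 0xFF)


def header_version(header_text: str) -> Optional[Version]:
    """Extract RD_KAFKA_VERSION from the text of rdkafka.h, or None if absent."""
    match = _VERSION_RE.search(header_text)
    return decode_version(int(match.group(1), 16)) if match else None


def is_new_enough(version: Optional[Version], minimum: Version = MIN_RDKAFKA) -> bool:
    """True if a version was found and it is at least the required minimum."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    with open("/usr/include/librdkafka/rdkafka.h") as f:
        found = header_version(f.read())
    print(found, "OK" if is_new_enough(found) else "too old for confluent-kafka")
```

Comparing tuples keeps the check readable and avoids hand-building the hex threshold; the pre-release byte is ignored, which is a simplification.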

The Dockerfile at https://github.com/astronomer/airflow-provider-kafka/blob/main/dev/Dockerfile also works.
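To confirm that an image built from a Dockerfile like that one actually ends up with a working client, a quick smoke test can import `confluent_kafka` and print both its own version and the librdkafka version it is linked against. `confluent_kafka.version()` and `confluent_kafka.libversion()` each return a `(str, int)` pair; the wrapper function is a hypothetical helper for illustration.

```python
from typing import Optional, Tuple


def kafka_client_versions() -> Optional[Tuple[str, str]]:
    """Return (confluent-kafka version, librdkafka version), or None if the import fails."""
    try:
        import confluent_kafka
    except ImportError:
        return None
    # Both functions return a (version_string, version_int) tuple; keep the strings.
    return confluent_kafka.version()[0], confluent_kafka.libversion()[0]


if __name__ == "__main__":
    print(kafka_client_versions() or "confluent_kafka is not importable in this image")
```

Running this as the airflow user inside the built image checks the same environment the provider will run in.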

ginwakeup commented 2 years ago

Thanks for the reply, @dylanbstorey. In the end I noticed that I was installing librdkafka, but confluent-kafka couldn't find it because of some linking issues that seem to happen in Debian-based images (and the Docker Hub Airflow image should be Debian-based, if I'm not mistaken).

I've tried the dev image in your repository and it works, but I am not sure why. I can see it uses a different base image from Quay, quay.io/astronomer/ap-airflow:2.2.3; maybe this is the cause.

From what I know, the Airflow image on Docker Hub is built FROM python-slim, so shouldn't this be the same? Or maybe the Quay image inherits from something else, like Ubuntu?
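One way to settle which distribution each base image uses is to read `/etc/os-release` inside it. Below is a minimal parser; it is a simplification that ignores shell-style escapes, and the function name is hypothetical.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from /etc/os-release into a dict, stripping quotes."""
    info: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


if __name__ == "__main__":
    with open("/etc/os-release") as f:
        info = parse_os_release(f.read())
    print(info.get("ID"), info.get("VERSION_CODENAME"), info.get("ID_LIKE", ""))
```

Running it in both the Docker Hub image and the Quay image and comparing `ID` and `VERSION_CODENAME` shows whether they share a base; the librdkafka-dev version that apt can install depends on that release.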

I am not sure.

Thank you.

dylanbstorey commented 2 years ago

I'd probably start by not using a multi-stage build.