Open pecigonzalo opened 3 years ago
This build works:
FROM python:3.8-slim-buster as deps
RUN apt-get update \
&& apt-get install --no-install-recommends -y \
g++ \
ninja-build cmake git-core wget \
libboost-all-dev \
unixodbc unixodbc-dev \
python-dev \
&& apt-get clean
RUN pip install --user pybind11==2.6.2 pyarrow==3.0.0
FROM deps as turbodbc
RUN ln -rvs \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow.so.300 \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow.so
RUN ln -rvs \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_dataset.so.300 \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_dataset.so
RUN ln -rvs \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_flight.so.300 \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_flight.so
RUN ln -rvs \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python.so.300 \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python.so
RUN ln -rvs \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python_flight.so.300 \
/root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python_flight.so
RUN ln -rvs \
/root/.local/lib/python3.8/site-packages/pyarrow/libparquet.so.300 \
/root/.local/lib/python3.8/site-packages/pyarrow/libparquet.so
RUN ln -rvs \
/root/.local/lib/python3.8/site-packages/pyarrow/libplasma.so.300 \
/root/.local/lib/python3.8/site-packages/pyarrow/libplasma.so
RUN pip install --user turbodbc==4.2.0
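For anyone who would rather not hard-code seven ln commands, here is a rough Python sketch of the same step. The function name link_unversioned_libs is made up for illustration, and it assumes every versioned library in the directory follows the lib*.so.<suffix> pattern shown above. It only reproduces the symlinking; it does nothing about the segmentation fault concern.

```python
from pathlib import Path


def link_unversioned_libs(lib_dir, so_suffix):
    """Create libfoo.so -> libfoo.so.<so_suffix> symlinks inside lib_dir.

    Hypothetical helper: returns the names of the links it created.
    """
    lib_dir = Path(lib_dir)
    created = []
    for versioned in sorted(lib_dir.glob(f"lib*.so.{so_suffix}")):
        # Dropping the trailing ".<so_suffix>" gives the unversioned name.
        link = versioned.with_suffix("")
        if link.exists() or link.is_symlink():
            continue  # leave anything that is already there untouched
        # A relative target keeps the link valid if the directory is copied.
        link.symlink_to(versioned.name)
        created.append(link.name)
    return created
```

In the image above this would be called with /root/.local/lib/python3.8/site-packages/pyarrow as lib_dir and "300" as so_suffix, in place of the individual ln steps.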
But I don't know if the software will actually work, because a comment in the linked issue says, in reference to symlinking:
You will end up with random segmentation faults otherwise.
This also means we can't define turbodbc==4.2.0 in a requirements.txt together with pyarrow, because we need to do a manual step in between.
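One way to keep a single requirements.txt and still get the manual step in between is to split it at install time. This is only a sketch: split_requirements is a hypothetical helper, and the name parsing is deliberately simple (it does not handle URLs, -r includes, or other pip options).

```python
import re


def split_requirements(lines, first=("pyarrow",)):
    """Split requirement lines into (install_first, install_later).

    Hypothetical helper: packages named in `first` go into the first group.
    """
    wanted = {name.lower() for name in first}
    early, late = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The project name ends at the first version, extra, or marker character.
        name = re.split(r"[\s=<>!~\[;]", line, maxsplit=1)[0].lower()
        (early if name in wanted else late).append(line)
    return early, late
```

The first group would be installed with pip, then the symlink step would run, then the second group would be installed.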
The fix that was mentioned in the previous issue is likely the one in this doc https://arrow.apache.org/docs/python/extending.html#building-extensions-against-pypi-wheels and referenced in this comment https://github.com/blue-yonder/turbodbc/issues/276#issuecomment-839689005.
I think it's a bad call from pyarrow to ask consumers to modify the installation.
The documentation you linked was helpful for me. I am now able to get turbodbc up and running without conda for the first time. I am installing pyarrow in a separate RUN command with some other dependencies, then I have a line which runs the create_library_symlinks() command. Finally, the rest of my requirements (including turbodbc and airflow-providers-odbc) are installed.
RUN pip install --user --upgrade pip \
&& pip install --no-cache --user \
python-snappy \
pybind11 \
numpy \
pyarrow==5.0.0 \
apache-airflow[password,crypto]==${AIRFLOW_VERSION}
RUN python -c "import pyarrow; pyarrow.create_library_symlinks()"
RUN pip install --no-cache --user -r requirements.txt
Well, the build worked, but then turbodbc was not able to find pyarrow during actual tasks, even though both libraries are installed in the same environment. I will try @pecigonzalo's approach with symlinks.
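A quick check that might help narrow this down is to ask the interpreter that runs the tasks where it would import each package from. This is a sketch using only the standard library. locate_modules is a hypothetical name, and it only covers the Python-level lookup, so it will not detect a shared library that fails to load.

```python
import importlib.util
import sys


def locate_modules(names):
    """Map each top-level module name to its import location, or None."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    # Run this with the same interpreter and user that execute the tasks.
    print("interpreter:", sys.executable)
    for name, origin in locate_modules(["pyarrow", "turbodbc"]).items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

If the build-time and task-time output differ, the two are probably not using the same environment. For example, a pip install --user lands in the home directory of the user who ran the build.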
I know this works with conda, but I want to move towards using the official apache/airflow image which does not use conda. The only failure is turbodbc right now.
I am facing the same issue, @idacey did you find any solution?
@xhochy I went through these: https://github.com/blue-yonder/turbodbc/issues/276 and https://github.com/blue-yonder/turbodbc/issues/227.
I'm using Ubuntu 20.04 on a Windows system. Any help would be great. Thanks a lot.
Negative. I ended up installing with mamba instead and used a package called conda-pack to avoid having conda installed in my final image.
COPY ${ENV_FILE} /conda-env.yml
# Create the conda environment from conda-env.yml, then pack and unpack it so it can be copied from the /venv folder
RUN mamba env create -f /conda-env.yml \
&& /opt/conda/envs/airflow/bin/conda-pack --name airflow --ignore-missing-files --output /tmp/env.tar.gz \
&& mkdir -p ${VIRTUAL_ENV} \
&& cd ${VIRTUAL_ENV} \
&& tar -xvf /tmp/env.tar.gz \
&& rm /tmp/env.tar.gz \
&& ${VIRTUAL_ENV}/bin/conda-unpack \
&& conda clean -afy
WORKDIR ${VIRTUAL_ENV}
My final image copies my venv folder, which results in a working pyarrow without Anaconda installed.
COPY --chown=airflow:root --from=python-dependencies /venv /venv
I am still hoping for the day when I can pip install everything since a chunk of my most important libraries are not on conda at all.
@DevangB9 I recently was able to solve this issue and posted it in this comment.
As already reflected in https://github.com/blue-yonder/turbodbc/issues/276, compiling from source fails to find pyarrow outside of conda. This is using a pyarrow installation from wheels.
Reproduce in
I don't understand why https://github.com/blue-yonder/turbodbc/issues/276 was closed, as many users are reporting the exact same issue. The issue is likely due to the pyarrow .so files being suffixed with .300 for version 3.0.0, and so on.
In the following comment (which links to the actual comments), compiling from source is mentioned because symlinking the names will not work, but it's not clear what needs to be compiled.
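To make the suffix pattern concrete, here is a small sketch that guesses the suffix from a pyarrow version string. The rule used (major * 100 + minor) is an assumption inferred from the .300 suffix seen for 3.0.0. arrow_so_suffix is a hypothetical name, not part of pyarrow.

```python
def arrow_so_suffix(version):
    """Guess the shared-library suffix for a pyarrow version string.

    Assumption: suffix = major * 100 + minor, matching ".300" for "3.0.0".
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return str(major * 100 + minor)
```

It is safer to list the files actually present in the pyarrow package directory than to rely on this guess.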
Sample error output: