Open ash211 opened 7 years ago
I'm also having issues with the alpine base image. E.g. I'm building on top of the driver-py image and installing my PySpark application into it. This app has -- amongst others -- pyarrow as a dependency. Pyarrow is not supported on alpine according to https://stackoverflow.com/questions/49059779/installing-pyarrow-in-alpine-docker.
I switched the base image to openjdk:8 (Debian Stretch based) and everything's working fine. Is there some deeper reason for using the alpine base besides it probably being smaller? Or would it be worth submitting a PR changing it to the Debian one?
FYI for others, we saw the following issue when running our application with the stock Spark Dockerfiles that are built on openjdk:8-alpine
There's some activity online around tracking down issues between the OpenJDK and the version of libc used in the alpine base images -- musl. It seems like musl isn't fully compatible with what the OpenJDK expects.
We fixed this by switching the base image of the Spark images to an internally-created image based on alpine plus the glibc package from https://github.com/andyshinn/alpine-pkg-glibc
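For anyone who wants to confirm which libc their driver or executor image actually runs against, here is a minimal Python sketch that could be run inside the container. The helper name `describe_libc` is made up for illustration. Treating a missing glibc report plus the presence of `/etc/alpine-release` as "musl" is a heuristic assumed here, not something the Spark images provide.

```python
import os
import platform


def describe_libc(alpine_marker="/etc/alpine-release"):
    """Return a best-effort label for the C library this interpreter runs against.

    Hypothetical helper: the Alpine marker-file check is a heuristic.
    """
    # platform.libc_ver() reports ("glibc", "<version>") on glibc-based systems
    name, version = platform.libc_ver()
    if name == "glibc":
        return f"glibc {version}".strip()
    # Alpine images ship /etc/alpine-release; assume musl when it is present
    if os.path.exists(alpine_marker):
        return "musl (Alpine)"
    return "unknown"


if __name__ == "__main__":
    print(describe_libc())
```

Running this in both the alpine-based image and the Debian-based one should make the difference between the two bases visible before debugging the application itself.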