JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Docker image segfault in 1.9.0, but not in 1.8 #49629

Open ianfiske opened 1 year ago

ianfiske commented 1 year ago

I am looking forward to the precompilation in 1.9 being an enabler for Julia execution in AWS Lambda, but a quick test gives me a segfault that didn't occur in 1.8.

I am running on an M1 Mac, so this may be related to cross-platform issues, but I didn't see it in 1.8.

Here is an MWE Dockerfile that works in 1.8:

FROM --platform=linux/amd64 public.ecr.aws/lambda/provided:al2

ARG FOLDER=1.8
ARG JULIA_VERSION="1.8.5"
ARG SHA256="e71a24816e8fe9d5f4807664cbbb42738f5aa9fe05397d35c81d4c5d649b9d05"

WORKDIR /usr/local

RUN yum install -y tar gzip

# Download the Julia x86_64 binary (only one compatible w/ AWS Lambda)
RUN curl -fL -o julia.tar.gz "https://julialang-s3.julialang.org/bin/linux/x64/${FOLDER}/julia-${JULIA_VERSION}-linux-x86_64.tar.gz"

# Check the SHA256 hash, exit if they do not match
RUN echo "${SHA256} julia.tar.gz" | sha256sum -c || exit 1

# Extract Julia and create a SymLink
RUN tar xf julia.tar.gz
RUN ln -s "julia-${JULIA_VERSION}" julia

# Install the application
WORKDIR /var/task

# LD_LIBRARY_PATH is cleared due to https://github.com/JuliaLang/julia/issues/46409
RUN LD_LIBRARY_PATH="" /usr/local/julia/bin/julia -e "println(\"hello\")"

Put this in a directory and run:

docker build --platform=linux/amd .

This works fine.
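
The checksum step in the Dockerfile above can also be tried outside Docker. The following is a minimal shell sketch, not part of the original report: `verify_sha256` is a hypothetical helper name, and it assumes a `sha256sum` that supports `-c`, as the GNU coreutils version does.

```shell
#!/bin/sh
# verify_sha256 EXPECTED_HASH FILE  (hypothetical helper, not from the report)
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HASH.
verify_sha256() {
    # The canonical checksum-line format is "<hash>  <file>" (two spaces).
    # With no file argument, "sha256sum -c" reads the checksum line from stdin.
    printf '%s  %s\n' "$1" "$2" | sha256sum -c >/dev/null 2>&1
}
```

For example, `verify_sha256 e71a24816e8fe9d5f4807664cbbb42738f5aa9fe05397d35c81d4c5d649b9d05 julia.tar.gz` would check a downloaded 1.8.5 tarball against the hash used in the Dockerfile above.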

However, with 1.9:

FROM --platform=linux/amd64 public.ecr.aws/lambda/provided:al2

ARG FOLDER=1.9
ARG JULIA_VERSION="1.9.0"
ARG SHA256="00c614466ef9809c2eb23480e38d196a2c577fff2730c4f83d135b913d473359"

WORKDIR /usr/local

RUN yum install -y tar gzip

# Download the Julia x86_64 binary (only one compatible w/ AWS Lambda)
RUN curl -fL -o julia.tar.gz "https://julialang-s3.julialang.org/bin/linux/x64/${FOLDER}/julia-${JULIA_VERSION}-linux-x86_64.tar.gz"

# Check the SHA256 hash, exit if they do not match
RUN echo "${SHA256} julia.tar.gz" | sha256sum -c || exit 1

# Extract Julia and create a SymLink
RUN tar xf julia.tar.gz
RUN ln -s "julia-${JULIA_VERSION}" julia

# Install the application
WORKDIR /var/task

# LD_LIBRARY_PATH is cleared due to https://github.com/JuliaLang/julia/issues/46409
RUN LD_LIBRARY_PATH="" /usr/local/julia/bin/julia -e "println(\"hello\")"

I get:

(base) ifiske@[hostname] julia1.9 % docker build --platform=linux/amd .
[+] Building 23.1s (11/12)                                                                                                                                              
 => [internal] load .dockerignore                                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                                    0.0s
 => [internal] load build definition from Dockerfile                                                                                                               0.0s
 => => transferring dockerfile: 1.24kB                                                                                                                             0.0s
 => [internal] load metadata for public.ecr.aws/lambda/provided:al2                                                                                                0.0s
 => [1/9] FROM public.ecr.aws/lambda/provided:al2                                                                                                                  0.0s
 => CACHED [2/9] WORKDIR /usr/local                                                                                                                                0.0s
 => CACHED [3/9] RUN yum install -y tar gzip                                                                                                                       0.0s
 => CACHED [4/9] RUN curl -fL -o julia.tar.gz "https://julialang-s3.julialang.org/bin/linux/x64/1.9/julia-1.9.0-rc3-linux-x86_64.tar.gz"                           0.0s
 => CACHED [5/9] RUN echo "d1b2b892e8596ec95cbf7495b8db7815bf7c7b0679c820ea5c8ca2f134be1a7b julia.tar.gz" | sha256sum -c || exit 1                                 0.0s
 => CACHED [6/9] RUN tar xf julia.tar.gz                                                                                                                           0.0s
 => CACHED [7/9] RUN ln -s "julia-1.9.0-rc3" julia                                                                                                                 0.0s
 => CACHED [8/9] WORKDIR /var/task                                                                                                                                 0.0s
 => [9/9] RUN LD_LIBRARY_PATH="" /usr/local/julia/bin/julia -e "println("hello")"                                                                                 23.1s
 => => # jl_repl_entrypoint at /cache/build/default-amdci4-4/julialang/julia-release-1-dot-9/src/jlapi.c:711                                                           
 => => # main at /cache/build/default-amdci4-4/julialang/julia-release-1-dot-9/cli/loader_exe.c:59                                                                     
 => => # __libc_start_main at /lib64/libc.so.6 (unknown line)                                                                                                          
 => => # unknown function (ip: 0x401098)                                                                                                                               
 => => # Allocations: 0 (Pool: 0; Big: 0); GC: 0                                                                                                                       
 => => # qemu: uncaught target signal 11 (Segmentation fault) - core dumped

ViralBShah commented 1 year ago

Is this still an issue with the 1.9 release?

ianfiske commented 1 year ago

Yes, it is still an issue with the 1.9 release.

(Edit: I've just updated the title and MWE to use the 1.9.0 release instead of the original rc3.)

vtjnash commented 9 months ago

Does it still fail in v1.10? There don't appear to be any obvious indicators in that backtrace of why it would have failed.
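
One way to retest against another release without editing the Dockerfile is to override its three `ARG` values at build time, since the two Dockerfiles in the report differ only in those values. A minimal sketch, assuming the Dockerfile from the report is in the current directory; `julia_build_cmd` and the `julia-lambda-test` image tag are hypothetical names, and the checksum for any release not listed in the report has to be supplied by whoever runs it:

```shell
#!/bin/sh
# julia_build_cmd FOLDER JULIA_VERSION SHA256  (hypothetical helper)
# Prints a docker build command that overrides the Dockerfile's ARG defaults
# via --build-arg and tags the image with the Julia version.
julia_build_cmd() {
    printf 'docker build --platform=linux/amd64 --build-arg FOLDER=%s --build-arg JULIA_VERSION=%s --build-arg SHA256=%s -t julia-lambda-test:%s .\n' \
        "$1" "$2" "$3" "$2"
}
```

Piping the output to `sh` runs the build. For instance, `julia_build_cmd 1.9 1.9.0 00c614466ef9809c2eb23480e38d196a2c577fff2730c4f83d135b913d473359 | sh` should repeat the failing build from the report, and passing the 1.8 values repeats the working one.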