JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Julia crashes with "free(): invalid pointer" or "double free or corruption (out)" on ubuntu 20.04.1 TPU vm #44242

Open reidsanders opened 2 years ago

reidsanders commented 2 years ago

I'm trying to run Julia on a tpu-vm v3-8 using the tpu-vm-pt-1.10 image. It crashes on various operations with "free(): invalid pointer". This happens with the 1.7.2 binary, the 1.6.5 LTS binary, and the conda-forge version, and a similar error occurs when building from source. For example:

(@v1.6) pkg> generate Demo
  Generating  project Demo:
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f5bc79ef3ed)
unknown function (ip: 0x7f5bc79f747b)
unknown function (ip: 0x7f5bc79f8cab)
git_mbedtls_stream_global_init at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
init_once at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
__pthread_once_slow at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
git_libgit2_init at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/error.jl:108 [inlined]
initialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:986
#164 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:971
lock at ./lock.jl:187
ensure_initialized at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:967 [inlined]
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:50
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:50 [inlined]
with at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/types.jl:1156 [inlined]
getconfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:160 [inlined]
project at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:30
#generate#3 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:15
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:10 [inlined]
#generate#2 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:8 [inlined]
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:8 [inlined]
#generate_deprecated#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:5 [inlined]
generate_deprecated at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:4
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:670
do_cmd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:405
#do_cmd#21 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:386
do_cmd at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:550
jfptr_YY.24_45436.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/LineEdit.jl:2441
jfptr_run_interface_54737.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
run_frontend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:1126
#44 at ./task.jl:411
jfptr_YY.44_53285.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:834
Allocations: 2654 (Pool: 2639; Big: 15); GC: 0
Aborted (core dumped)

Machine info:

$ uname -a
Linux *********** 5.11.0-1021-gcp #23~20.04.1-Ubuntu SMP Fri Oct 1 19:04:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I did notice that running libc.so.6 directly to check its version info segfaulted, which makes me suspect that Google is doing something unusual with their glibc.

$ /lib/x86_64-linux-gnu/libc.so.6 
Segmentation fault (core dumped)

I tried to build Julia from source with make, and got a related crash:

Stdlibs: ────  40.897405 seconds 59.2925%
    JULIA usr/lib/julia/sys-o.a
munmap_chunk(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)

signal (11): Segmentation fault
in expression starting at none:0
ERROR: Failed to precompile __PackagePrecompilationStatementModule [top-level] to /tmp/jl_a28ZNv/compiled/v1.9/jl_CXWiQH.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
   @ Base ./loading.jl:1547
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1491
 [4] top-level scope
   @ none:3
free(): invalid size

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fe05715f3ed)
unknown function (ip: 0x7fe05716747b)
unknown function (ip: 0x7fe057168cbb)
close_unit_1 at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:742
close_units at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:800

signal (11): Segmentation fault
in expression starting at none:0
ERROR: LoadError: failed process: Process(`/home/rs/tools/julia/usr/bin/julia -O0 --sysimage /home/rs/tools/julia/usr/lib/julia/sys.ji --trace-compile=/tmp/jl_a28ZNv/jl_dSBZAYY4Cb --startup-file=no -Cnative -e 'pushfirst!(DEPOT_PATH, "/tmp/jl_a28ZNv");

I found a very similar issue on Discourse, but it had no replies (https://discourse.julialang.org/t/issues-with-julia-installation-on-google-tpu-vm/65783). GitHub issues did not seem to have anything relevant. I posted and got advice about various flags to try. I tried various combinations without success, though it's not clear whether each failure had the same cause.

Output: default_options.txt, USE_BINARYBUILDER=0.txt, USE_BINARYBUILDER=0-USE_BINARY_BUILDER_LIBGIT2=0.txt, USE_BINARYBUILDER=0-USE_SYSTEM_CURL=1.txt, USE_SYSTEM_CURL=1.txt

Those using USE_BINARYBUILDER=0 failed with:

configure: error: --with-nghttp2 was specified but could not find libnghttp2 pkg-config file.

However, specifying the path manually did not help:

PKG_CONFIG_PATH=/home/rs/tools/julia/usr/lib/pkgconfig/

Has anyone had success with Julia on TPU VMs? Thanks!

mkitti commented 2 years ago

What's the quickest way to access the tpu-vm-pt-1.10?

reidsanders commented 2 years ago

What's the quickest way to access the tpu-vm-pt-1.10?

A command like:

gcloud alpha compute tpus tpu-vm create juliatpu \
  --zone=europe-west4-a \
  --accelerator-type=v2-8 \
  --project=_____ \
  --version=tpu-vm-pt-1.10

I have also tested it on v2-alpha, with the same result.

magicknight commented 2 years ago

Same error on a TPU VM here. I think it relates to the memory management library used on it. I am looking for a solution.

mkitti commented 2 years ago

@jekbradbury, do you have any insight about what may be happening here?

JosePereiraUA commented 2 years ago

Sorry, do we have an update on this? Thanks.

mkitti commented 2 years ago

Unfortunately no. Could someone produce stack traces with a Julia nightly and Julia 1.8-rc1 available at https://julialang.org/downloads/ ?

reidsanders commented 2 years ago

Same error:

(@v1.8) pkg> generate Demo
  Generating  project Demo:
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7ffa2a1f126d)
unknown function (ip: 0x7ffa2a1f92fb)
unknown function (ip: 0x7ffa2a1fab2b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-1.8.0-rc1/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-1.8.0-rc1/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/error.jl:108 [inlined]
initialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/LibGit2.jl:986
#162 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/LibGit2.jl:975
lock at ./lock.jl:185
ensure_initialized at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/LibGit2.jl:971 [inlined]
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/config.jl:50
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/config.jl:50 [inlined]
with at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/types.jl:1159 [inlined]
getconfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/config.jl:160 [inlined]
project at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/generate.jl:26
#generate#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/generate.jl:9
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/generate.jl:3
jfptr_generate_73162.clone_1 at /home/rs/downloads/julia-1.8.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1838 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:730
do_cmd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:406
#do_cmd#21 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:387
do_cmd at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:551
jfptr_YY.24_76038.clone_1 at /home/rs/downloads/julia-1.8.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1838 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2510
run_frontend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
#49 at ./task.jl:482
jfptr_YY.49_63753.clone_1 at /home/rs/downloads/julia-1.8.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1838 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:931
Allocations: 2903 (Pool: 2890; Big: 13); GC: 0
Aborted (core dumped)

With nightly:

(@v1.9) pkg> generate Demo
  Generating  project Demo:
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fd85966526d)
unknown function (ip: 0x7fd85966d2fb)
unknown function (ip: 0x7fd85966eb2b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/error.jl:109 [inlined]
initialize at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:986
#162 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:975
lock at ./lock.jl:229
ensure_initialized at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:971 [inlined]
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50 [inlined]
with at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/types.jl:1165 [inlined]
getconfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:160 [inlined]
project at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:26
#generate#1 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:9
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
generate at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:3
jfptr_generate_62315.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
do_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:730
do_cmd! at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:406
#do_cmd#21 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:387
do_cmd at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:551
jfptr_YY.24_57874.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:801 [inlined]
invokelatest at ./essentials.jl:798 [inlined]
run_interface at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/LineEdit.jl:2623
jfptr_run_interface_56136.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
run_frontend at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:1289
#62 at ./task.jl:499
jfptr_YY.62_56211.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
start_task at /cache/build/default-amdci4-5/julialang/julia-master/src/task.c:931
Allocations: 2871 (Pool: 2859; Big: 12); GC: 0
Aborted (core dumped)

If there's a more useful stack trace I can produce let me know.

mkitti commented 2 years ago

Thanks. Can we isolate the error to the following line?

initialize at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:986

https://github.com/JuliaLang/julia/blob/63c69831a4eb0fccf06511dde430321e6e30a815/stdlib/LibGit2/src/LibGit2.jl#L986

In other words, is executing the following line sufficient to crash Julia?

ccall((:git_libgit2_init, :libgit2), Cint, ())

mkitti commented 2 years ago

The next step would be to verify that git_libgit2_init fails from a C program.

mkitti commented 2 years ago

The current LibGit2 binary bundled with Julia is pulled from https://github.com/JuliaBinaryWrappers/LibGit2_jll.jl, at the version indicated here: https://github.com/JuliaLang/julia/blob/master/deps/libgit2.version

The build script is here: https://github.com/JuliaPackaging/Yggdrasil/blob/master/L/LibGit2/build_tarballs.jl

mkitti commented 2 years ago

As far as I can tell, none of the bundled patches that Julia applies touches code in the stack trace.

https://github.com/JuliaPackaging/Yggdrasil/tree/master/L/LibGit2/bundled/patches

mkitti commented 2 years ago

Is LD_LIBRARY_PATH set to something?
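
One way to answer this is to print every setting the dynamic loader consults, not only LD_LIBRARY_PATH. The following is a minimal sketch under stated assumptions: `show_loader_env` is a made-up helper name, and looking at LD_PRELOAD and /etc/ld.so.preload is an extra guess about where a replaced allocator could come from, not something established in this thread.

```shell
#!/bin/sh
# show_loader_env is a hypothetical helper, not an existing tool.
# It prints the settings the glibc dynamic loader consults when it
# resolves shared libraries for a new process.
show_loader_env() {
    # ${VAR-<unset>} distinguishes "unset" from "set but empty".
    printf 'LD_LIBRARY_PATH=%s\n' "${LD_LIBRARY_PATH-<unset>}"
    printf 'LD_PRELOAD=%s\n' "${LD_PRELOAD-<unset>}"
    # A system-wide preload list, if the image ships one.
    if [ -r /etc/ld.so.preload ]; then
        printf '/etc/ld.so.preload:\n'
        cat /etc/ld.so.preload
    else
        printf '/etc/ld.so.preload: not present\n'
    fi
}

show_loader_env
```

If any of these point at a non-default allocator or at directories holding replacement libraries, that would be worth including in the report alongside the stack traces.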

reidsanders commented 2 years ago

Nothing with tpu-vm-tf-2.8.0:

julia-51bb96857d/bin$ echo $LD_LIBRARY_PATH

With tpu-vm-pt-1.11:

julia-1.8.0-rc1/bin$ echo $LD_LIBRARY_PATH
:/usr/local/lib
julia-1.8.0-rc1/bin$ ll /usr/local/lib
total 805628
drwxr-xr-x  5 root root      4096 Mar 16 00:14 ./
drwxr-xr-x 13 root root      4096 Mar 16 00:14 ../
drwxr-xr-x  4 root root      4096 Mar 16 00:14 bazel/
-rw-r--r--  1 root root   2478686 Mar 16 00:13 libiomp5.so
-rw-r--r--  1 root root    624633 Mar 16 00:13 libiomp5_db.so
-rw-r--r--  1 root root    205094 Mar 16 00:13 libiompstubs5.so
-rwxr-xr-x  1 root root  53034392 Mar 16 00:13 libmkl_avx.so.2*
-rwxr-xr-x  1 root root  50137440 Mar 16 00:13 libmkl_avx2.so.2*
-rwxr-xr-x  1 root root  66658456 Mar 16 00:13 libmkl_avx512.so.2*
-rwxr-xr-x  1 root root    523704 Mar 16 00:13 libmkl_blacs_intelmpi_ilp64.so.2*
-rwxr-xr-x  1 root root    320552 Mar 16 00:13 libmkl_blacs_intelmpi_lp64.so.2*
-rwxr-xr-x  1 root root    532928 Mar 16 00:13 libmkl_blacs_openmpi_ilp64.so.2*
-rwxr-xr-x  1 root root    321552 Mar 16 00:13 libmkl_blacs_openmpi_lp64.so.2*
-rwxr-xr-x  1 root root    168912 Mar 16 00:13 libmkl_cdft_core.so.2*
lrwxrwxrwx  1 root root        31 Mar 16 00:13 libmkl_core.so.1 -> /usr/local/lib/libmkl_core.so.2*
-rwxr-xr-x  1 root root  73999168 Mar 16 00:13 libmkl_core.so.2*
-rwxr-xr-x  1 root root  42416560 Mar 16 00:13 libmkl_def.so.2*
-rwxr-xr-x  1 root root  13272328 Mar 16 00:13 libmkl_gf_ilp64.so.2*
-rwxr-xr-x  1 root root  17047584 Mar 16 00:13 libmkl_gf_lp64.so.2*
-rwxr-xr-x  1 root root  30979016 Mar 16 00:13 libmkl_gnu_thread.so.2*
-rwxr-xr-x  1 root root  13277104 Mar 16 00:13 libmkl_intel_ilp64.so.2*
lrwxrwxrwx  1 root root        37 Mar 16 00:13 libmkl_intel_lp64.so.1 -> /usr/local/lib/libmkl_intel_lp64.so.2*
-rwxr-xr-x  1 root root  17056672 Mar 16 00:13 libmkl_intel_lp64.so.2*
lrwxrwxrwx  1 root root        39 Mar 16 00:13 libmkl_intel_thread.so.1 -> /usr/local/lib/libmkl_intel_thread.so.2*
-rwxr-xr-x  1 root root  64858584 Mar 16 00:13 libmkl_intel_thread.so.2*
-rwxr-xr-x  1 root root  48742776 Mar 16 00:13 libmkl_mc.so.2*
-rwxr-xr-x  1 root root  50321512 Mar 16 00:13 libmkl_mc3.so.2*
-rwxr-xr-x  1 root root  38037904 Mar 16 00:13 libmkl_pgi_thread.so.2*
-rwxr-xr-x  1 root root   8695128 Mar 16 00:13 libmkl_rt.so.2*
-rwxr-xr-x  1 root root   7718648 Mar 16 00:13 libmkl_scalapack_ilp64.so.2*
-rwxr-xr-x  1 root root   7736496 Mar 16 00:13 libmkl_scalapack_lp64.so.2*
-rwxr-xr-x  1 root root  29005200 Mar 16 00:13 libmkl_sequential.so.2*
-rwxr-xr-x  1 root root  40617024 Mar 16 00:13 libmkl_tbb_thread.so.2*
-rwxr-xr-x  1 root root  15887648 Mar 16 00:13 libmkl_vml_avx.so.2*
-rwxr-xr-x  1 root root  15038968 Mar 16 00:13 libmkl_vml_avx2.so.2*
-rwxr-xr-x  1 root root  14364256 Mar 16 00:13 libmkl_vml_avx512.so.2*
-rwxr-xr-x  1 root root   7756240 Mar 16 00:13 libmkl_vml_cmpt.so.2*
-rwxr-xr-x  1 root root   8766704 Mar 16 00:13 libmkl_vml_def.so.2*
-rwxr-xr-x  1 root root  14775632 Mar 16 00:13 libmkl_vml_mc.so.2*
-rwxr-xr-x  1 root root  14619984 Mar 16 00:13 libmkl_vml_mc2.so.2*
-rwxr-xr-x  1 root root  14628344 Mar 16 00:13 libmkl_vml_mc3.so.2*
-rwxr-xr-x  1 root root     17904 Mar 16 00:13 libomp-fallback-cstring.o*
-rwxr-xr-x  1 root root      3900 Mar 16 00:13 libomp-fallback-cstring.spv*
-rwxr-xr-x  1 root root    358864 Mar 16 00:13 libomp-spirvdevicertl-optional.o*
-rwxr-xr-x  1 root root      9120 Mar 16 00:13 libomp-spirvdevicertl-required.o*
-rwxr-xr-x  1 root root    110880 Mar 16 00:13 libomptarget-opencl-optional.bc*
-rwxr-xr-x  1 root root      2420 Mar 16 00:13 libomptarget-opencl-required.bc*
-rwxr-xr-x  1 root root   8690912 Mar 16 00:13 libomptarget.rtl.level0.so*
-rwxr-xr-x  1 root root   8673576 Mar 16 00:13 libomptarget.rtl.opencl.so*
-rwxr-xr-x  1 root root   8328280 Mar 16 00:13 libomptarget.rtl.x86_64.so*
-rwxr-xr-x  1 root root    592776 Mar 16 00:13 libomptarget.so*
-rwxr-xr-x  1 root root   2654200 Mar 16 00:13 libtbb.so*
-rwxr-xr-x  1 root root   2654200 Mar 16 00:13 libtbb.so.12*
-rwxr-xr-x  1 root root   2654200 Mar 16 00:13 libtbb.so.12.5*
-rwxr-xr-x  1 root root    211776 Mar 16 00:13 libtbbbind.so*
-rwxr-xr-x  1 root root    211776 Mar 16 00:13 libtbbbind.so.3*
-rwxr-xr-x  1 root root    211776 Mar 16 00:13 libtbbbind.so.3.5*
-rwxr-xr-x  1 root root    211328 Mar 16 00:13 libtbbbind_2_0.so*
-rwxr-xr-x  1 root root    211328 Mar 16 00:13 libtbbbind_2_0.so.3*
-rwxr-xr-x  1 root root    211328 Mar 16 00:13 libtbbbind_2_0.so.3.5*
-rwxr-xr-x  1 root root    216312 Mar 16 00:13 libtbbbind_2_5.so*
-rwxr-xr-x  1 root root    216312 Mar 16 00:13 libtbbbind_2_5.so.3*
-rwxr-xr-x  1 root root    216312 Mar 16 00:13 libtbbbind_2_5.so.3.5*
-rwxr-xr-x  1 root root   1058496 Mar 16 00:13 libtbbmalloc.so*
-rwxr-xr-x  1 root root   1058496 Mar 16 00:13 libtbbmalloc.so.2*
-rwxr-xr-x  1 root root   1058496 Mar 16 00:13 libtbbmalloc.so.2.5*
-rwxr-xr-x  1 root root     75104 Mar 16 00:13 libtbbmalloc_proxy.so*
-rwxr-xr-x  1 root root     75104 Mar 16 00:13 libtbbmalloc_proxy.so.2*
-rwxr-xr-x  1 root root     75104 Mar 16 00:13 libtbbmalloc_proxy.so.2.5*
-rw-r--r--  1 root root    117438 Mar 16 00:13 mkl_msg.cat
drwxrwsr-x  4 root staff     4096 Mar 16 00:11 python2.7/
drwxrwsr-x  3 root staff     4096 Mar  8 22:16 python3.8/
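
Since /usr/local/lib is on LD_LIBRARY_PATH in this image, one thing that could be checked is whether any dependency of Julia's bundled libgit2.so resolves into that directory. This is a rough sketch, assuming `ldd` and `awk` are available; `deps_under` is a hypothetical helper name, and the libgit2.so path in the example comment is the one that appears in the 1.8.0-rc1 stack trace.

```shell
#!/bin/sh
# deps_under PREFIX (hypothetical helper): read `ldd` output on stdin and
# print every dependency whose resolved path starts with PREFIX, one per
# line, as "name path".
deps_under() {
    awk -v prefix="$1" '$2 == "=>" && index($3, prefix) == 1 { print $1, $3 }'
}

# Example (path taken from the traces in this thread; adjust as needed):
#   ldd /home/rs/downloads/julia-1.8.0-rc1/lib/julia/libgit2.so | deps_under /usr/local/lib
```

An empty result would suggest that none of the directly resolved dependencies come from that directory. It would not rule out libraries loaded later at run time.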

mkitti commented 2 years ago

In other words, is executing the following line sufficient to crash Julia?

ccall((:git_libgit2_init, :libgit2), Cint, ())

Could you try this?

giordano commented 2 years ago

For the sake of making the package manager usable, you can set the environment variable JULIA_PKG_USE_CLI_GIT=true, which would avoid calling libgit2 in the first place. Of course this doesn't address the underlying issue, which would be great to understand and solve, but unfortunately when you get errors inside binary libraries there isn't much to do apart from firing up a debugger like gdb or lldb and walking your way inside them.
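
As a minimal sketch of setting that variable (assuming a POSIX shell and `julia` on PATH; the wrapper name `julia_cli_git` is hypothetical), the important part is that the variable ends up in the environment of the Julia process itself:

```shell
#!/bin/sh
# julia_cli_git is a hypothetical wrapper name used only for illustration.
# It runs the given command with JULIA_PKG_USE_CLI_GIT=true in its
# environment. env(1) is used so the variable reaches the child process
# regardless of which shell is in use.
julia_cli_git() {
    env JULIA_PKG_USE_CLI_GIT=true "$@"
}

# Example: confirm that the Julia process sees the variable.
#   julia_cli_git julia -e 'println(get(ENV, "JULIA_PKG_USE_CLI_GIT", "unset"))'
```

This only shows how to pass the variable to Julia. It does not guarantee that libgit2 is never initialized.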

Side note: in case you want to try again to compile Julia with USE_BINARYBUILDER=0 (and related options), building the dependencies from source as well should have become more robust in the last few months. This is now even tested daily on CI for x86_64-linux-gnu.

reidsanders commented 2 years ago

With the preview build, the ccall does produce the same stack trace.

julia> ccall((:git_libgit2_init, :libgit2), Cint, ())
free(): invalid pointer

signal (6): Aborted
in expression starting at REPL[1]:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f34e6e2329d)
unknown function (ip: 0x7f34e6e2b32b)
unknown function (ip: 0x7f34e6e2cb5b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
top-level scope at ./REPL[1]:1
jl_toplevel_eval_flex at /cache/build/default-amdci4-5/julialang/julia-master/src/toplevel.c:903
jl_toplevel_eval_flex at /cache/build/default-amdci4-5/julialang/julia-master/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci4-5/julialang/julia-master/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
eval_user_input at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:152
repl_backend_loop at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:248
#start_repl_backend#46 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:233
start_repl_backend##kw at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:230 [inlined]
#run_repl#59 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:372
run_repl at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:357
jfptr_run_repl_56967.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
#990 at ./client.jl:413
jfptr_YY.990_45119.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:774
run_main_repl at ./client.jl:397
exec_options at ./client.jl:314
_start at ./client.jl:514
jfptr__start_26754.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
true_main at /cache/build/default-amdci4-5/julialang/julia-master/src/jlapi.c:567
jl_repl_entrypoint at /cache/build/default-amdci4-5/julialang/julia-master/src/jlapi.c:711
main at /cache/build/default-amdci4-5/julialang/julia-master/cli/loader_exe.c:59
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 2872 (Pool: 2858; Big: 14); GC: 0
Aborted (core dumped)

Unfortunately setting JULIA_PKG_USE_CLI_GIT=true does not seem to do anything.

(base) rs@t1v-n-d1477409-w-0:~/downloads/julia-51bb96857d/bin$ echo $JULIA_PKG_USE_CLI_GIT
true

(@v1.9) pkg> generate Demo2
  Generating  project Demo2:
munmap_chunk(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f2d3c6e129d)
unknown function (ip: 0x7f2d3c6e932b)
unknown function (ip: 0x7f2d3c6e957b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/error.jl:109 [inlined]
initialize at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:986
#162 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:975
lock at ./lock.jl:229
ensure_initialized at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:971 [inlined]
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50 [inlined]
with at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/types.jl:1165 [inlined]
getconfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:160 [inlined]
project at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:26
#generate#1 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:9
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
generate at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:3
jfptr_generate_62315.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
do_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:730
do_cmd! at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:406
#do_cmd#21 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:387
do_cmd at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:551
jfptr_YY.24_57874.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:801 [inlined]
invokelatest at ./essentials.jl:798 [inlined]
run_interface at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/LineEdit.jl:2623
jfptr_run_interface_56136.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
run_frontend at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:1289
#62 at ./task.jl:499
jfptr_YY.62_56211.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
start_task at /cache/build/default-amdci4-5/julialang/julia-master/src/task.c:931
Allocations: 2868 (Pool: 2857; Big: 11); GC: 0
Aborted (core dumped)
giordano commented 2 years ago

Unfortunately setting JULIA_PKG_USE_CLI_GIT=true does not seem to do anything.

Did you export the variable? What's the value of ENV["JULIA_PKG_USE_CLI_GIT"] inside Julia?
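For context on why exporting matters: a plain shell assignment is not inherited by child processes such as `julia`, while an exported one is. The sketch below is only an illustration; the `child_sees` helper is hypothetical and uses `sh -c` as a stand-in for launching Julia.

```shell
# Start from a clean state so an earlier export cannot mask the effect.
unset JULIA_PKG_USE_CLI_GIT

# Plain assignment: visible in this shell only.
JULIA_PKG_USE_CLI_GIT=true

# Hypothetical helper: prints what a child process sees for the variable.
child_sees() {
    sh -c 'echo "${JULIA_PKG_USE_CLI_GIT:-unset}"'
}

before=$(child_sees)

# After export, child processes (including julia) inherit the variable.
export JULIA_PKG_USE_CLI_GIT
after=$(child_sees)

echo "without export: $before"
echo "with export: $after"
```

Inside Julia, `ENV["JULIA_PKG_USE_CLI_GIT"]` throws a `KeyError` when the variable was not inherited, so that lookup is a quick way to tell the two cases apart.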

reidsanders commented 2 years ago

Yes, I exported the variable. I also tried both true and 1.

julia> ENV["JULIA_PKG_USE_CLI_GIT"]
"1"

julia> ENV["JULIA_PKG_USE_CLI_GIT"]
"true"
KristofferC commented 2 years ago

JULIA_PKG_USE_CLI_GIT only controls how things are downloaded. During e.g. the Julia precompile stage, libgit2 is still used for some other things (like creating a git repo).

mkitti commented 2 years ago

What does the trace look like for the conda-forge build?

That build uses libgit2 built against openssl rather than mbedtls so the stack trace should not use git_mbedtls_stream_global_init.

https://github.com/conda-forge/libgit2-feedstock/blob/214137e20348d3132361c7260952b465c9373e71/recipe/meta.yaml#L28

Otherwise, I would try to build libgit2 directly and see if their tests are passing.

Maybe @giordano knows how to set this up with our bundled patches applied? https://github.com/libgit2/libgit2/blob/main/CMakeLists.txt

inkydragon commented 2 years ago

@mkitti If you just want to build libgit2 with Julia's bundled patches, here are the steps:

git clone https://github.com/libgit2/libgit2.git

# get patch
wget https://raw.githubusercontent.com/JuliaPackaging/Yggdrasil/master/L/LibGit2/bundled/patches/libgit2-agent-nonfatal.patch
wget https://raw.githubusercontent.com/JuliaPackaging/Yggdrasil/master/L/LibGit2/bundled/patches/libgit2-hostkey.patch
wget https://raw.githubusercontent.com/JuliaPackaging/Yggdrasil/master/L/LibGit2/bundled/patches/libgit2-win32-ownership.patch

# checkout version
cd libgit2/
git checkout v1.4.3

# apply patch
patch -p1 -f < ../libgit2-agent-nonfatal.patch
patch -p1 -f < ../libgit2-hostkey.patch
patch -p1 -f < ../libgit2-win32-ownership.patch

# build flags (a bash array, so each flag reaches cmake as its own argument)
LIBGIT2_BUILD_FLAGS=(-DCMAKE_BUILD_TYPE=Release -DUSE_THREADS=ON -DUSE_BUNDLED_ZLIB=ON -DUSE_SHA1=CollisionDetection)
# enable tests and examples
LIBGIT2_BUILD_FLAGS+=(-DBUILD_TESTS=ON -DBUILD_EXAMPLES=ON)
# use OpenSSL
LIBGIT2_BUILD_FLAGS+=(-DUSE_HTTPS=OpenSSL)

# build
mkdir build && cd build
cmake .. "${LIBGIT2_BUILD_FLAGS[@]}"
make -j "$(nproc)"

Julia enables SSH support when building libgit2, using libssh2 with some patches. I assume that whether SSH is enabled (-DUSE_SSH=ON) is irrelevant to this issue.

giordano commented 2 months ago

Mentioning this here as well: it's known that various Google environments (Google Cloud Platform, Google Colab, etc.) may set LD_PRELOAD to preload tcmalloc, which breaks lots of software, not just Julia. When using GCP, Colab, or other Google-provided virtual machines, make sure LD_PRELOAD and LD_LIBRARY_PATH are not forcing the use of external libraries, which can cause problems.
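A minimal sketch of how one might check for this, assuming a POSIX shell on the VM. The `check_loader_env` function is a hypothetical helper, not part of Julia or any Google tooling, and the commented-out last line assumes `julia` is on `PATH`.

```shell
# Report whether the dynamic-loader variables mentioned above are set.
check_loader_env() {
    for var in LD_PRELOAD LD_LIBRARY_PATH; do
        # Indirect lookup via eval keeps this compatible with plain sh.
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "$var is set: $val"
        else
            echo "$var is not set"
        fi
    done
}

check_loader_env

# To start Julia with both variables removed from its environment only
# (the parent shell is left unchanged), something like this should work:
# env -u LD_PRELOAD -u LD_LIBRARY_PATH julia
```

If `LD_PRELOAD` turns out to point at a tcmalloc library, clearing it for the Julia process alone is less invasive than unsetting it globally, since other software on the image may expect it.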