JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

New precompilation crashes on Julia 1.11-rc1 #55147

Closed. MilesCranmer closed this issue 3 months ago.

MilesCranmer commented 4 months ago

I'm seeing precompilation crashes on Julia 1.11-rc1 when precompiling DynamicExpressions.jl with DispatchDoctor.jl enabled on the package. (DispatchDoctor.jl is essentially a package that calls promote_op on each function and uses the result to flag type instabilities.)
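A minimal sketch of that idea follows. It is not DispatchDoctor.jl's actual implementation: is_type_stable, stable and unstable are hypothetical names, and the check simply treats a return type that inference cannot narrow to a concrete type as a possible instability.

```julia
# Hypothetical helper (not DispatchDoctor.jl's API): ask inference, through
# Base.promote_op, what `f` returns for the given argument types, and treat a
# non-concrete answer (a Union, Any, ...) as a possible type instability.
is_type_stable(f, argtypes::Type...) = isconcretetype(Base.promote_op(f, argtypes...))

# Toy functions for illustration only.
stable(x) = x + 1                  # Int in, Int out
unstable(x) = x > 0 ? 1 : 1.0      # Int in, Int or Float64 out

for (name, f) in (("stable", stable), ("unstable", unstable))
    if !is_type_stable(f, Int)
        # Only `unstable` should trigger this warning.
        @warn "Possible type instability" name inferred = Base.promote_op(f, Int)
    end
end
```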

Here is the traceback:

ERROR: The following 1 direct dependency failed to precompile:

DynamicExpressions 

Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/Users/mcranmer/.julia/compiled/v1.11/DynamicExpressions/jl_cQE0v5".
[39250] signal 4: Illegal instruction: 4
in expression starting at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/DynamicExpressions.jl:120
_eval_tree_array at /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/Evaluate.jl:92 [inlined]
#eval_tree_array#2 at /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:306
eval_tree_array at /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/stabilization.jl:301
#test_all_combinations#1 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:7
test_all_combinations at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:22 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:168 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:153 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:138
do_precompilation at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:161
unknown function (ip: 0x11570c053)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/./julia.h:2156 [inlined]
do_call at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_stmt_value at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:174
eval_body at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_eval_module_expr at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:743
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:952 [inlined]
ijl_toplevel_eval_in at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
_include at ./loading.jl:2603
include at ./Base.jl:558 [inlined]
include_package_for_output at ./loading.jl:2721
jfptr_include_package_for_output_69600.1 at /Users/mcranmer/.julia/juliaup/julia-1.11.0-rc1+0.aarch64.apple.darwin14/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/./julia.h:2156 [inlined]
do_call at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_stmt_value at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:174
eval_body at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:952 [inlined]
ijl_toplevel_eval_in at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
include_string at ./loading.jl:2553 [inlined]
exec_options at ./client.jl:316
_start at ./client.jl:526
jfptr__start_71098.1 at /Users/mcranmer/.julia/juliaup/julia-1.11.0-rc1+0.aarch64.apple.darwin14/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/./julia.h:2156 [inlined]
true_main at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-11/src/jlapi.c:1059
Allocations: 84690299 (Pool: 84689426; Big: 873); GC: 4

versioninfo:

julia> versioninfo()
Julia Version 1.11.0-rc1
Commit 3a35aec36d1 (2024-06-25 10:23 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 6 default, 0 interactive, 3 GC (on 6 virtual cores)
Environment:
  JULIA_FORMATTER_SO = /Users/mcranmer/julia_formatter.so
  JULIA_NUM_THREADS = auto
  JULIA_OPTIMIZE = 3
  JULIA_EDITOR = code

I installed Julia with juliaup. To reproduce this issue, you can run the following code:

cd $(mktemp -d)
# Install package
julia +1.11 --startup-file=no --project=. -e 'using Pkg; pkg"add Preferences DynamicExpressions DispatchDoctor@v0.4.10"'
# Enable DispatchDoctor.jl
julia +1.11 --startup-file=no --project=. -e 'using Preferences; set_preferences!("DynamicExpressions", "instability_check" => "warn")'
# Precompile:
julia +1.11 --startup-file=no --project=. -e 'using Pkg; pkg"precompile"'

I can prevent this error with the following PR on DispatchDoctor.jl: https://github.com/MilesCranmer/DispatchDoctor.jl/compare/094b1651eeef3fb2017be46a48f0da13724e1123~...b223a4d033dd3d17901879871c508ae33cfd550a. The PR basically amounts to changing some functions into @generated form:

- map_specializing_typeof(args...) = map(specializing_typeof, args)
+ map_specializing_typeof(args::Tuple) = map(specializing_typeof, args)

- _promote_op(f, S::Type...) = Base.promote_op(f, S...)
- _promote_op(f, S::Tuple) = _promote_op(f, S...)
+ function _promote_op(f, S::Vararg{Type})
+     if @generated
+         :(Base.promote_op(f, S...))
+     else
+         Base.promote_op(f, S...)
+     end
+ end
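The if @generated ... else ... end construct in the diff is Julia's optionally-generated function syntax: the first branch is a generator that runs at compile time on the argument types and returns an expression, while the else branch is an ordinary fallback the compiler is allowed to use instead, so both branches should compute the same result. A toy sketch with a hypothetical twice_nargs, unrelated to DispatchDoctor.jl:

```julia
# Optionally-generated function (hypothetical toy example).
function twice_nargs(xs::Vararg{Any,N}) where {N}
    if @generated
        # Generator branch: runs once per argument-type signature, where only
        # types and static parameters (here N) are known. It returns an
        # expression; interpolating 2N splices the result in as a constant.
        return :($(2N))
    else
        # Fallback branch: plain runtime code that must give the same answer.
        return 2N
    end
end

println(twice_nargs(1, "a", 3.0))   # 6
```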

However, it doesn't seem like DispatchDoctor.jl or DynamicExpressions.jl is doing anything wrong, so I'm not sure what's going on; both the before and after versions seem to be valid Julia code. The downside of that PR is that it introduces a type instability in Zygote autodiff, and there doesn't seem to be a way around it that both prevents the segfault and eliminates the type instability.

I don't understand the conditions for reproducing this, so this is my only example so far. When I make various tweaks to _promote_op within DispatchDoctor.jl, I seem to end up with different segfaults, one of which is the "Unreachable reached" bug.

cc @avik-pal

giordano commented 4 months ago

I don't see any segmentation fault in the error you shared.

MilesCranmer commented 4 months ago

I wasn't sure what "Illegal instruction" meant. I've updated the description to just say "Error".

giordano commented 4 months ago

It means that the processor was asked to execute instructions it doesn't support ("illegal"). Think, for example, of trying to execute AVX-512 instructions on an AVX/AVX2 processor (maybe because you compiled the program on a different machine with a larger instruction set than the current one): it would have no clue what you're talking about.

MilesCranmer commented 4 months ago

I see. So I guess it's a bug, then?

I see it on macOS (M1) and also on GitHub Actions with ubuntu-latest: https://github.com/MilesCranmer/SymbolicRegression.jl/actions/runs/9947224797/job/27479456770?pr=326#step:6:640. This one gets the "Unreachable reached at 0x7fd831cd2d85" issue:

```
ERROR: The following 2 direct dependencies failed to precompile:

DynamicExpressions --code-coverage=@/home/runner/work/SymbolicRegression.jl/SymbolicRegression.jl --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none

Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/home/runner/.julia/compiled/v1.11/DynamicExpressions/jl_0E9EhT".
Unreachable reached at 0x7fd831cd2d85

[4169] signal 4 (2): Illegal instruction
in expression starting at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/DynamicExpressions.jl:111
is_bad_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
##eval_tree_array_simulator#549#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:87 [inlined]
##eval_tree_array_simulator#549 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:66 [inlined]
#eval_tree_array#2 at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:306 [inlined]
eval_tree_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
unknown function (ip: 0x7fd831d0322d)
#test_all_combinations#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:7
test_all_combinations at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:22 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:169 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:154 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:139
do_precompilation at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:162
unknown function (ip: 0x7fd831ccfea2)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_eval_module_expr at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:743
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
_include at ./loading.jl:2603
include at ./Base.jl:558 [inlined]
include_package_for_output at ./loading.jl:2721
jfptr_include_package_for_output_69232.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
include_string at ./loading.jl:2553 [inlined]
exec_options at ./client.jl:316
_start at ./client.jl:526
jfptr__start_70709.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
true_main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7fd851629d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 84382407 (Pool: 84381692; Big: 715); GC: 32

SymbolicRegression --code-coverage=@/home/runner/work/SymbolicRegression.jl/SymbolicRegression.jl --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none

Failed to precompile SymbolicRegression [8254be44-1295-4e6a-a16d-46603ac705cb] to "/home/runner/.julia/compiled/v1.11/SymbolicRegression/jl_G4yS0y".
Unreachable reached at 0x7f2162ad2df5

[5165] signal 4 (2): Illegal instruction
in expression starting at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/DynamicExpressions.jl:111
is_bad_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
##eval_tree_array_simulator#549#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:87 [inlined]
##eval_tree_array_simulator#549 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/Evaluate.jl:66 [inlined]
#eval_tree_array#2 at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:306 [inlined]
eval_tree_array at /home/runner/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
unknown function (ip: 0x7f2162b0322d)
#test_all_combinations#1 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:7
test_all_combinations at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:22 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:169 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:154 [inlined]
macro expansion at /home/runner/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:139
do_precompilation at /home/runner/.julia/packages/DynamicExpressions/IF10i/src/precompile.jl:162
unknown function (ip: 0x7f2162acfea2)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_eval_module_expr at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:743
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
_include at ./loading.jl:2603
include at ./Base.jl:558 [inlined]
include_package_for_output at ./loading.jl:2721
jfptr_include_package_for_output_69269.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_call at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:429 [inlined]
include_string at ./loading.jl:2543
include_string at ./loading.jl:2553 [inlined]
exec_options at ./client.jl:316
_start at ./client.jl:526
jfptr__start_70709.1 at /opt/hostedtoolcache/julia/1.11.0-rc1/x64/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
true_main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7f2182429d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 82059736 (Pool: 82059067; Big: 669); GC: 33

ERROR: LoadError: Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/home/runner/.julia/compiled/v1.11/DynamicExpressions/jl_YBf5sc".
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool; flags::Cmd, cacheflags::Base.CacheFlags, reasons::Dict{String, Int64})
    @ Base ./loading.jl:3002
  [3] (::Base.var"#1080#1081"{Base.PkgId})()
    @ Base ./loading.jl:2388
  [4] mkpidlock(f::Base.var"#1080#1081"{Base.PkgId}, at::String, pid::Int32; kwopts::@Kwargs{stale_age::Int64, wait::Bool})
    @ FileWatching.Pidfile /opt/hostedtoolcache/julia/1.11.0-rc1/x64/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:95
  [5] #mkpidlock#6
    @ /opt/hostedtoolcache/julia/1.11.0-rc1/x64/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:90 [inlined]
  [6] trymkpidlock(::Function, ::Vararg{Any}; kwargs::@Kwargs{stale_age::Int64})
    @ FileWatching.Pidfile /opt/hostedtoolcache/julia/1.11.0-rc1/x64/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:116
  [7] #invokelatest#2
    @ ./essentials.jl:1045 [inlined]
  [8] invokelatest
    @ ./essentials.jl:1040 [inlined]
  [9] maybe_cachefile_lock(f::Base.var"#1080#1081"{Base.PkgId}, pkg::Base.PkgId, srcpath::String; stale_age::Int64)
    @ Base ./loading.jl:3525
 [10] maybe_cachefile_lock
    @ ./loading.jl:3522 [inlined]
 [11] _require(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:2384
 [12] __require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:2216
 [13] #invoke_in_world#3
    @ ./essentials.jl:1077 [inlined]
 [14] invoke_in_world
    @ ./essentials.jl:1074 [inlined]
 [15] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:2207
 [16] macro expansion
    @ ./loading.jl:2146 [inlined]
 [17] macro expansion
    @ ./lock.jl:273 [inlined]
 [18] __require(into::Module, mod::Symbol)
    @ Base ./loading.jl:2103
 [19] #invoke_in_world#3
    @ ./essentials.jl:1077 [inlined]
 [20] invoke_in_world
    @ ./essentials.jl:1074 [inlined]
 [21] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:2096
 [22] include
    @ ./Base.jl:558 [inlined]
 [23] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
    @ Base ./loading.jl:2721
 [24] top-level scope
    @ stdin:4
in expression starting at /home/runner/work/SymbolicRegression.jl/SymbolicRegression.jl/src/SymbolicRegression.jl:1
in expression starting at stdin:4
```

giordano commented 4 months ago

Yes, it's definitely a bug, probably the compiler is emitting wrong instructions for the current ISA. There are a bunch of similar tickets: #53847, #53843, #53848, #53761, ...

MilesCranmer commented 4 months ago

Here's my git bisect log:

git bisect start
# status: waiting for both good and bad commits
# bad: [3a35aec36d13c3e651c97bac664da2e778d591ad] set VERSION to 1.11.0-rc1 (#54924)
git bisect bad 3a35aec36d13c3e651c97bac664da2e778d591ad
# status: waiting for good commit(s), bad commit known
# good: [48d4fd48430af58502699fdf3504b90589df3852] set VERSION to 1.10.4 (#54625)
git bisect good 48d4fd48430af58502699fdf3504b90589df3852
# good: [0ba6ec2d2282937a084d7e5e5a0b026dc953bb31] Restore link to list of packages in Base docs (#50353)
git bisect good 0ba6ec2d2282937a084d7e5e5a0b026dc953bb31
# skip: [e754f2036cbfc37ea24a33d02e86e41a9cf56af9] Add missing type annotation reported by JET (#52207)
git bisect skip e754f2036cbfc37ea24a33d02e86e41a9cf56af9
# skip: [959b474d0516df77a268d9f23ccda5d2ad32acdf] docs: update latest stable version (#52215)
git bisect skip 959b474d0516df77a268d9f23ccda5d2ad32acdf
# bad: [c5d7b87a35b5beaef9d4d3aa53c0a2686f3445b9] Fix variable name in scaling an `AbstractTriangular` with zero alpha (#52855)
git bisect bad c5d7b87a35b5beaef9d4d3aa53c0a2686f3445b9
# bad: [4115c725d25c19a86ce8d3e3a584f02d59a9a9ce] Create rand function for Base.KeySet and Base.ValueIterator{Dict} (#51608)
git bisect bad 4115c725d25c19a86ce8d3e3a584f02d59a9a9ce
# good: [8be469e275a455ca894fdc5fad8a80aafb359544] Separate foreign threads into a :foreign threadpool (#50912)
git bisect good 8be469e275a455ca894fdc5fad8a80aafb359544
# skip: [7d51502d7845246d6a231fdc4cf19451f42427e1] More missing constants from earlier libgit2 versions
git bisect skip 7d51502d7845246d6a231fdc4cf19451f42427e1
# skip: [4c3aaa2b34996708367f9d5e4472fb5a1062bf63] reflection: define `Base.generating_output` utility function (#51216)
git bisect skip 4c3aaa2b34996708367f9d5e4472fb5a1062bf63
# bad: [ca862df7bfc534d22d4d39d265d1f74d59c1ab77] fix `_tryonce_download_from_cache` (busybox.exe download error) (#51531)
git bisect bad ca862df7bfc534d22d4d39d265d1f74d59c1ab77
# skip: [5d82d8095042935be0eb044259098e0d7c695922] add tfuncs for `[and|or]_int` intrinsics (#51266)
git bisect skip 5d82d8095042935be0eb044259098e0d7c695922
# skip: [4e1c965b512967aaa20b77f37e1fe76548b1def7] Remove size(::StructuredMatrix, d) specializations (#51083)
git bisect skip 4e1c965b512967aaa20b77f37e1fe76548b1def7
# skip: [3fc4f6bb243cb623636f276cb143cf5c476bbc59] 🤖 [master] Bump the Downloads stdlib from f97c72f to 8a614d5 (#51246)
git bisect skip 3fc4f6bb243cb623636f276cb143cf5c476bbc59
# skip: [476572f749a035047d4d8e6e76ec5b701b85904e] makefile option to generate better code (#51105)
git bisect skip 476572f749a035047d4d8e6e76ec5b701b85904e
# skip: [8b3ffd8918e53d5241ad948e8500335848d3b602] cross-reference pathof and pkgdir in docstrings (#51298)
git bisect skip 8b3ffd8918e53d5241ad948e8500335848d3b602
# bad: [15f34aa649dbbb34e53ff6d16db15cd11ae4a887] [NFC] rng_split: some elaboration and clarification (#50680)
git bisect bad 15f34aa649dbbb34e53ff6d16db15cd11ae4a887
# skip: [f3d50b7de66b351dfdaa826fa529fefb75a829e1] Fix extended help hint to give full line to enter (#51193)
git bisect skip f3d50b7de66b351dfdaa826fa529fefb75a829e1
# skip: [dcb4060b58797cf64517a694fcab3ea16278cb87] docs: manual: point to `MutableArithmetics` in the Performance tips (#50987)
git bisect skip dcb4060b58797cf64517a694fcab3ea16278cb87
# skip: [70000ac7c3d5d5f21e42555cdf99e699a246f8ec] sysimg: Allow loading a system image that is already present in memory (#51121)
git bisect skip 70000ac7c3d5d5f21e42555cdf99e699a246f8ec
# skip: [7cadc6d70c0a3d2b2c20e50d4b3555475756f785] 🤖 [master] Bump the SHA stdlib from 2d1f84e to aaf2df6 (#51049)
git bisect skip 7cadc6d70c0a3d2b2c20e50d4b3555475756f785
# skip: [39a53168a824a9a223adc6642da31e7a26b6890a] optimize: fix `effect_free` refinement in post-opt dataflow analysis (#51185)
git bisect skip 39a53168a824a9a223adc6642da31e7a26b6890a
# skip: [dd0ce50f389981839d96969b279c3a11e0b4088e] Fix typo in command-line-interface.md (#51055)
git bisect skip dd0ce50f389981839d96969b279c3a11e0b4088e
# skip: [fbf73f44c000ee79d12b7bf1645f076b640fd10c] 🤖 [master] Bump the Pkg stdlib from 047734e4c to f570abd39 (#51186)
git bisect skip fbf73f44c000ee79d12b7bf1645f076b640fd10c
# skip: [b2dfa1db9e4d7b1cd499ba58943df17bc77fe1d8] Deprecate `permute!!` and `invpermute!!` (#51337)
git bisect skip b2dfa1db9e4d7b1cd499ba58943df17bc77fe1d8
# skip: [eab8d6b96b05f7e84103f66a902e4ee7ad395b48] Fix getfield codegen for tuple inputs and unknown symbol fields. (#51234)
git bisect skip eab8d6b96b05f7e84103f66a902e4ee7ad395b48
# skip: [a355403080167056d2af4ccee8eadfffd8fce97f] Annotate fieldnames for default cgparams [NFC]
git bisect skip a355403080167056d2af4ccee8eadfffd8fce97f
# skip: [354c36742eb1c2c4c5bfe454d6d4fe975565de96] Allow SparseArrays to catch `lu(::WrappedSparseMatrix)` (#51161)
git bisect skip 354c36742eb1c2c4c5bfe454d6d4fe975565de96
# bad: [e85f0a5a718f68e581b07eb60fd0d8203b0cd0da] complete false & true more generally as vals (#51326)
git bisect bad e85f0a5a718f68e581b07eb60fd0d8203b0cd0da
# skip: [91b8c9b99f05b99db8b259257adeb1997f8c4415] Add `JL_DLLIMPORT` to `small_typeof` declaration (#50892)
git bisect skip 91b8c9b99f05b99db8b259257adeb1997f8c4415
# skip: [27fa5de3f0e245cfff8c5cd1c850353742362cbf] Introduce cholesky and qr hooks for wrapped sparse matrices (#51220)
git bisect skip 27fa5de3f0e245cfff8c5cd1c850353742362cbf
# good: [74ce6cf070a2a04e836c3e5a2211228a3ac978ef] minor NFC in GC codebase (#50991)
git bisect good 74ce6cf070a2a04e836c3e5a2211228a3ac978ef
# skip: [3527213ccb1bfe0c48feab5da64d30cadbd4c526] simplify call to promote_eltype with repeated elements (#51135)
git bisect skip 3527213ccb1bfe0c48feab5da64d30cadbd4c526
# bad: [d51ad06f664b3439b4aee51b5cd5edd6b9d53c69] Avoid infinite loop when doing SIGTRAP in arm64-apple (#51284)
git bisect bad d51ad06f664b3439b4aee51b5cd5edd6b9d53c69

(Most of the skips are due to hanging precompilation.)

MilesCranmer commented 4 months ago

Only 100 revisions are left in the git bisect, but unfortunately I need to run. The bisect log is above if someone wants to start where I left off. (My computer seems to be pretty slow at compiling.)

gbaraldi commented 4 months ago

Executing unreachable code triggers a trap, which on most architectures just becomes a SIGILL, so it might be this.

maleadt commented 4 months ago

It's also a good idea to test an assertions build first.

MilesCranmer commented 4 months ago

Just tested assertions build; no extra info.

Is there a way I can see which Julia CI builds succeeded for each commit? It seems like I have to skip a lot of commits, which is making this bisection take much longer than expected.

MilesCranmer commented 4 months ago

I see a commit

Avoid infinite loop when doing SIGTRAP in arm64-apple

It seems that on all of the commits before that one I hit infinite precompilation, so I'm not sure I will be able to bisect this further.

vchuravy commented 4 months ago

It's really obnoxious, but you can cherry-pick that commit onto earlier commits in your script.

MilesCranmer commented 4 months ago

I just realised the SIGTRAP is what you said the error was likely coming from. So I'll try another bisection, treating infinite precompilation as bad. If that doesn't work, I'll do the cherry-pick approach.
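A minimal sketch of a test script for `git bisect run` that treats a hang the same as a crash, as proposed above. The helper `bisect_status`, its default timeout, and the example command are hypothetical. Per the git documentation, `git bisect run` reads exit code 0 as good, 125 as skip, and other codes from 1 to 127 as bad.

```julia
# Hypothetical helper for a `git bisect run` script: a crash (nonzero exit or
# a signal such as SIGILL) and a hang (still running after `timeout` seconds)
# are both reported as "bad".
function bisect_status(cmd::Cmd; timeout::Real = 1800)
    proc = run(pipeline(cmd; stdout = devnull, stderr = devnull); wait = false)
    # Poll once per second until the process exits or the timeout elapses.
    status = timedwait(() -> process_exited(proc), timeout; pollint = 1.0)
    if status === :timed_out
        kill(proc, Base.SIGKILL)  # treat hanging precompilation as bad
        wait(proc)
        return 1
    end
    return success(proc) ? 0 : 1  # 0 = good, 1 = bad
end

# Example use at the end of a bisect script (command and path are placeholders):
# exit(bisect_status(`./julia --project=/path/to/env -e "using DynamicExpressions"`))
```

Commits that fail to build would need to report 125 separately; that part is left out of this sketch.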

MilesCranmer commented 4 months ago

Ok, found it! #51000 is the cause.

MilesCranmer commented 4 months ago

More clues:

MilesCranmer commented 3 months ago

Can we add this to the 1.11 milestone?

gbaraldi commented 3 months ago

I redid the bisection because the PR you mentioned looked harmless, and I think it would cause a GC error rather than an "unreachable reached" error. Mine ended at https://github.com/JuliaLang/julia/commit/231ca24a62522a39cdfb880acb8391191bfe53db

MilesCranmer commented 3 months ago

Weird. Maybe one of the commits was accidentally marked good/bad, since it's hard to tell whether precompilation is truly hanging or not.

Does https://github.com/JuliaLang/julia/commit/231ca24a62522a39cdfb880acb8391191bfe53db make sense as the cause to you?

gbaraldi commented 3 months ago

Yep. I just tried it on top of release-1.11, and reverting that commit does stop the crash.

MilesCranmer commented 3 months ago

Cool. So I guess the addition of @_terminates_locally_meta somehow introduces an incorrect compiler assumption?

aviatesk commented 3 months ago

Isn't this a bug in DispatchDoctor? I investigated quite deeply, and there doesn't seem to be a bug in Julia base or the compiler side. @_terminates_locally_meta certainly allows concrete evaluation for DispatchDoctor._Utils._promote_op, but that function is legally eligible for concrete evaluation. Looking at the implementation of DispatchDoctor, it seems to use reflection (especially Core.Compiler._return_type through Base.promote_op) within a generator even if the target function is @generated. This breaks the @generated assumption and could potentially cause undefined behavior, e.g. if a function definition is added later.
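To make the discussion concrete, here is a heavily simplified sketch of a `promote_op`-based stability check of the kind being discussed. It is an illustration under assumptions, not DispatchDoctor's actual implementation; `is_inferred_concrete` and `unstable_example` are hypothetical names. `Base.promote_op` is internal, so what it returns reflects the current inference results and may differ between Julia versions.

```julia
# Hypothetical check: does inference give a concrete return type for `f`
# applied to arguments of these types? (Not DispatchDoctor's real code.)
is_inferred_concrete(f, args...) =
    isconcretetype(Base.promote_op(f, map(typeof, args)...))

# Deliberately type-unstable: returns an Int or a Float64 depending on the value.
unstable_example(x) = x > 0 ? 1 : 1.0

is_inferred_concrete(+, 1, 2.0)            # inferred Float64, so true
is_inferred_concrete(unstable_example, 1)  # inferred Union{Float64, Int64}, so false
```

The key point for this thread is that the result of such a check is a reflection on inference, so any code whose behavior branches on it inherits inference's imprecision.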

MilesCranmer commented 3 months ago

even if the target function is generated

DispatchDoctor doesn’t operate on generated functions; see https://github.com/MilesCranmer/DispatchDoctor.jl?tab=readme-ov-file#-special-cases

aviatesk commented 3 months ago

Are we sure that no generated function ends up using _promote_op? Looking at the DispatchDoctor implementation, I have confirmed that DispatchDoctor does not transform functions that directly use @generated, but IIUC it still allows @generated function to use other @stable functions transformed by DD within the generator, which effectively uses Core.Compiler._return_type within the generator.

MilesCranmer commented 3 months ago

Wouldn’t that be true of any @generated function that calls some other function containing a list comprehension? Note that Base.promote_op is what’s used to figure out the right element type in such cases.

aviatesk commented 3 months ago

That's true, but it's somewhat unavoidable. That's why the type of the object returned by a list comprehension is not defined, and code that depends on the return type of value returned by list comprehension is not recommended. DD is an extreme case, as it changes the control flow of the code based on the results of such unstable type inferences, so it's not surprising that segmentation faults or other undefined behaviors occur.

MilesCranmer commented 3 months ago

and code that depends on the return type of value returned by list comprehension is not recommended

I am very surprised by this statement; are you certain about it? List comprehensions are very common. Does this mean we are not supposed to dispatch on the result of any code that contains a list comprehension? In other words:

f(i) = [i for _ in 1:5]
g(::Vector) = 1
g(::Vector{Int}) = 2
h(i) = g(f(i))
h(1)

This code similarly:

“changes the control flow of the code based on the results of such unstable type inferences”

So are you saying if I have this style of code in my library, I should expect to have segfaults and undefined behavior?

It’s one thing for inference to fail to get the right type, but crashes are a different story, and we should definitely try to prevent that.
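A small sketch of the distinction at stake in the list-comprehension example: for a non-empty comprehension, the element type of the result comes from the values actually produced, while for an empty one there are no values, so Julia falls back on type inference to pick the element type. This reuses the `f`/`g`/`h` example above; `empty_result` is an illustrative name.

```julia
f(i) = [i for _ in 1:5]
g(::Vector) = 1
g(::Vector{Int}) = 2
h(i) = g(f(i))

# Non-empty: the element type comes from the values produced (here Int),
# so this dispatches to g(::Vector{Int}).
h(1)

# Empty: no values exist, so the element type is chosen via inference.
empty_result = [i for i in 1:0]
eltype(empty_result)
```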

KristofferC commented 3 months ago

(Removing from milestone based on the discussion above)

MilesCranmer commented 3 months ago

I don’t think it’s settled. Maybe we can revert https://github.com/JuliaLang/julia/pull/51002 for 1.11, since reverting it was demonstrated to fix a Julia crash, and then add it back to master while we figure this out?

KristofferC commented 3 months ago

I don't think it warrants reverting if the issue comes from not following the restrictions outlined in https://docs.julialang.org/en/v1/manual/metaprogramming/#Generated-functions

Generated functions must not mutate or observe any non-constant global state (including, for example, IO, locks, non-local dictionaries, or using hasmethod). This means they can only read global constants, and cannot have any side effects. In other words, they must be completely pure. Due to an implementation limitation, this also means that they currently cannot define a closure or generator.
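As a hedged illustration of a generator that stays within these rules: the generator below looks only at its argument's type, extracts a type parameter, and returns an expression. It calls no reflection such as `Base.promote_op`. The `counttuple` helper has the same shape as the one in DynamicExpressions.jl; `first_or_nothing` is a hypothetical name.

```julia
# Pure helper: extracts the length parameter from a tuple type.
# No global state is read and no inference queries are made.
counttuple(::Type{<:NTuple{N,Any}}) where {N} = N

# Inside the generator, `t` is bound to the argument's *type*
# (e.g. Tuple{Int,Int}); the branch depends only on that type.
@generated function first_or_nothing(t::Tuple)
    if counttuple(t) == 0
        return :(nothing)
    else
        return :(t[1])   # in the returned expression, `t` is the runtime value
    end
end
```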

MilesCranmer commented 3 months ago

The issue is not from a generated function; see comment above. https://github.com/JuliaLang/julia/issues/55147#issuecomment-2259944230

MilesCranmer commented 3 months ago

If the issue is on the package’s side, Julia should safely catch it and print an error, rather than crashing. We should at least clearly identify the issue so it can be documented.

And if the issue is on the Julia side, it should be patched.

In either case I don’t think it’s ready for a 1.11 release if we don’t understand what’s going on.

aviatesk commented 3 months ago

The issue is not from a generated function; see comment above. #55147 (comment)

It comes from a generated function whose generator may call type-inference reflection.

f(i) = [i for _ in 1:5]
g(::Vector) = 1
g(::Vector{Int}) = 2
h(i) = g(f(i))
h(1)

So are you saying if I have this style of code in my library, I should expect to have segfaults and undefined behavior?

Using such code is fine (especially in normal contexts), but calling it within a generator is technically invalid. If a generator that internally relies on such code produces broken code, that is unavoidable, even if it results in a segfault.

MilesCranmer commented 3 months ago

Do you mean that

f(i) = [i for _ in 1:5]
g(::Vector) = 1
g(::Vector{Int}) = 2
@generated h(i) = :(g(f(i)))
h(1)

might cause a segfault, and this is unavoidable?

Otherwise, could you give an example of invalid code? I don't think DD is doing anything illegal, which is why I'm surprised by the Illegal instruction: 4 error. Furthermore, the patch that temporarily worked around the issue seems to be unrelated.

Other than that, are we absolutely sure that :terminates_locally is a correct effect for [in|all|any](x, ::Tuple), 100% of the time? Are there no potential compiler optimizations which could void this in some way? There look to be only a few uses of this in Base: https://github.com/search?q=repo:JuliaLang/julia%20/_terminates_locally_meta/&type=code so maybe there are some sharp edges to the use of this effect which are not well explored yet? These are very very broadly used methods after all.

aviatesk commented 3 months ago

might cause a segfault, and this is unavoidable?

What I'm saying is the same as what the documentation states: calling it from a generator may cause undefined behavior. Your example is fine, though, since it uses the list comprehension in the generated code. Having said that, a list comprehension might not have been a good example, because the type of the value returned by a list comprehension is not defined by type inference (in most cases, IIUC). So, as an alternative, consider the following example, which is invalid:

g(f, xs...) = isconcretetype(Base.infer_return_type(f, xs))
@generated function h(f, xs)
    if g(f, xs)
        some_ex = ...
    else
        some_ex = ...
    end
    return :( #=code using `some_ex`=# )
end
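By contrast, a sketch of the placement described as fine: the reflection call sits inside the returned expression, so it runs when the generated code executes rather than while the generator is building that code. `runtime_branch` is a hypothetical name; this only illustrates where the call lives, not whether branching on inference is advisable.

```julia
# The generator itself calls no inference reflection; it only returns an
# expression. `Base.promote_op` is evaluated later, as ordinary runtime code.
@generated function runtime_branch(x)
    return :(isconcretetype(Base.promote_op(identity, typeof(x))) ? :concrete : :abstract)
end

runtime_branch(1)
```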

Other than that, are we absolutely sure that :terminates_locally is a correct effect for [in|all|any](x, ::Tuple), 100% of the time? Are there no potential compiler optimizations which could void this in some way? There look to be only a few uses of this in Base: https://github.com/search?q=repo:JuliaLang/julia%20/_terminates_locally_meta/&type=code so maybe there are some sharp edges to the use of this effect which are not well explored yet? These are very very broadly used methods after all.

I can't say that possibility doesn't exist, but I think the probability is lower than the probability that there is an issue on the DD side. I don't know everything about the implementation of DD, but it seems that it can break the assumptions of @generated functions. In this case, @_terminates_locally_meta has surfaced this possibility, but the likelihood of the issue being in the Julia compiler seems to be low (because similar issues haven't occurred in other packages).

MilesCranmer commented 3 months ago

The point I am confused about is that DD doesn't actually use @generated functions[^1], and neither does it interact with them. The code it generates seems no different from any code a user might write.

[^1]: It does now, but only to work around the issue in this thread: https://github.com/MilesCranmer/DispatchDoctor.jl/commit/4ed36ca1842f0231dbf4f53f9cd0aeeb3724f8ef is the commit that works around it in DD (it seems to result in type instability for Zygote).

aviatesk commented 3 months ago

Isn't this true?

I have confirmed that DispatchDoctor does not transform functions that directly use @generated, but IIUC it still allows @generated function to use other @stable functions transformed by DD within the generator

I.e. even if DD does not transform the generated function itself, the generator of that generated function might call a function that has been transformed by DD.

MilesCranmer commented 3 months ago

Right, but the example you gave in https://github.com/JuliaLang/julia/issues/55147#issuecomment-2260326648 will *never* occur in DD-generated code. I agree that example code is problematic. But it simply never happens here.

i.e., I don't think we can write off the Illegal instruction as coming from this theoretical code.

aviatesk commented 3 months ago

The issue is that generated functions that may call DD-transformed functions are not pure. The exact pattern of the example I gave above is one instance of such impure-ness.

MilesCranmer commented 3 months ago

Okay, this is the part that doesn't make sense to me though, because I could have the very same issue with any code that implicitly has a Base.promote_op, such as any list comprehension or map. Basically I'm not seeing the difference between DD and any other code that uses a list comprehension followed by a dispatch.

MilesCranmer commented 3 months ago

Okay, to test your theory, I just manually went through the codebase of the DispatchDoctor'd package, DynamicExpressions.jl on the master branch (specifically https://github.com/SymbolicML/DynamicExpressions.jl/commit/67bfab09033d3192d79defb79e1c313c2da03900). There are several @generated functions where the branch of generated code depends on the result of DispatchDoctor'd functions. I went through and disabled each of these.

This does not solve the problem, I get the same error as above.


Here is the git patch you can apply for yourself, following the reproduction guide in https://github.com/JuliaLang/julia/issues/55147#issue-2412066259

From 09179ab1d7cdda1664a23a55816ad412c719a6de Mon Sep 17 00:00:00 2001
From: MilesCranmer <miles.cranmer@gmail.com>
Date: Wed, 31 Jul 2024 18:21:49 +0100
Subject: [PATCH] hack: manually disable dispatch doctor in generated branches

---
 src/Evaluate.jl | 4 ++--
 src/Utils.jl    | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/Evaluate.jl b/src/Evaluate.jl
index c5cedb93..9d160a8e 100644
--- a/src/Evaluate.jl
+++ b/src/Evaluate.jl
@@ -183,8 +183,8 @@ function eval_tree_array(
     return eval_tree_array(tree, cX, operators; kws...)
 end

-get_nuna(::Type{<:OperatorEnum{B,U}}) where {B,U} = counttuple(U)
-get_nbin(::Type{<:OperatorEnum{B}}) where {B} = counttuple(B)
+@unstable get_nuna(::Type{<:OperatorEnum{B,U}}) where {B,U} = counttuple(U)
+@unstable get_nbin(::Type{<:OperatorEnum{B}}) where {B} = counttuple(B)

 function _eval_tree_array(
     tree::AbstractExpressionNode{T},
diff --git a/src/Utils.jl b/src/Utils.jl
index bd3326e2..343a662e 100644
--- a/src/Utils.jl
+++ b/src/Utils.jl
@@ -1,6 +1,7 @@
 """Useful functions to be used throughout the library."""
 module UtilsModule

+using DispatchDoctor: @unstable
 using MacroTools: postwalk, @capture, splitdef, combinedef

 # Returns two arrays
@@ -124,7 +125,7 @@ function deprecate_varmap(variable_names, varMap, func_name)
     return variable_names
 end

-counttuple(::Type{<:NTuple{N,Any}}) where {N} = N
+@unstable counttuple(::Type{<:NTuple{N,Any}}) where {N} = N

 """
     Undefined
-- 
2.39.0

Apply this to DynamicExpressions.jl. This essentially makes it so that DispatchDoctor.jl is never used in the code which selects the branch of each @generated function.

I then re-precompile the code. I run into the same issue as before:

Failed to precompile DynamicExpressions [a40a106e-89c9-4ca8-8020-a735e8728b6b] to "/Users/mcranmer/.julia/compiled/v1.11/DynamicExpressions/jl_mQJIc6".
[12551] signal 4: Illegal instruction: 4
in expression starting at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/DynamicExpressions.jl:131
_eval_tree_array at /Users/mcranmer/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/Evaluate.jl:160 [inlined]
#eval_tree_array#3 at /Users/mcranmer/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:306
eval_tree_array at /Users/mcranmer/.julia/packages/DispatchDoctor/eWFc7/src/stabilization.jl:301
#test_all_combinations#1 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:7
test_all_combinations at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:22 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:180 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:165 [inlined]
macro expansion at /Users/mcranmer/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
#do_precompilation#2 at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:150
do_precompilation at /Users/mcranmer/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/precompile.jl:173

aviatesk commented 3 months ago

A generator that uses a list comprehension is technically impure, but a list comprehension does not change observable behavior based on the result of type inference. In other words, the impurity of a list comprehension does not significantly affect the generator's behavior, so it usually does not cause problems.

On the other hand, if my understanding is correct, DD transformation does change a generator behavior based on the results of type inference and sometimes performs other impure actions like printing.

MilesCranmer commented 3 months ago

Our messages were sent at the same time; please see https://github.com/JuliaLang/julia/issues/55147#issuecomment-2261010153

MilesCranmer commented 3 months ago

I can't say that possibility doesn't exist, but I think the probability is lower than the probability that there is an issue on the DD side.

I think it's important to consider the expected cost of each option, though. A bug in Julia is much worse than a bug in DD. Unless you can be 100% sure that :terminates_locally is the right effect for [in|all|any](x, ::Tuple), for all possible types going into Tuple and all possible compiler optimizations that might happen, it seems a bit risky.

It's just strange that DD would work fine for 1.6.7 - 1.10.4, across several tested packages, running with heavy CI across multiple operating systems, and this one single PR be enough to surface an Illegal instruction error? DD even works fine on 1.11.0 without that PR.

And the fact that the current workaround is to wrap the promote_op inside another @generated function also seems weird.

aviatesk commented 3 months ago

You also need this...

diff --git a/src/utils.jl b/src/utils.jl
index 197d0be..5ec22c0 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -120,7 +120,7 @@ return false for `Union{}`, so that errors can propagate.
 # so we implement a workaround.
 @inline type_instability(::Type{Type{T}}) where {T} = type_instability(T)

-@generated function type_instability_limit_unions(
+function type_instability_limit_unions(
     ::Type{T}, ::Val{union_limit}
 ) where {T,union_limit}
     if T isa UnionAll

With this it succeeds precompilation.

MilesCranmer commented 3 months ago

type_instability_limit_unions isn't used for DynamicExpressions. I presume you are checking out the master branch of DispatchDoctor? You need to use v0.4.10, before I implemented the workaround.

(I just tried your suggestion, and, while probably a good idea anyways, it doesn't fix this issue.)

aviatesk commented 3 months ago

I'm on DE master with your patch and DD on v0.4.10 with my patch.

MilesCranmer commented 3 months ago

Can you reproduce the compilation error normally? Illegal instruction error is system-dependent.

(I'm also on DE master + patch and DD v0.4.10 with your patch, and I still see the bug. But I'm on ARM64, on my MacBook.)

aviatesk commented 3 months ago

The precompilation error reproduces with either or both of the patches reverted.

MilesCranmer commented 3 months ago

Well, this is getting even weirder, then. I simply cannot reproduce what you are seeing. I've now triple-checked my Manifest file and LocalPreferences, and even deleted my entire ~/.julia/compiled. Same bug. It only goes away with the patch https://github.com/MilesCranmer/DispatchDoctor.jl/commit/4ed36ca1842f0231dbf4f53f9cd0aeeb3724f8ef.

What's your versioninfo()? Are you on 1.11-rc2 or 1.11-rc1 (or nightly)? I'm on 1.11.0-rc1 at the moment.


Edit: Same issue on 1.11.0-rc2 for me.

KristofferC commented 3 months ago

It's just strange that DD would work fine for 1.6.7 - 1.10.4, across several tested packages, running with heavy CI across multiple operating systems, and this one single PR be enough to surface an Illegal instruction error

It is not very strange: faulty code can work for a long time until some innocent-looking change happens to trigger the faulty behavior. For example, using pointer without GC-preserving the object usually worked until the GC got better and started cleaning things up under your nose. The solution then is not to revert the improved GC.
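For readers unfamiliar with the pointer example: the safe pattern keeps the object rooted with `GC.@preserve` for as long as its raw pointer is in use. A minimal sketch; `sum_via_pointer` is an illustrative name.

```julia
# Sum a Float64 vector through a raw pointer. A raw pointer does not keep its
# object alive, so `GC.@preserve v` ensures `v` is not collected while `p`
# is being dereferenced.
function sum_via_pointer(v::Vector{Float64})
    s = 0.0
    GC.@preserve v begin
        p = pointer(v)
        for i in eachindex(v)
            s += unsafe_load(p, i)   # unsafe_load uses 1-based indexing
        end
    end
    return s
end

sum_via_pointer([1.0, 2.0, 3.0])
```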