JuliaLang / julia

Segmentation fault in LLVM frames on Julia 1.9.3 when precompiling CSV.jl on Intel Sapphire Rapids CPU #51482

Closed BioTurboNick closed 3 months ago

BioTurboNick commented 12 months ago

I don't even know where to begin here. This is some deep magic.

All I did before this happened was migrate from Amazon Linux 2 to Amazon Linux 2023 and install Julia 1.9.3 with Juliaup. The machine also has a newer Intel processor generation, if that matters (Intel(R) Xeon(R) Platinum 8488C).

The only packages in the environment are Pluto v0.19.28 and CSV v0.10.11. The crash is triggered when precompiling CSV.jl.

[50019] signal (11.1): Segmentation fault
in expression starting at /home/ec2-user/.julia/packages/CSV/OnldF/src/precompile.jl:3
_ZN4llvm19MachineRegisterInfo22addRegOperandToUseListEPNS_14MachineOperandE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm12MachineInstr10addOperandERNS_15MachineFunctionERKNS_14MachineOperandE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm7BuildMIERNS_17MachineBasicBlockERNS_12MachineInstrERKNS_8DebugLocERKNS_11MCInstrDescENS_8RegisterE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZNK4llvm17X86TargetLowering27EmitInstrWithCustomInserterERNS_12MachineInstrEPNS_17MachineBasicBlockE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_112FinalizeISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc14SimpleCompilerclERNS_6ModuleE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
operator() at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1206
_ZN4llvm3orc14IRCompileLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16IRTransformLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
emit at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:631
_ZN4llvm3orc31BasicIRLayerMaterializationUnit11materializeESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc19MaterializationTask3runEv at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm6detail18UniqueFunctionBaseIvJSt10unique_ptrINS_3orc4TaskESt14default_deleteIS4_EEEE8CallImplIPFvS7_EEEvPvRS7_ at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession22dispatchOutstandingMUsEv at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession17OL_completeLookupESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EESt10shared_ptrINS0_23AsynchronousSymbolQueryEESt8functionIFvRKNS_8DenseMapIPNS0_8JITDylibENS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISF_vEEEENSG_ISD_vEENS_6detail12DenseMapPairISD_SI_EEEEEE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc25InProgressFullLookupState8completeESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession19OL_applyQueryPhase1ESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EENS_5ErrorE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS0_10LookupKindERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS8_EENS0_15SymbolLookupSetENS0_11SymbolStateENS_15unique_functionIFvNS_8ExpectedINS_8DenseMapINS0_15SymbolStringPtrENS_18JITEvaluatedSymbolENS_12DenseMapInfoISI_vEENS_6detail12DenseMapPairISI_SJ_EEEEEEEEESt8functionIFvRKNSH_IS6_NS_8DenseSetISI_SL_EENSK_IS6_vEENSN_IS6_SV_EEEEEE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EERKNS0_15SymbolLookupSetENS0_10LookupKindENS0_11SymbolStateESt8functionIFvRKNS_8DenseMapIS5_NS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISK_vEEEENSL_IS5_vEENS_6detail12DenseMapPairIS5_SN_EEEEEE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EENS0_15SymbolStringPtrENS0_11SymbolStateE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS0_15SymbolStringPtrENS0_11SymbolStateE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS_9StringRefENS0_11SymbolStateE at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/bin/../lib/julia/libLLVM-14jl.so (unknown line)
addModule at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1420
jl_add_to_ee at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1815
jl_add_to_ee at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1794
_jl_compile_codeinst at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:241
jl_generate_fptr_impl at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:460
jl_compile_method_internal at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2348 [inlined]
jl_compile_method_internal at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2237
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2750 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
File at /home/ec2-user/.julia/packages/CSV/OnldF/src/file.jl:290
File at /home/ec2-user/.julia/packages/CSV/OnldF/src/file.jl:227 [inlined]
#File#32 at /home/ec2-user/.julia/packages/CSV/OnldF/src/file.jl:223 [inlined]
File at /home/ec2-user/.julia/packages/CSV/OnldF/src/file.jl:162
unknown function (ip: 0x7fc09eb508b8)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
macro expansion at /home/ec2-user/.julia/packages/CSV/OnldF/src/precompile.jl:10 [inlined]
macro expansion at /home/ec2-user/.julia/packages/PrecompileTools/kmH5L/src/workloads.jl:78 [inlined]
macro expansion at /home/ec2-user/.julia/packages/CSV/OnldF/src/precompile.jl:7 [inlined]
top-level scope at /home/ec2-user/.julia/packages/PrecompileTools/kmH5L/src/workloads.jl:140
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:903
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1903
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1963
include at ./Base.jl:457
jfptr_include_35036.clone_1 at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
jl_f__call_latest at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/builtins.c:774
include at /home/ec2-user/.julia/packages/CSV/OnldF/src/CSV.jl:24
unknown function (ip: 0x7fc09eafd482)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
do_call at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_eval_module_expr at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:715
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1903
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1963
include at ./Base.jl:457 [inlined]
include_package_for_output at ./loading.jl:2049
jfptr_include_package_for_output_38844.clone_1 at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
do_call at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1903
include_string at ./loading.jl:1913 [inlined]
exec_options at ./client.jl:305
_start at ./client.jl:522
jfptr__start_40034.clone_1 at /home/ec2-user/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
true_main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
__libc_start_call_main at /lib64/libc.so.6 (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 14670069 (Pool: 14651321; Big: 18748); GC: 21

BioTurboNick commented 12 months ago

I suppose this is the line being compiled?

https://github.com/JuliaData/CSV.jl/blob/cb1b411f893e8e8cc120544ce629e07d25f2fb50/src/file.jl#L290-L290

BioTurboNick commented 12 months ago

Works: 1.8.0, 1.8.5, 1.10-beta2

Crash: 1.9.0, 1.9.3

vtjnash commented 12 months ago

Could you try an asserts-enabled build with --check-bounds=yes, just for comparison? I would ask for an rr trace, but those can be hard to replay if your machine is newer (unless we get lucky and it reproduces under Intel's PIN tool for that platform). You can find some builds on the Buildkite page, such as:

https://buildkite.com/julialang/julia-release-1-dot-9/builds/304#018a2807-cbd0-4b74-8245-0af0a87ad6ca

BioTurboNick commented 12 months ago

With the assert build you linked to (thanks for that), run with --check-bounds=yes:

Unexpected instr type to insert
UNREACHABLE executed at /workspace/srcdir/llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:35540!

[69278] signal (6.-6): Aborted
in expression starting at /home/ec2-user/.julia/packages/CSV/OnldF/src/precompile.jl:3
__pthread_kill_implementation at /lib64/libc.so.6 (unknown line)
raise at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
_ZN4llvm25llvm_unreachable_internalEPKcS1_j at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZNK4llvm17X86TargetLowering27EmitInstrWithCustomInserterERNS_12MachineInstrEPNS_17MachineBasicBlockE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_112FinalizeISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc14SimpleCompilerclERNS_6ModuleE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
operator() at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1206
_ZN4llvm3orc14IRCompileLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16IRTransformLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
emit at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:631
_ZN4llvm3orc31BasicIRLayerMaterializationUnit11materializeESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc19MaterializationTask3runEv at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm6detail18UniqueFunctionBaseIvJSt10unique_ptrINS_3orc4TaskESt14default_deleteIS4_EEEE8CallImplIPFvS7_EEEvPvRS7_ at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession12dispatchTaskESt10unique_ptrINS0_4TaskESt14default_deleteIS3_EE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession22dispatchOutstandingMUsEv at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession17OL_completeLookupESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EESt10shared_ptrINS0_23AsynchronousSymbolQueryEESt8functionIFvRKNS_8DenseMapIPNS0_8JITDylibENS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISF_vEEEENSG_ISD_vEENS_6detail12DenseMapPairISD_SI_EEEEEE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc25InProgressFullLookupState8completeESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession19OL_applyQueryPhase1ESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EENS_5ErrorE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS0_10LookupKindERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS8_EENS0_15SymbolLookupSetENS0_11SymbolStateENS_15unique_functionIFvNS_8ExpectedINS_8DenseMapINS0_15SymbolStringPtrENS_18JITEvaluatedSymbolENS_12DenseMapInfoISI_vEENS_6detail12DenseMapPairISI_SJ_EEEEEEEEESt8functionIFvRKNSH_IS6_NS_8DenseSetISI_SL_EENSK_IS6_vEENSN_IS6_SV_EEEEEE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EERKNS0_15SymbolLookupSetENS0_10LookupKindENS0_11SymbolStateESt8functionIFvRKNS_8DenseMapIS5_NS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISK_vEEEENSL_IS5_vEENS_6detail12DenseMapPairIS5_SN_EEEEEE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EENS0_15SymbolStringPtrENS0_11SymbolStateE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS0_15SymbolStringPtrENS0_11SymbolStateE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS_9StringRefENS0_11SymbolStateE at /home/ec2-user/julia-assert-1.9/julia-1.9.3/bin/../lib/julia/libLLVM-14jl.so (unknown line)
addModule at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1420
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1815
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1794
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1794
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1794
_jl_compile_codeinst at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:241

[...] everything in here is the same as in the OP

Allocations: 15055698 (Pool: 15037856; Big: 17842); GC: 20

For some reason, the number of jl_add_to_ee frames seems to vary between runs.
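Since the jl_add_to_ee depth differs between the two traces, one way to compare backtraces is to count frames per symbol. This is a minimal sketch, assuming the trace is plain text with one "symbol at location" frame per line. The helper name `count_frames` is hypothetical and not part of any Julia tooling, and frames whose symbol contains spaces (such as "macro expansion") are skipped.

```python
import re
from collections import Counter

# One backtrace frame per line: "<symbol> at <file or library location>".
# Symbols containing spaces are deliberately not matched in this sketch.
FRAME_RE = re.compile(r"^(\S+) at \S")

def count_frames(trace: str) -> Counter:
    """Count how often each symbol appears as a frame in a backtrace."""
    counts = Counter()
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

# Frames copied from the assert-build trace above.
ASSERT_TRACE = """\
addModule at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1420
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1815
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1794
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1794
jl_add_to_ee at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:1794
_jl_compile_codeinst at /cache/build/default-amdci4-3/julialang/julia-release-1-dot-9/src/jitlayers.cpp:241
"""

print(count_frames(ASSERT_TRACE)["jl_add_to_ee"])
```

Run against the two traces in this thread, the assert build shows four jl_add_to_ee frames where the OP shows two.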

BioTurboNick commented 12 months ago

Tried rr and it is failing because it doesn't recognize the CPU:

[FATAL src/PerfCounters_x86.h:118:compute_cpu_microarch()] Intel CPU type 0x806f0 unknown

Looks like it's been added to rr master, but no released version recognizes it yet.
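
For reference, the value rr prints appears to be a CPUID signature (leaf 1, EAX), which packs family, model, and stepping into bit fields. The sketch below decodes it using the standard x86 field layout. The function name `decode_cpuid_signature` is made up for illustration, and the stepping field reads 0 in the printed value, which may simply mean rr masks it out.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf-1 EAX signature into its standard bit fields."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended fields only contribute for certain base families.
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}

# Value taken from the rr error message above.
info = decode_cpuid_signature(0x806F0)
print(f"family {info['family']}, model {info['model']:#x}, stepping {info['stepping']}")
```

This gives family 6, model 0x8f, which as far as I know is the Sapphire Rapids model number.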

BioTurboNick commented 11 months ago

rr 5.7 was just released; I guess the copy Julia uses needs to be updated? https://github.com/JuliaLang/BugReporting.jl/issues/140

gbaraldi commented 11 months ago

Could you also try building Julia 1.9 with assertions?

BioTurboNick commented 11 months ago

I used an assert build here: https://github.com/JuliaLang/julia/issues/51482#issuecomment-1738252801

giordano commented 11 months ago

Works: 1.8.0, 1.8.5, 1.10-beta2

Crash: 1.9.0, 1.9.3

I can confirm it crashes for me on 1.9.3, but it works on:

Julia Version 1.11.0-DEV.414
Commit f06650091eb (2023-09-06 06:03 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 112 × Intel(R) Xeon(R) Platinum 8480+
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, sapphirerapids)
  Threads: 1 on 112 virtual cores

This sounds like some sort of bug in LLVM 14. This blog post suggests LLVM 15 received a lot of work from Intel engineers for this family of CPUs, which may have fixed this issue.

BioTurboNick commented 11 months ago

Is it very hard to build LLVM for Julia with a patch and try it out? Or is there a way to disable certain LLVM features as a workaround? I tried using JULIA_CPU_TARGET but was getting errors from Julia about incompatible targets. There's a good chance I'm not using it right, though.

Based on the assertion build stack trace, a switch statement in X86TargetLowering::EmitInstrWithCustomInserter in X86ISelLowering.cpp is falling through to the default case, which indicates an instruction it does not recognize. LLVM 15 adds two new cases here, CMOV_FR16 and CMOV_FR16X.

They were added in this commit: https://github.com/llvm/llvm-project/commit/655ba9c8a1d22075443711cc749f0b032e07adee#diff-eb2f176d67cdf1955a90e71e25d6d39910d723d4e0b8a9bf8dfa229d3a6b2c1e
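
To illustrate why the same problem could show up as a clean abort in the assert build but as a segfault in the release build, here is a toy Python model of a dispatch whose default case is marked unreachable. It is not LLVM code: the function, the `HANDLED` set, and the `asserts_enabled` flag are all hypothetical stand-ins, and the release branch only approximates what is undefined behavior in C++.

```python
class UnreachableExecuted(Exception):
    """Stand-in for the abort an asserts-enabled build performs."""

# Hypothetical subset of pseudo-instructions the inserter can expand.
HANDLED = {"CMOV_FR32", "CMOV_FR64"}

def emit_with_custom_inserter(opcode: str, asserts_enabled: bool):
    """Toy model of a switch whose default case is marked unreachable."""
    if opcode in HANDLED:
        return f"expanded {opcode}"
    if asserts_enabled:
        # Asserts build: the default case reports the problem and aborts.
        raise UnreachableExecuted("Unexpected instr type to insert")
    # Release build: the check is compiled away, so execution continues
    # without a valid result. Modeled as None; in C++ this is undefined
    # behavior, so a crash further down the call chain is plausible.
    return None

try:
    emit_with_custom_inserter("CMOV_FR16X", asserts_enabled=True)
except UnreachableExecuted as err:
    print(f"abort: {err}")

print(emit_with_custom_inserter("CMOV_FR16X", asserts_enabled=False))
```

That would be consistent with the OP trace crashing inside BuildMI right after EmitInstrWithCustomInserter, though this is only a guess from the two backtraces.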

The other place where these appear is X86FastISel::X86FastEmitPseudoSelect, which changed between LLVM 13, 14, and 15. The f16 case changed in this way:

LLVM    Instruction
13.x    not present
14.x    CMOV_FR16X
15.x    Subtarget->hasAVX512() ? X86::CMOV_FR16X : X86::CMOV_FR16
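
Putting the two observations together (the FastISel table above, and the inserter gaining its f16 cases only in LLVM 15), the suspected mismatch can be written down as a small lookup. This only encodes what is reported in this thread, not anything extracted from LLVM itself. The dictionary and function names are made up, and the 13.x inserter entry is an assumption.

```python
# Pseudo-instructions FastISel may emit for an f16 select, per the table above.
FASTISEL_EMITS = {
    "13.x": set(),                        # f16 case not present
    "14.x": {"CMOV_FR16X"},
    "15.x": {"CMOV_FR16X", "CMOV_FR16"},  # chosen by Subtarget->hasAVX512()
}

# f16 cases handled by EmitInstrWithCustomInserter, as reported in this thread.
INSERTER_HANDLES = {
    "13.x": set(),                        # assumed to match 14.x
    "14.x": set(),
    "15.x": {"CMOV_FR16X", "CMOV_FR16"},
}

def unhandled(version: str) -> set:
    """Return f16 pseudo-instructions that can be emitted but not expanded."""
    return FASTISEL_EMITS[version] - INSERTER_HANDLES[version]

for version in FASTISEL_EMITS:
    print(version, sorted(unhandled(version)))
```

Under these assumptions only 14.x is left with an instruction that can be emitted but has no matching case, which lines up with the crash appearing on Julia 1.9 (libLLVM-14) only.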

BioTurboNick commented 11 months ago

I got LLVM to dump debug info. Immediately before the error it prints the function below, which happens to contain a CMOV_FR16X instruction right at the end.

Based on the line numbers, I think it might be Float16(x::BigInt, ::RoundingMode{:Nearest}) @ Base.GMP gmp.jl:447

# Machine code for function julia_Float16_1069: IsSSA, TracksLiveness
Frame Objects:
  fi#0: size=16, align=8, at location [SP+8]
  fi#1: size=32, align=32, at location [SP+8]
Constant Pool:
  cp#0: 0xH7C00, align=2
  cp#1: -0.000000e+00, align=4
Function Live Ins: $rdi in %10

bb.0.top:
  successors: %bb.2, %bb.1
  liveins: $rdi
  %10:gr64 = COPY $rdi
  %11:gr64 = COPY killed %10:gr64
  %16:vr256x = AVX512_256_SET0
  VMOVDQA64Z256mr %stack.1.gcframe, 1, $noreg, 0, $noreg, killed %16:vr256x :: (store (s256) into %ir.2)
  %0:gr64 = LEA64r %stack.1.gcframe, 1, $noreg, 16, $noreg
  INLINEASM &"movq %fs:0, $0" [attdialect], $0:[regdef:GR64], def %14:gr64
  %1:gr64 = MOV64rm %14:gr64, 1, $noreg, -8, $noreg :: (load (s64) from %ir.ppgcstack)
  MOV64mi32 %stack.1.gcframe, 1, $noreg, 0, $noreg, 8 :: (store (s64) into %ir.4, !tbaa !7)
  %17:gr64 = MOV64rm %1:gr64, 1, $noreg, 0, $noreg :: (load (s64) from %ir.pgcstack)
  MOV64mr %stack.1.gcframe, 1, $noreg, 8, $noreg, killed %17:gr64 :: (store (s64) into %ir.6, !tbaa !7)
  %18:gr64 = LEA64r %stack.1.gcframe, 1, $noreg, 0, $noreg
  MOV64mr %1:gr64, 1, $noreg, 0, $noreg, killed %18:gr64 :: (store (s64) into %ir.8)
  MOV64mi32 %stack.1.gcframe, 1, $noreg, 16, $noreg, 0 :: (store (s64) into %ir.x)
  MOV64mr %stack.1.gcframe, 1, $noreg, 16, $noreg, %11:gr64 :: (store (s64) into %ir.x)
  %19:gr64 = MOV64rm %stack.1.gcframe, 1, $noreg, 16, $noreg, debug-location !11 :: (load (s64) from %ir.x); pointer.jl:155 @[ refvalue.jl:42 @[ refpointer.jl:101 @[ gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ] ] ] ]
  %20:gr64 = MOV64ri @jlplt___gmpz_cmp_si_873_got, debug-location !20; gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  %21:gr64 = MOV64rm killed %20:gr64, 1, $noreg, 0, $noreg, debug-location !20 :: (dereferenceable load unordered (s64) from @jlplt___gmpz_cmp_si_873_got); gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  MOV64mr %stack.1.gcframe, 1, $noreg, 24, $noreg, %19:gr64 :: (store (s64) into %ir.22)
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !20; gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  %22:gr32 = MOV32r0 implicit-def dead $eflags
  %23:gr64 = SUBREG_TO_REG 0, killed %22:gr32, %subreg.sub_32bit
  $rdi = COPY %19:gr64, debug-location !20; gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  $rsi = COPY %23:gr64, debug-location !20; gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  CALL64r killed %21:gr64, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit-def $rsp, implicit-def $ssp, implicit-def $eax, debug-location !20; gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !20; gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  %24:gr32 = COPY $eax, debug-location !20; gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ]
  %25:gr64 = MOVSX64rr32 %24:gr32, debug-location !42; boot.jl:703 @[ boot.jl:784 @[ gmp.jl:239 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ] ] ]
  TEST64rr %25:gr64, %25:gr64, implicit-def $eflags, debug-location !47; int.jl:83 @[ number.jl:162 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ] ]
  %26:gr64 = MOV32ri64 1
  %27:gr64 = CMOV64rr %25:gr64(tied-def 0), killed %26:gr64, 15, implicit $eflags, debug-location !57; essentials.jl:575 @[ number.jl:162 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ] ]
  %28:gr64 = MOV64ri32 -1
  %29:gr64 = CMOV64rr %27:gr64(tied-def 0), killed %28:gr64, 8, implicit $eflags, debug-location !57; essentials.jl:575 @[ number.jl:162 @[ gmp.jl:601 @[ gmp.jl:697 @[ gmp.jl:448 ] ] ] ]
  TEST64rr %29:gr64, %29:gr64, implicit-def $eflags, debug-location !60; promotion.jl:499 @[ gmp.jl:697 @[ gmp.jl:448 ] ]
  %15:fr16x = AVX512_FsFLD0SH
  JCC_1 %bb.2, 5, implicit $eflags, debug-location !26; gmp.jl:448
  JMP_1 %bb.1, debug-location !26; gmp.jl:448

bb.1.common.ret:
; predecessors: %bb.0, %bb.8

  %2:fr16x = PHI %15:fr16x, %bb.0, %135:fr16x, %bb.8
  %153:gr64 = MOV64rm %stack.1.gcframe, 1, $noreg, 8, $noreg :: (load (s64) from %ir.60, !tbaa !7)
  MOV64mr %1:gr64, 1, $noreg, 0, $noreg, %153:gr64 :: (store (s64) into %ir.62, !tbaa !7)
  $xmm0 = COPY %2:fr16x, debug-location !63; gmp.jl:0
  RET64 implicit $xmm0, debug-location !63; gmp.jl:0

bb.2.L12:
; predecessors: %bb.0
  successors: %bb.3, %bb.7

  %43:gr64 = MOV64rm %0:gr64, 1, $noreg, 0, $noreg, debug-location !64 :: (load (s64) from %ir.x); Base.jl:37 @[ gmp.jl:449 ]
  %44:gr64 = MOV64rm %43:gr64, 1, $noreg, 8, $noreg, debug-location !64 :: (load (s64) from %ir.70, !tbaa !68, !alias.scope !72, !noalias !73, addrspace 11); Base.jl:37 @[ gmp.jl:449 ]
  %42:gr64 = MOV64rm %44:gr64, 1, $noreg, 0, $noreg, debug-location !74 :: (load (s64) from %ir.76, align 1, !tbaa !77, !alias.scope !72, !noalias !73); pointer.jl:111 @[ pointer.jl:111 @[ gmp.jl:449 ] ]
  %39:gr64 = LZCNT64rr %42:gr64, implicit-def dead $eflags, debug-location !78; int.jl:428 @[ gmp.jl:450 ]
  %38:gr64 = MOV32ri64 64, debug-location !83; int.jl:86 @[ gmp.jl:450 ]
  %40:gr64 = SUB64rr %38:gr64(tied-def 0), %39:gr64, implicit-def $eflags, debug-location !83; int.jl:86 @[ gmp.jl:450 ]
  %37:gr64 = MOV32ri64 16, debug-location !85; int.jl:83 @[ operators.jl:369 @[ gmp.jl:451 ] ]
  CMP64rr %37:gr64, %40:gr64, implicit-def $eflags, debug-location !85; int.jl:83 @[ operators.jl:369 @[ gmp.jl:451 ] ]
  %36:gr8 = SETCCr 12, implicit $eflags, debug-location !85; int.jl:83 @[ operators.jl:369 @[ gmp.jl:451 ] ]
  %35:gr8 = AND8ri %36:gr8(tied-def 0), 1, implicit-def $eflags, debug-location !87; gmp.jl:451
  %32:gr8 = XOR8ri %35:gr8(tied-def 0), -1, implicit-def $eflags, debug-location !87; gmp.jl:451
  TEST8ri %32:gr8, 1, implicit-def $eflags, debug-location !87; gmp.jl:451
  JCC_1 %bb.3, 5, implicit $eflags, debug-location !87; gmp.jl:451
  JMP_1 %bb.7, debug-location !87; gmp.jl:451

bb.3.L21:
; predecessors: %bb.2
  successors: %bb.4, %bb.7

  %62:gr64 = MOV64rm %0:gr64, 1, $noreg, 0, $noreg, debug-location !88 :: (load (s64) from %ir.x); Base.jl:37 @[ gmp.jl:451 ]
  %63:gr32 = MOV32rm %62:gr64, 1, $noreg, 4, $noreg, debug-location !88 :: (load (s32) from %ir.106, !tbaa !68, !alias.scope !72, !noalias !73, addrspace 11); Base.jl:37 @[ gmp.jl:451 ]
  %61:gr32 = SAR32ri %63:gr32(tied-def 0), 31, implicit-def $eflags, debug-location !89; int.jl:142 @[ int.jl:188 @[ gmp.jl:451 ] ]
  %60:gr32 = ADD32rr %63:gr32(tied-def 0), %61:gr32, implicit-def $eflags, debug-location !89; int.jl:142 @[ int.jl:188 @[ gmp.jl:451 ] ]
  %58:gr32 = XOR32rr %60:gr32(tied-def 0), %61:gr32, implicit-def $eflags, debug-location !89; int.jl:142 @[ int.jl:188 @[ gmp.jl:451 ] ]
  %55:gr64 = MOVSX64rr32 %58:gr32, debug-location !93; boot.jl:703 @[ boot.jl:784 @[ number.jl:7 @[ promotion.jl:358 @[ promotion.jl:381 @[ promotion.jl:450 @[ operators.jl:369 @[ gmp.jl:451 ] ] ] ] ] ] ]
  %52:gr64 = MOV32ri64 1, debug-location !103; int.jl:83 @[ promotion.jl:450 @[ operators.jl:369 @[ gmp.jl:451 ] ] ]
  CMP64rr %52:gr64, %55:gr64, implicit-def $eflags, debug-location !103; int.jl:83 @[ promotion.jl:450 @[ operators.jl:369 @[ gmp.jl:451 ] ] ]
  %51:gr8 = SETCCr 12, implicit $eflags, debug-location !103; int.jl:83 @[ promotion.jl:450 @[ operators.jl:369 @[ gmp.jl:451 ] ] ]
  %50:gr8 = AND8ri %51:gr8(tied-def 0), 1, implicit-def $eflags, debug-location !87; gmp.jl:451
  %47:gr8 = XOR8ri %50:gr8(tied-def 0), -1, implicit-def $eflags, debug-location !87; gmp.jl:451
  TEST8ri %47:gr8, 1, implicit-def $eflags, debug-location !87; gmp.jl:451
  JCC_1 %bb.4, 5, implicit $eflags, debug-location !87; gmp.jl:451
  JMP_1 %bb.7, debug-location !87; gmp.jl:451

bb.4.L27:
; predecessors: %bb.3
  successors: %bb.6, %bb.5

  %79:gr64_with_sub_8bit = ADD64ri8 %40:gr64(tied-def 0), -12, implicit-def dead $eflags, debug-location !104; int.jl:86 @[ gmp.jl:455 ]
  %80:gr8 = COPY %79.sub_8bit:gr64_with_sub_8bit, debug-location !114; int.jl:502 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %82:gr64 = IMPLICIT_DEF
  %81:gr64_with_sub_8bit = INSERT_SUBREG %82:gr64(tied-def 0), killed %80:gr8, %subreg.sub_8bit, debug-location !114; int.jl:502 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %83:gr64 = SHRX64rr %42:gr64, killed %81:gr64_with_sub_8bit, debug-location !114; int.jl:502 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %84:gr64 = MOV32ri64 12
  %85:gr64_with_sub_8bit = SUB64rr %84:gr64(tied-def 0), %40:gr64, implicit-def dead $eflags, debug-location !115; int.jl:85 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %86:gr8 = COPY %85.sub_8bit:gr64_with_sub_8bit, debug-location !116; int.jl:503 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %88:gr64 = IMPLICIT_DEF
  %87:gr64_with_sub_8bit = INSERT_SUBREG %88:gr64(tied-def 0), killed %86:gr8, %subreg.sub_8bit, debug-location !116; int.jl:503 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %89:gr64 = SHLX64rr %42:gr64, killed %87:gr64_with_sub_8bit, debug-location !116; int.jl:503 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %90:gr32 = MOV32r0 implicit-def dead $eflags
  %91:gr64 = SUB64ri8 %85:gr64_with_sub_8bit(tied-def 0), 64, implicit-def $eflags, debug-location !116; int.jl:503 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %92:gr32 = COPY %89.sub_32bit:gr64, debug-location !118; essentials.jl:575 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %93:gr32 = CMOV32rr %92:gr32(tied-def 0), %90:gr32, 3, implicit $eflags, debug-location !118; essentials.jl:575 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %94:gr64 = SUB64ri8 %79:gr64_with_sub_8bit(tied-def 0), 64, implicit-def $eflags, debug-location !114; int.jl:502 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %95:gr32 = COPY %83.sub_32bit:gr64, debug-location !118; essentials.jl:575 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %96:gr32 = CMOV32rr %95:gr32(tied-def 0), %90:gr32, 3, implicit $eflags, debug-location !118; essentials.jl:575 @[ int.jl:508 @[ gmp.jl:455 ] ]
  TEST64rr %79:gr64_with_sub_8bit, %79:gr64_with_sub_8bit, implicit-def $eflags, debug-location !106; int.jl:488 @[ int.jl:508 @[ gmp.jl:455 ] ]
  %97:gr32 = CMOV32rr %93:gr32(tied-def 0), killed %96:gr32, 9, implicit $eflags, debug-location !119; int.jl:518 @[ gmp.jl:455 ]
  %98:gr32 = INC32r %97:gr32(tied-def 0), implicit-def dead $eflags, debug-location !120; int.jl:87 @[ gmp.jl:456 ]
  %99:gr16 = COPY %98.sub_16bit:gr32, debug-location !120; int.jl:87 @[ gmp.jl:456 ]
  %100:gr32 = MOVZX32rr16 killed %99:gr16, debug-location !120; int.jl:87 @[ gmp.jl:456 ]
  %101:gr32 = SHR32r1 %100:gr32(tied-def 0), implicit-def dead $eflags, debug-location !123; int.jl:502 @[ int.jl:508 @[ gmp.jl:456 ] ]
  %5:gr16 = COPY %101.sub_16bit:gr32, debug-location !123; int.jl:502 @[ int.jl:508 @[ gmp.jl:456 ] ]
  %73:gr64 = MOV64rm %0:gr64, 1, $noreg, 0, $noreg, debug-location !127 :: (load (s64) from %ir.x); pointer.jl:155 @[ refvalue.jl:42 @[ refpointer.jl:101 @[ gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ] ] ] ]
  %78:gr64 = COPY %73:gr64, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  %102:gr64 = MOV64ri @jlplt___gmpz_scan1_876_got, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  %76:gr64 = MOV64rm killed %102:gr64, 1, $noreg, 0, $noreg, debug-location !130 :: (dereferenceable load unordered (s64) from @jlplt___gmpz_scan1_876_got); gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  MOV64mr %stack.1.gcframe, 1, $noreg, 24, $noreg, %78:gr64 :: (store (s64) into %ir.198)
  %74:gr32 = MOV32r0 implicit-def $eflags, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  %75:gr64 = SUBREG_TO_REG 0, %74:gr32, %subreg.sub_32bit, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  $rdi = COPY %73:gr64, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  $rsi = COPY %75:gr64, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  CALL64r %76:gr64, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit-def $rax, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  %77:gr64 = COPY $rax, debug-location !130; gmp.jl:191 @[ gmp.jl:566 @[ gmp.jl:457 ] ]
  CMP64ri8 %77:gr64, -1, implicit-def $eflags, debug-location !140; promotion.jl:499 @[ gmp.jl:567 @[ gmp.jl:457 ] ]
  %72:gr8 = SETCCr 4, implicit $eflags, debug-location !140; promotion.jl:499 @[ gmp.jl:567 @[ gmp.jl:457 ] ]
  %71:gr8 = AND8ri %72:gr8(tied-def 0), 1, implicit-def $eflags, debug-location !140; promotion.jl:499 @[ gmp.jl:567 @[ gmp.jl:457 ] ]
  %68:gr8 = XOR8ri %71:gr8(tied-def 0), -1, implicit-def $eflags, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  TEST8ri %68:gr8, 1, implicit-def $eflags, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  JCC_1 %bb.6, 5, implicit $eflags, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]

bb.5.L46:
; predecessors: %bb.4

  %105:gr64 = MOV64rm %0:gr64, 1, $noreg, 0, $noreg, debug-location !141 :: (load (s64) from %ir.x); gmp.jl:567 @[ gmp.jl:457 ]
  %106:gr64 = MOV64ri @"+Core.DomainError#367", debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  %107:gr64 = MOV64rm killed %106:gr64, 1, $noreg, 0, $noreg, debug-location !141 :: (dereferenceable load (s64) from @"+Core.DomainError#367", !tbaa !32, !alias.scope !34, !noalias !37); gmp.jl:567 @[ gmp.jl:457 ]
  %108:gr64 = MOV64ri @"jl_global#368", debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  %109:gr64 = MOV64rm killed %108:gr64, 1, $noreg, 0, $noreg, debug-location !141 :: (dereferenceable load (s64) from @"jl_global#368", !tbaa !32, !alias.scope !34, !noalias !37); gmp.jl:567 @[ gmp.jl:457 ]
  MOV64mr %stack.1.gcframe, 1, $noreg, 24, $noreg, %105:gr64 :: (store (s64) into %ir.218)
  MOV64mr %stack.0, 1, $noreg, 0, $noreg, %105:gr64, debug-location !141 :: (store (s64) into %ir.219); gmp.jl:567 @[ gmp.jl:457 ]
  MOV64mr %stack.0, 1, $noreg, 8, $noreg, killed %109:gr64, debug-location !141 :: (store (s64) into %ir.220); gmp.jl:567 @[ gmp.jl:457 ]
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  %110:gr64 = MOV64ri @ijl_apply_generic, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  %111:gr64 = LEA64r %stack.0, 1, $noreg, 0, $noreg
  %112:gr32 = MOV32ri 2
  $rdi = COPY %107:gr64, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  $rsi = COPY %111:gr64, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  $edx = COPY %112:gr32, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  CALL64r killed %110:gr64, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $edx, implicit-def $rsp, implicit-def $ssp, implicit-def $rax, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  %113:gr64 = COPY $rax, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  %103:gr64 = COPY %113:gr64, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  %104:gr64 = MOV64ri @ijl_throw, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  $rdi = COPY %103:gr64, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  CALL64r killed %104:gr64, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]
  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !141; gmp.jl:567 @[ gmp.jl:457 ]

bb.6.L50:
; predecessors: %bb.4
  successors: %bb.8

  %115:gr64 = ADD64ri8 %40:gr64(tied-def 0), -12, implicit-def dead $eflags, debug-location !143; int.jl:86 @[ gmp.jl:457 ]
  %116:gr64 = SUB64rr %77:gr64(tied-def 0), killed %115:gr64, implicit-def $eflags, debug-location !144; promotion.jl:499 @[ gmp.jl:457 ]
  %117:gr8 = SETCCr 4, implicit $eflags, debug-location !144; promotion.jl:499 @[ gmp.jl:457 ]
  %118:gr32 = MOVZX32rr8 killed %117:gr8, debug-location !145; boot.jl:744 @[ boot.jl:787 @[ gmp.jl:457 ] ]
  %120:gr32 = IMPLICIT_DEF debug-location !151; int.jl:347 @[ gmp.jl:457 ]
  %119:gr32 = INSERT_SUBREG %120:gr32(tied-def 0), %5:gr16, %subreg.sub_16bit, debug-location !151; int.jl:347 @[ gmp.jl:457 ]
  %121:gr32 = ANDN32rr killed %118:gr32, killed %119:gr32, implicit-def dead $eflags, debug-location !151; int.jl:347 @[ gmp.jl:457 ]
  %122:gr32 = COPY %40.sub_32bit:gr64, debug-location !155; int.jl:518 @[ gmp.jl:458 ]
  %123:gr32 = SHL32ri %122:gr32(tied-def 0), 10, implicit-def dead $eflags, debug-location !155; int.jl:518 @[ gmp.jl:458 ]
  %125:gr64 = IMPLICIT_DEF debug-location !160; int.jl:87 @[ gmp.jl:459 ]
  %124:gr64_nosp = INSERT_SUBREG %125:gr64(tied-def 0), killed %121:gr32, %subreg.sub_32bit, debug-location !160; int.jl:87 @[ gmp.jl:459 ]
  %127:gr64 = IMPLICIT_DEF debug-location !160; int.jl:87 @[ gmp.jl:459 ]
  %126:gr64 = INSERT_SUBREG %127:gr64(tied-def 0), killed %123:gr32, %subreg.sub_32bit, debug-location !160; int.jl:87 @[ gmp.jl:459 ]
  %128:gr32 = LEA64_32r killed %126:gr64, 1, killed %124:gr64_nosp, 13312, $noreg, debug-location !160; int.jl:87 @[ gmp.jl:459 ]
  %114:gr16 = COPY %128.sub_16bit:gr32, debug-location !160; int.jl:87 @[ gmp.jl:459 ]
  %130:gr32 = IMPLICIT_DEF debug-location !151; int.jl:347 @[ gmp.jl:457 ]
  %129:gr32 = INSERT_SUBREG %130:gr32(tied-def 0), %114:gr16, %subreg.sub_16bit, debug-location !162; essentials.jl:513 @[ gmp.jl:459 ]
  %131:vr128x = VMOVW2SHrr killed %129:gr32, debug-location !162; essentials.jl:513 @[ gmp.jl:459 ]
  %132:fr16x = COPY %131:vr128x, debug-location !162; essentials.jl:513 @[ gmp.jl:459 ]
  %7:fr16x = COPY %132:fr16x, debug-location !162; essentials.jl:513 @[ gmp.jl:459 ]
  JMP_1 %bb.8, debug-location !162; essentials.jl:513 @[ gmp.jl:459 ]

bb.7.L64:
; predecessors: %bb.2, %bb.3
  successors: %bb.8(0x80000000); %bb.8(100.00%)

  %65:gr64 = MOV64ri %const.0
  %64:fr16x = VMOVSHZrm_alt killed %65:gr64, 1, $noreg, 0, $noreg :: (load (s16) from constant-pool)
  JMP_1 %bb.8, debug-location !87; gmp.jl:451

bb.8.L65:
; predecessors: %bb.7, %bb.6
  successors: %bb.1

  %8:fr16x = PHI %64:fr16x, %bb.7, %7:fr16x, %bb.6
  %141:gr64 = MOV64rm %0:gr64, 1, $noreg, 0, $noreg, debug-location !163 :: (load (s64) from %ir.x); Base.jl:37 @[ gmp.jl:461 ]
  %142:gr64 = MOVSX64rm32 killed %141:gr64, 1, $noreg, 4, $noreg, debug-location !163 :: (dereferenceable load (s32) from %ir.295, !tbaa !68, !alias.scope !72, !noalias !73, addrspace 11); Base.jl:37 @[ gmp.jl:461 ]
  %143:gr64_with_sub_8bit = SHR64ri %142:gr64(tied-def 0), 63, implicit-def dead $eflags, debug-location !175; int.jl:83 @[ promotion.jl:450 @[ int.jl:139 @[ number.jl:205 @[ gmp.jl:461 ] ] ] ]
  %139:gr8 = COPY %143.sub_8bit:gr64_with_sub_8bit, debug-location !175; int.jl:83 @[ promotion.jl:450 @[ int.jl:139 @[ number.jl:205 @[ gmp.jl:461 ] ] ] ]
  %145:fr32x = IMPLICIT_DEF debug-location !176; float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %144:fr32x = nofpexcept VCVTSH2SSZrr killed %145:fr32x, %8:fr16x, implicit $mxcsr, debug-location !176; float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %146:vr128x = COPY %144:fr32x, debug-location !176; float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %147:gr64 = MOV64ri %const.1
  %148:vr128x = VPBROADCASTDZ128rm killed %147:gr64, 1, $noreg, 0, $noreg, debug-location !176 :: (load (s32) from constant-pool); float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %149:vr128x = VPXORDZ128rr killed %146:vr128x, killed %148:vr128x, debug-location !176; float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %150:fr32x = COPY %149:vr128x, debug-location !176; float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %151:fr16x = IMPLICIT_DEF debug-location !176; float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %134:fr16x = nofpexcept VCVTSS2SHZrr killed %151:fr16x, killed %150:fr32x, implicit $mxcsr, debug-location !176; float.jl:406 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %140:gr8 = AND8ri %139:gr8(tied-def 0), 1, implicit-def $eflags, debug-location !179; essentials.jl:575 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %137:gr8 = XOR8ri %140:gr8(tied-def 0), -1, implicit-def $eflags, debug-location !179; essentials.jl:575 @[ number.jl:205 @[ gmp.jl:461 ] ]
  TEST8ri %137:gr8, 1, implicit-def $eflags, debug-location !179; essentials.jl:575 @[ number.jl:205 @[ gmp.jl:461 ] ]
  %135:fr16x = CMOV_FR16X %134:fr16x, %8:fr16x, 5, implicit $eflags, debug-location !179; essentials.jl:575 @[ number.jl:205 @[ gmp.jl:461 ] ]
  JMP_1 %bb.1

# End machine code for function julia_Float16_1069.

BioTurboNick commented 11 months ago

Ah yeah, there's a problem with 16-bit floats on Sapphire Rapids and Julia 1.9. No crash in this case, but the result is wrong.

# Julia 1.9.3 on Sapphire Rapids:
julia> Float16(BigInt(4))
Float16(0.0)
julia> Float32(BigInt(4))
4.0f0
julia> Inf16
Float16(0.0)
julia> @info Inf16 # or any Float16
ERROR: BoundsError: attempt to access 23-element Vector{UInt8} at index [24]
Stacktrace:
 [1] setindex!
   @ ./array.jl:969 [inlined]
 [2] writeshortest(buf::Vector{UInt8}, pos::Int64, x::Float16, plus::Bool, space::Bool, hash::Bool, precision::Int64, expchar::UInt8, padexp::Bool, decchar::UInt8, typed::Bool, compact::Bool)
   @ Base.Ryu ./ryu/shortest.jl:267
 [3] string(x::Float16)
   @ Base.Ryu ./ryu/Ryu.jl:123
 [4] handle_message(logger::Logging.ConsoleLogger, level::Base.CoreLogging.LogLevel, message::Any, _module::Any, group::Any, id::Any, filepath::Any, line::Any; kwargs::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}})
   @ Logging ~/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia/stdlib/v1.9/Logging/src/ConsoleLogger.jl:119
 [5] handle_message(logger::Logging.ConsoleLogger, level::Base.CoreLogging.LogLevel, message::Any, _module::Any, group::Any, id::Any, filepath::Any, line::Any)
   @ Logging ~/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia/stdlib/v1.9/Logging/src/ConsoleLogger.jl:106
 [6] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Base ./essentials.jl:819
 [7] invokelatest(::Any, ::Any, ::Vararg{Any})
   @ Base ./essentials.jl:816
 [8] top-level scope
   @ logging.jl:353

# Julia 1.9.3 on Raptor Lake:
julia> Float16(BigInt(4))
Float16(4.0)
julia> Float32(BigInt(4))
4.0f0
julia> Inf16
Inf16
julia> @info Inf16
[ Info: Inf
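
The comparison above can be turned into a small self-check that does not depend on how values are printed. This is a minimal sketch, not part of the original report, and the helper name `float16_sane` is hypothetical. It compares raw bit patterns, because on the affected setup even displaying a Float16 was unreliable. Whether it actually fails there depends on where the miscompilation sits, which the thread does not establish.

```julia
# Hypothetical helper (not from the original report): checks Float16 results
# by bit pattern, so the check itself never has to print a Float16.
function float16_sane()
    checks = (
        reinterpret(UInt16, Float16(BigInt(4))) == 0x4400,  # 4.0 in IEEE half precision
        reinterpret(UInt16, Inf16) == 0x7c00,               # +Inf in IEEE half precision
        Float32(BigInt(4)) === 4.0f0,                       # Float32 path, correct on both machines above
    )
    return all(checks)
end

println(float16_sane() ? "Float16 looks OK" : "Float16 results are wrong on this setup")
```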

BioTurboNick commented 11 months ago

I've narrowed it down to a MWE:

module testproj

using Parsers
using PrecompileTools

@setup_workload begin
    @compile_workload begin
        Parsers.xparse(Float16, "3.14", 1, 4, Parsers.Options())
    end
end

end # module

If this package is precompiled, the segfault occurs. But if the function is used on the REPL, everything is fine.

BioTurboNick commented 11 months ago

Parsers.jl does not fail to precompile here, but it looks like it doesn't bother precompiling Float16 methods: https://github.com/JuliaData/Parsers.jl/blob/main/src/precompile.jl

BioTurboNick commented 11 months ago

Traced it down even further to this function: https://github.com/JuliaData/Parsers.jl/blob/d1c6fc58d53ffe6800187992d284563cf9c2122c/src/floats.jl#L340-L340

using Parsers

options = Parsers.Options()
buf = codeunits("3.14")
code = Parsers.SUCCESS
conf = Parsers.conf(Float16, options)
b = Parsers.peekbyte(buf, 3)  # byte at position 3 of "3.14"

Parsers.parsefrac(conf, buf, 3, 4, b, code, options, 0x0000000000000003, false, 1, UInt64(0), false, 1, nothing)

(I'll keep narrowing it down, but need a break :-) )

BioTurboNick commented 11 months ago

Alright, I'm about as minimal as I can get now:

module testproj

function __fooscale(v, neg)
    if v < 2048
        x = Float16(v) / Float16(100.0)
        return ifelse(neg, -x, x)
    end
    return __fooscale(v, neg)
end

__fooscale(UInt128(0x013a), false)

end # module

This reliably triggers the crash. If I remove the branch, or the division by 100, or change the ifelse to a conditional assignment, or remove the recursive call, it precompiles successfully.

On the REPL, this function works fine and outputs Float16(3.14) (with the #51700 changes).

EDIT: removed the PrecompileTools bit; it was dispensable.
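
For reference, one of the variations described above, with the ifelse changed to a conditional assignment, might look like the sketch below. The function name `__fooscale_branch` and the exact form of the conditional are assumptions; the comment does not show the variant that was actually tested.

```julia
# Hypothetical variant of __fooscale: `ifelse` replaced by a conditional assignment.
# The comment above reports that a change of this kind precompiled successfully.
function __fooscale_branch(v, neg)
    if v < 2048
        x = Float16(v) / Float16(100.0)
        x = neg ? -x : x
        return x
    end
    return __fooscale_branch(v, neg)  # only reached when v >= 2048; not taken for the call below
end

__fooscale_branch(UInt128(0x013a), false)  # 0x013a == 314
```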

BioTurboNick commented 11 months ago

I managed to capture an rr trace when precompiling CSV via Pkg.precompile(): https://julialang-dumps.s3.amazonaws.com/reports/2023-10-20T23-47-50-BioTurboNick.tar.zst

Also a trace captured during using instead of Pkg.precompile() (just for completeness): https://julialang-dumps.s3.amazonaws.com/reports/2023-10-20T23-53-00-BioTurboNick.tar.zst

And finally, a trace of my minimal example, also triggered via using: https://julialang-dumps.s3.amazonaws.com/reports/2023-10-20T23-58-26-BioTurboNick.tar.zst

BioTurboNick commented 11 months ago

Oh, but precompilation happens in a separate process, right? So these may not be useful. I'm not seeing relevant stack frames, and it skips over the error.

EDIT: I tried hacking the precompilation CLI command to include rr, but I don't think it worked right - I didn't get the segfault in the replay.

BioTurboNick commented 11 months ago

This is the last LLVM pass output on the MWE prior to the segfault:

*** IR Dump After Safe Stack instrumentation pass (safe-stack) ***
define half @julia___fooscale_5(i128 zeroext %0, i8 zeroext %1) #0 !dbg !5 {
top:
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"()
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
  %2 = bitcast {}*** %pgcstack to {}**
  %current_task = getelementptr inbounds {}*, {}** %2, i64 -13
  %3 = bitcast {}** %current_task to i64*
  %world_age = getelementptr inbounds i64, i64* %3, i64 14
  %4 = load {}*, {}** @"*Core.Intrinsics.ult_int#2", align 8, !dbg !7, !tbaa !15, !alias.scope !19, !noalias !22
  %5 = bitcast {}* %4 to {} addrspace(10)**, !dbg !7
  %6 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %5, i64 1, !dbg !7
  %7 = icmp ult i128 %0, 2048, !dbg !7
  %8 = load {}*, {}** @"*Core.Intrinsics.and_int#3", align 8, !dbg !27, !tbaa !15, !alias.scope !19, !noalias !22
  %9 = bitcast {}* %8 to {} addrspace(10)**, !dbg !27
  %10 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %9, i64 1, !dbg !27
  %11 = and i1 true, %7, !dbg !27
  %12 = zext i1 %11 to i8, !dbg !14
  %13 = trunc i8 %12 to i1, !dbg !14
  %14 = xor i1 %13, true, !dbg !14
  br i1 %14, label %L49, label %L4, !dbg !14

L4:                                               ; preds = %top
  %15 = load {}*, {}** @"*Core.Intrinsics.ult_int#2", align 8, !dbg !30, !tbaa !15, !alias.scope !19, !noalias !22
  %16 = bitcast {}* %15 to {} addrspace(10)**, !dbg !30
  %17 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %16, i64 1, !dbg !30
  %18 = icmp ult i128 %0, 20282409603651670423947251286016, !dbg !30
  %19 = zext i1 %18 to i8, !dbg !31
  %20 = trunc i8 %19 to i1, !dbg !31
  %21 = xor i1 %20, true, !dbg !31
  br i1 %21, label %L20, label %L6, !dbg !31

L6:                                               ; preds = %L4
  %22 = load {}*, {}** @"*Core.Intrinsics.trunc_int#4", align 8, !dbg !37, !tbaa !15, !alias.scope !19, !noalias !22
  %23 = bitcast {}* %22 to {} addrspace(10)**, !dbg !37
  %24 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %23, i64 1, !dbg !37
  %25 = trunc i128 %0 to i64, !dbg !37
  %26 = load {}*, {}** @"*Core.Intrinsics.and_int#3", align 8, !dbg !40, !tbaa !15, !alias.scope !19, !noalias !22
  %27 = bitcast {}* %26 to {} addrspace(10)**, !dbg !40
  %28 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %27, i64 1, !dbg !40
  %29 = and i64 %25, 4503599627370495, !dbg !40
  %30 = load {}*, {}** @"*Core.Intrinsics.or_int#5", align 8, !dbg !42, !tbaa !15, !alias.scope !19, !noalias !22
  %31 = bitcast {}* %30 to {} addrspace(10)**, !dbg !42
  %32 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %31, i64 1, !dbg !42
  %33 = or i64 4841369599423283200, %29, !dbg !42
  %34 = load {}*, {}** @"*Core.Intrinsics.bitcast#6", align 8, !dbg !45, !tbaa !15, !alias.scope !19, !noalias !22
  %35 = bitcast {}* %34 to {} addrspace(10)**, !dbg !45
  %36 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %35, i64 1, !dbg !45
  %37 = bitcast i64 %33 to double, !dbg !45
  %38 = load {}*, {}** @"*Core.Intrinsics.sub_float#7", align 8, !dbg !48, !tbaa !15, !alias.scope !19, !noalias !22
  %39 = bitcast {}* %38 to {} addrspace(10)**, !dbg !48
  %40 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %39, i64 1, !dbg !48
  %41 = fsub double %37, 0x4330000000000000, !dbg !48
  %42 = load {}*, {}** @"*Core.Intrinsics.lshr_int#8", align 8, !dbg !50, !tbaa !15, !alias.scope !19, !noalias !22
  %43 = bitcast {}* %42 to {} addrspace(10)**, !dbg !50
  %44 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %43, i64 1, !dbg !50
  %45 = lshr i128 %0, 52, !dbg !50
  %46 = select i1 false, i128 0, i128 %45, !dbg !50
  %47 = load {}*, {}** @"*Core.Intrinsics.shl_int#9", align 8, !dbg !54, !tbaa !15, !alias.scope !19, !noalias !22
  %48 = bitcast {}* %47 to {} addrspace(10)**, !dbg !54
  %49 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %48, i64 1, !dbg !54
  %50 = shl i128 %0, 18446744073709551564, !dbg !54
  %51 = select i1 true, i128 0, i128 %50, !dbg !54
  %52 = load {}*, {}** @"*Core.ifelse#10", align 8, !dbg !56, !tbaa !15, !alias.scope !19, !noalias !22
  %53 = bitcast {}* %52 to {} addrspace(10)**, !dbg !56
  %54 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %53, i64 1, !dbg !56
  %55 = select i1 false, i128 %51, i128 %46, !dbg !56
  %56 = load {}*, {}** @"*Core.Intrinsics.trunc_int#4", align 8, !dbg !58, !tbaa !15, !alias.scope !19, !noalias !22
  %57 = bitcast {}* %56 to {} addrspace(10)**, !dbg !58
  %58 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %57, i64 1, !dbg !58
  %59 = trunc i128 %55 to i64, !dbg !58
  %60 = load {}*, {}** @"*Core.Intrinsics.or_int#5", align 8, !dbg !59, !tbaa !15, !alias.scope !19, !noalias !22
  %61 = bitcast {}* %60 to {} addrspace(10)**, !dbg !59
  %62 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %61, i64 1, !dbg !59
  %63 = or i64 5075556780046548992, %59, !dbg !59
  %64 = load {}*, {}** @"*Core.Intrinsics.bitcast#6", align 8, !dbg !61, !tbaa !15, !alias.scope !19, !noalias !22
  %65 = bitcast {}* %64 to {} addrspace(10)**, !dbg !61
  %66 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %65, i64 1, !dbg !61
  %67 = bitcast i64 %63 to double, !dbg !61
  %68 = load {}*, {}** @"*Core.Intrinsics.sub_float#7", align 8, !dbg !62, !tbaa !15, !alias.scope !19, !noalias !22
  %69 = bitcast {}* %68 to {} addrspace(10)**, !dbg !62
  %70 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %69, i64 1, !dbg !62
  %71 = fsub double %67, 0x4670000000000000, !dbg !62
  %72 = load {}*, {}** @"*Core.Intrinsics.add_float#11", align 8, !dbg !63, !tbaa !15, !alias.scope !19, !noalias !22
  %73 = bitcast {}* %72 to {} addrspace(10)**, !dbg !63
  %74 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %73, i64 1, !dbg !63
  %75 = fadd double %41, %71, !dbg !63
  br label %L42, !dbg !65

L20:                                              ; preds = %L4
  %76 = load {}*, {}** @"*Core.Intrinsics.lshr_int#8", align 8, !dbg !66, !tbaa !15, !alias.scope !19, !noalias !22
  %77 = bitcast {}* %76 to {} addrspace(10)**, !dbg !66
  %78 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %77, i64 1, !dbg !66
  %79 = lshr i128 %0, 12, !dbg !66
  %80 = select i1 false, i128 0, i128 %79, !dbg !66
  %81 = load {}*, {}** @"*Core.Intrinsics.shl_int#9", align 8, !dbg !69, !tbaa !15, !alias.scope !19, !noalias !22
  %82 = bitcast {}* %81 to {} addrspace(10)**, !dbg !69
  %83 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %82, i64 1, !dbg !69
  %84 = shl i128 %0, 18446744073709551604, !dbg !69
  %85 = select i1 true, i128 0, i128 %84, !dbg !69
  %86 = load {}*, {}** @"*Core.ifelse#10", align 8, !dbg !70, !tbaa !15, !alias.scope !19, !noalias !22
  %87 = bitcast {}* %86 to {} addrspace(10)**, !dbg !70
  %88 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %87, i64 1, !dbg !70
  %89 = select i1 false, i128 %85, i128 %80, !dbg !70
  %90 = load {}*, {}** @"*Core.Intrinsics.trunc_int#4", align 8, !dbg !71, !tbaa !15, !alias.scope !19, !noalias !22
  %91 = bitcast {}* %90 to {} addrspace(10)**, !dbg !71
  %92 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %91, i64 1, !dbg !71
  %93 = trunc i128 %89 to i64, !dbg !71
  %94 = load {}*, {}** @"*Core.Intrinsics.lshr_int#8", align 8, !dbg !66, !tbaa !15, !alias.scope !19, !noalias !22
  %95 = bitcast {}* %94 to {} addrspace(10)**, !dbg !66
  %96 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %95, i64 1, !dbg !66
  %97 = lshr i64 %93, 12, !dbg !66
  %98 = select i1 false, i64 0, i64 %97, !dbg !66
  %99 = load {}*, {}** @"*Core.Intrinsics.shl_int#9", align 8, !dbg !69, !tbaa !15, !alias.scope !19, !noalias !22
  %100 = bitcast {}* %99 to {} addrspace(10)**, !dbg !69
  %101 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %100, i64 1, !dbg !69
  %102 = shl i64 %93, -12, !dbg !69
  %103 = select i1 true, i64 0, i64 %102, !dbg !69
  %104 = load {}*, {}** @"*Core.ifelse#10", align 8, !dbg !70, !tbaa !15, !alias.scope !19, !noalias !22
  %105 = bitcast {}* %104 to {} addrspace(10)**, !dbg !70
  %106 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %105, i64 1, !dbg !70
  %107 = select i1 false, i64 %103, i64 %98, !dbg !70
  %108 = load {}*, {}** @"*Core.Intrinsics.trunc_int#4", align 8, !dbg !71, !tbaa !15, !alias.scope !19, !noalias !22
  %109 = bitcast {}* %108 to {} addrspace(10)**, !dbg !71
  %110 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %109, i64 1, !dbg !71
  %111 = trunc i128 %0 to i64, !dbg !71
  %112 = load {}*, {}** @"*Core.Intrinsics.and_int#3", align 8, !dbg !72, !tbaa !15, !alias.scope !19, !noalias !22
  %113 = bitcast {}* %112 to {} addrspace(10)**, !dbg !72
  %114 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %113, i64 1, !dbg !72
  %115 = and i64 %111, 16777215, !dbg !72
  %116 = load {}*, {}** @"*Core.Intrinsics.or_int#5", align 8, !dbg !74, !tbaa !15, !alias.scope !19, !noalias !22
  %117 = bitcast {}* %116 to {} addrspace(10)**, !dbg !74
  %118 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %117, i64 1, !dbg !74
  %119 = or i64 %107, %115, !dbg !74
  %120 = load {}*, {}** @"*Core.Intrinsics.or_int#5", align 8, !dbg !75, !tbaa !15, !alias.scope !19, !noalias !22
  %121 = bitcast {}* %120 to {} addrspace(10)**, !dbg !75
  %122 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %121, i64 1, !dbg !75
  %123 = or i64 4949455990480175104, %119, !dbg !75
  %124 = load {}*, {}** @"*Core.Intrinsics.bitcast#6", align 8, !dbg !77, !tbaa !15, !alias.scope !19, !noalias !22
  %125 = bitcast {}* %124 to {} addrspace(10)**, !dbg !77
  %126 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %125, i64 1, !dbg !77
  %127 = bitcast i64 %123 to double, !dbg !77
  %128 = load {}*, {}** @"*Core.Intrinsics.sub_float#7", align 8, !dbg !78, !tbaa !15, !alias.scope !19, !noalias !22
  %129 = bitcast {}* %128 to {} addrspace(10)**, !dbg !78
  %130 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %129, i64 1, !dbg !78
  %131 = fsub double %127, 0x44B0000000000000, !dbg !78
  %132 = load {}*, {}** @"*Core.Intrinsics.lshr_int#8", align 8, !dbg !79, !tbaa !15, !alias.scope !19, !noalias !22
  %133 = bitcast {}* %132 to {} addrspace(10)**, !dbg !79
  %134 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %133, i64 1, !dbg !79
  %135 = lshr i128 %0, 76, !dbg !79
  %136 = select i1 false, i128 0, i128 %135, !dbg !79
  %137 = load {}*, {}** @"*Core.Intrinsics.shl_int#9", align 8, !dbg !82, !tbaa !15, !alias.scope !19, !noalias !22
  %138 = bitcast {}* %137 to {} addrspace(10)**, !dbg !82
  %139 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %138, i64 1, !dbg !82
  %140 = shl i128 %0, 18446744073709551540, !dbg !82
  %141 = select i1 true, i128 0, i128 %140, !dbg !82
  %142 = load {}*, {}** @"*Core.ifelse#10", align 8, !dbg !83, !tbaa !15, !alias.scope !19, !noalias !22
  %143 = bitcast {}* %142 to {} addrspace(10)**, !dbg !83
  %144 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %143, i64 1, !dbg !83
  %145 = select i1 false, i128 %141, i128 %136, !dbg !83
  %146 = load {}*, {}** @"*Core.Intrinsics.trunc_int#4", align 8, !dbg !84, !tbaa !15, !alias.scope !19, !noalias !22
  %147 = bitcast {}* %146 to {} addrspace(10)**, !dbg !84
  %148 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %147, i64 1, !dbg !84
  %149 = trunc i128 %145 to i64, !dbg !84
  %150 = load {}*, {}** @"*Core.Intrinsics.or_int#5", align 8, !dbg !85, !tbaa !15, !alias.scope !19, !noalias !22
  %151 = bitcast {}* %150 to {} addrspace(10)**, !dbg !85
  %152 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %151, i64 1, !dbg !85
  %153 = or i64 5183643171103440896, %149, !dbg !85
  %154 = load {}*, {}** @"*Core.Intrinsics.bitcast#6", align 8, !dbg !87, !tbaa !15, !alias.scope !19, !noalias !22
  %155 = bitcast {}* %154 to {} addrspace(10)**, !dbg !87
  %156 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %155, i64 1, !dbg !87
  %157 = bitcast i64 %153 to double, !dbg !87
  %158 = load {}*, {}** @"*Core.Intrinsics.sub_float#7", align 8, !dbg !88, !tbaa !15, !alias.scope !19, !noalias !22
  %159 = bitcast {}* %158 to {} addrspace(10)**, !dbg !88
  %160 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %159, i64 1, !dbg !88
  %161 = fsub double %157, 0x47F0000000000000, !dbg !88
  %162 = load {}*, {}** @"*Core.Intrinsics.add_float#11", align 8, !dbg !89, !tbaa !15, !alias.scope !19, !noalias !22
  %163 = bitcast {}* %162 to {} addrspace(10)**, !dbg !89
  %164 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %163, i64 1, !dbg !89
  %165 = fadd double %131, %161, !dbg !89
  br label %L42, !dbg !90

L42:                                              ; preds = %L20, %L6
  %value_phi = phi double [ %75, %L6 ], [ %165, %L20 ]
  %166 = load {}*, {}** @"*Core.Intrinsics.fptrunc#12", align 8, !dbg !91, !tbaa !15, !alias.scope !19, !noalias !22
  %167 = bitcast {}* %166 to {} addrspace(10)**, !dbg !91
  %168 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %167, i64 1, !dbg !91
  %169 = load {}*, {}** @"*Core.Float16#13", align 8, !dbg !91, !tbaa !15, !alias.scope !19, !noalias !22
  %170 = bitcast {}* %169 to {} addrspace(10)**, !dbg !91
  %171 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %170, i64 1, !dbg !91
  %172 = fptrunc double %value_phi to half, !dbg !91
  %173 = load {}*, {}** @"*Core.Intrinsics.div_float#14", align 8, !dbg !95, !tbaa !15, !alias.scope !19, !noalias !22
  %174 = bitcast {}* %173 to {} addrspace(10)**, !dbg !95
  %175 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %174, i64 1, !dbg !95
  %176 = fpext half %172 to float, !dbg !95
  %177 = fdiv float %176, 1.000000e+02, !dbg !95
  %178 = fptrunc float %177 to half, !dbg !95
  %179 = load {}*, {}** @"*Core.Intrinsics.neg_float#15", align 8, !dbg !97, !tbaa !15, !alias.scope !19, !noalias !22
  %180 = bitcast {}* %179 to {} addrspace(10)**, !dbg !97
  %181 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %180, i64 1, !dbg !97
  %182 = fpext half %178 to float, !dbg !97
  %183 = fneg float %182, !dbg !97
  %184 = fptrunc float %183 to half, !dbg !97
  %185 = load {}*, {}** @"*Core.ifelse#10", align 8, !dbg !99, !tbaa !15, !alias.scope !19, !noalias !22
  %186 = bitcast {}* %185 to {} addrspace(10)**, !dbg !99
  %187 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %186, i64 1, !dbg !99
  %188 = trunc i8 %1 to i1, !dbg !99
  %189 = xor i1 %188, true, !dbg !99
  %190 = select i1 %189, half %178, half %184, !dbg !99
  br label %common.ret

common.ret:                                       ; preds = %L49, %L42
  %common.ret.op = phi half [ %190, %L42 ], [ %194, %L49 ]
  ret half %common.ret.op, !dbg !100

L49:                                              ; preds = %top
  %191 = load {}*, {}** @"*testproj.__fooscale#16", align 8, !dbg !101, !tbaa !15, !alias.scope !19, !noalias !22
  %192 = bitcast {}* %191 to {} addrspace(10)**, !dbg !101
  %193 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)** %192, i64 1, !dbg !101
  %194 = call half @julia___fooscale_5(i128 zeroext %0, i8 zeroext %1) #0, !dbg !101
  br label %common.ret

BioTurboNick commented 9 months ago

Is there anything more I can do here?

vchuravy commented 9 months ago

> Is there anything more I can do here?

Thank you for all the detective work you have done here.

The patch you identified is a bit too large for comfort. So the question is: can we disable a CPU feature to get LLVM to stop handling FP16 as native on Sapphire Rapids?

Maybe try:

julia -C sapphirerapids,-avx512fp16

That is something we could add as a patch to 1.9.

BioTurboNick commented 9 months ago

Actually yeah, that does fix it.
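
For anyone else hitting this on 1.9.x, here is a minimal sketch of applying that workaround from a shell. The `-C` (`--cpu-target`) option and the `sapphirerapids,-avx512fp16` string are the ones suggested above; the variable and helper names are made up for illustration and are not part of Julia or Juliaup.

```shell
# Target string suggested in this thread: Sapphire Rapids with AVX512-FP16 disabled.
# Assumption: the host really is a Sapphire Rapids CPU.
JULIA_NO_FP16_TARGET="sapphirerapids,-avx512fp16"

# Hypothetical helper: run julia with that CPU target and pass through
# any further arguments unchanged.
julia_no_fp16() {
    julia -C "$JULIA_NO_FP16_TARGET" "$@"
}

# Example: re-run the precompilation step that crashed.
# julia_no_fp16 -e 'using Pkg; Pkg.precompile()'
```

Note that this only changes the target for sessions started through the helper; a plain `julia` invocation still uses the default target.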

vchuravy commented 9 months ago

And to confirm 1.10 works fine?

vchuravy commented 9 months ago

Can you give #52349 a try?

BioTurboNick commented 9 months ago

Confirmed that 1.10 works fine. That PR doesn't fix it, though. FWIW, the patch I identified above didn't fix this crash; it only fixed the display of Float16 values, in case that helps.

BioTurboNick commented 3 months ago

Is 1.9.x done? Should this issue be closed as not planned?