codegen: crash / incorrect answer with 128 threads #36

Open jzxia opened 3 years ago

jzxia commented 3 years ago

The code generated by src/codegen/broutine.jl crashes or produces a wrong answer with 128 threads. Note that these errors have not occurred so far with <= 64 threads.

In particular, I tested on a computer running Ubuntu 20.04.2 LTS whose hardware topology is as follows:

julia> using Hwloc

julia> topology_info()
Machine: 1 (503.78 GB)
 Package: 2 (251.81 GB)
  Group: 8 (62.87 GB)
   NUMANode: 8 (62.87 GB)
    L3Cache: 32 (16.0 MB)
     L2Cache: 128 (512.0 kB)
      L1Cache: 128 (32.0 kB)
       Core: 128
        PU: 256
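
Since the failures seem to depend on the thread count, it may help to rerun the same snippet under several thread counts in fresh processes. Below is a minimal sketch of that idea; `sweep_threads` is a hypothetical helper (not part of BQCESubroutine), and it only relies on `Base.julia_cmd()` and the `--threads` command-line flag.

```julia
# Hypothetical helper: run `snippet` in a fresh Julia process for each thread
# count and collect whatever the child prints to stdout.
function sweep_threads(snippet::AbstractString, counts)
    results = Dict{Int,String}()
    for n in counts
        cmd = `$(Base.julia_cmd()) --threads=$n -e $snippet`
        results[n] = try
            read(cmd, String)
        catch err
            # A crash (e.g. a segfault) gives a non-zero exit status, which
            # `read` reports by throwing; record the exception type instead.
            "failed: $(typeof(err))"
        end
    end
    return results
end

# Example: confirm that each child really sees the requested thread count.
# sweep_threads("print(Threads.nthreads())", (32, 64, 128))
```

Running each count in its own process means a segfault at 128 threads does not take down the driver session.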

The following code (or a slight variation of it) is used to perform the test:

using Test
using BenchmarkTools
using LinearAlgebra
using BQCESubroutine
using YaoLocations


@testset "N=$N" for N in [15, 20]
    st = rand(Float64, 1<<N)
    loc = 1
    locs = BQCESubroutine.Locations(loc)
    st0 = broutine!(copy(st), Val(:X), locs)
    st1 = broutine!(copy(st), [0 1; 1 0], locs)
    println("|err| = ", norm(st0-st1))
    @test st0 ≈ st1
end

I did the test for the following cases:

Here "old codegen" means that the lines of src/codegen/broutine.jl linked below are commented out (so that bsubspace is used), while "new codegen" means that those lines are retained (so that threaded_subspace_loop_2x2_nontrivial is called): https://github.com/QuantumBFS/BQCESubroutine.jl/blob/4fef470be3df0170d7c6358c95fce2c0f0fbb721/src/codegen/broutine.jl#L484-L487

The test results are as follows. The errors occur in about 1/3 of all trials. Also, I haven't seen any errors so far with <=64 threads.
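
To put a number on "about 1/3 of all trials", the wrong-answer case can be counted with a small loop. This is only a sketch: `failure_rate` is a hypothetical helper, and it cannot count segfaults, because those kill the whole process.

```julia
# Hypothetical helper: run `trial()` `ntrials` times and return the fraction
# of runs that did not return `true`. Thrown exceptions count as failures.
function failure_rate(trial; ntrials::Integer = 100)
    failures = 0
    for _ in 1:ntrials
        ok = try
            trial()
        catch
            false
        end
        ok === true || (failures += 1)
    end
    return failures / ntrials
end

# Possible usage with the test from this issue (requires BQCESubroutine):
# failure_rate(ntrials = 30) do
#     st = rand(Float64, 1 << 15)
#     locs = BQCESubroutine.Locations(1)
#     broutine!(copy(st), Val(:X), locs) ≈ broutine!(copy(st), [0 1; 1 0], locs)
# end
```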

(base) visitor@delta106:~/julia_xjz/BQCESubroutine.jl$ julia --project=@.
julia> using Test

julia> using BenchmarkTools

julia> using LinearAlgebra

julia> using BQCESubroutine

julia> using YaoLocations

julia> using BQCESubroutine: threaded_basic_broutine!

julia> @testset "N=$N" for N in [15, 20]
           #@testset "i=$i" for i in 1:N
           #for i in 1:N
           #for j in 1:1000
           for i in 1:1
               st = rand(Float64, 1<<N);
               locs = BQCESubroutine.Locations(i);
               st0 = broutine!(copy(st), Val(:X), locs);
               st1 = broutine!(copy(st), [0 1; 1 0], locs);
               println("|err| = ", norm(st0-st1))
               @test st0 ≈ st1
           end
       end

signal (11): Segmentation fault
in expression starting at REPL[7]:1
unsafe_load at ./pointer.jl:105 [inlined]
unsafe_load at ./pointer.jl:105 [inlined]
macro expansion at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
pload at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
getindex at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:331 [inlined]
macro expansion at /home/visitor/julia_xjz/BQCESubroutine.jl/src/codegen/broutine.jl:315 [inlined]
#90 at /home/visitor/.julia/packages/Polyester/7cr0U/src/closure.jl:223 [inlined]
BatchClosure at /home/visitor/.julia/packages/Polyester/7cr0U/src/batch.jl:8
unknown function (ip: 0x7f0c580868f0)
_call at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:11 [inlined]
ThreadTask at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:29
unknown function (ip: 0x7f0c5808d9cc)

signal (11): Segmentation fault
in expression starting at REPL[7]:1
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
unsafe_load at ./pointer.jl:105 [inlined]
unsafe_load at ./pointer.jl:105 [inlined]
macro expansion at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
pload at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:177 [inlined]
getindex at /home/visitor/.julia/packages/StrideArraysCore/skpQT/src/ptr_array.jl:331 [inlined]
macro expansion at /home/visitor/julia_xjz/BQCESubroutine.jl/src/codegen/broutine.jl:315 [inlined]
#90 at /home/visitor/.julia/packages/Polyester/7cr0U/src/closure.jl:223 [inlined]
BatchClosure at /home/visitor/.julia/packages/Polyester/7cr0U/src/batch.jl:8
unknown function (ip: 0x7f0c580868f0)
_call at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:11 [inlined]
ThreadTask at /home/visitor/.julia/packages/ThreadingUtilities/IkkvN/src/threadtasks.jl:29
unknown function (ip: 0x7f0c5808d9cc)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Allocations: 10465430 (Pool: 10461946; Big: 3484); GC: 10
unknown function (ip: (nil))
Allocations: 10465430 (Pool: 10461946; Big: 3484); GC: 10
Segmentation fault (core dumped)

julia> using Test

julia> using BenchmarkTools

julia> using LinearAlgebra

julia> using BQCESubroutine

julia> using YaoLocations

julia> using BQCESubroutine: threaded_basic_broutine!

julia> Threads.nthreads()
128

julia> @testset "N=$N" for N in [15, 20]
           #@testset "i=$i" for i in 1:N
           #for i in 1:N
           #for j in 1:1000
           for i in 1:1
               st = rand(Float64, 1<<N);
               locs = BQCESubroutine.Locations(i);
               st0 = broutine!(copy(st), Val(:X), locs);
               st1 = broutine!(copy(st), [0 1; 1 0], locs);
               println("|err| = ", norm(st0-st1))
               @test st0 ≈ st1
           end
       end

|err| = 9.591304821338616
N=15: Test Failed at REPL[8]:11
  Expression: st0 ≈ st1
   Evaluated: [0.9892420967764597, 0.26037900123707414, 0.614994982713237, 0.20759717205479533, 0.3126703177619974, 0.18078785290089638, 0.7422001386059047, 0.7726755538057188, 0.3277775066108153, 0.5181144753668747 … 0.5338442075110978, 0.8575211492346384, 0.9954840790239925, 0.6424407507078336, 0.7940770595462205, 0.053890792175115054, 0.9595014083141846, 0.8423338613101816, 0.5532812445454995, 0.42973496521957366] ≈ [0.9892420967764597, 0.26037900123707414, 0.614994982713237, 0.20759717205479533, 0.3126703177619974, 0.18078785290089638, 0.7422001386059047, 0.7726755538057188, 0.3277775066108153, 0.5181144753668747 … 0.5338442075110978, 0.8575211492346384, 0.9954840790239925, 0.6424407507078336, 0.7940770595462205, 0.053890792175115054, 0.9595014083141846, 0.8423338613101816, 0.5532812445454995, 0.42973496521957366]
Stacktrace:
  [1] macro expansion
    @ ./REPL[8]:11 [inlined]
  [2] top-level scope
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1226 [inlined]
  [3] top-level scope
    @ ./REPL[8]:0
  [4] eval
    @ ./boot.jl:360 [inlined]
  [5] eval_user_input(ast::Any, backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
  [6] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
  [7] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
  [8] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
  [9] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
 [10] (::Base.var"#874#876"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:387
 [11] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [12] invokelatest
    @ ./essentials.jl:706 [inlined]
 [13] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:372
 [14] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:302
 [15] _start()
    @ Base ./client.jl:485
Test Summary: | Fail  Total
N=15          |    1      1
Test Summary: | Fail  Total
N=15          |    1      1
ERROR: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.

caused by: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.


- (crash) `loc=N`, old `codegen` using `Threads.@threads`

...
signal (11): Segmentation fault
in expression starting at REPL[8]:1
unsafe_load at ./pointer.jl:105 [inlined]
unsafe_load at ./pointer.jl:105 [inlined]
...

- (crash) `loc=N`, new `codegen` using `Threads.@threads`

- (crash) `loc=N`, new `codegen` using `@batch` from Polyester

- (incorrect answer) `loc=N`, new `codegen` using `@batch` from Polyester
(base) visitor@delta106:~/julia_xjz/BQCESubroutine.jl$ julia --project=@.
julia> using BQCESubroutine
[ Info: Precompiling BQCESubroutine [29e2bfda-5ba7-471c-9125-afac425f1f80]

julia> using Test

julia> using BenchmarkTools

julia> using LinearAlgebra

julia> using BQCESubroutine

julia> using YaoLocations

julia> Threads.nthreads()

julia> @testset "N=$N" for N in [15, 20]
               st = rand(Float64, 1<<N);
               locs = BQCESubroutine.Locations(N);
               st0 = broutine!(copy(st), Val(:X), locs);
               st1 = broutine!(copy(st), [0 1; 1 0], locs);
               println("|err| = ", norm(st0-st1))
               @test st0 ≈ st1
       end
|err| = 16.047140931650585
N=15: Test Failed at REPL[8]:7
  Expression: st0 ≈ st1
   Evaluated: [0.5331964447622937, 0.6840490894483715, 0.2992315961195635, 0.2788357425851684, 0.8245955857174441, 0.34661593647558275, 0.13788131297975648, 0.4132599933839103, 0.10438664295039812, 0.6052680657151797  …  0.01005720357114237, 0.40938335588275665, 0.13120408445874276, 0.21412778340666128, 0.23683502279509216, 0.4887433118091513, 0.43142024877557206, 0.4821280787877209, 0.5761057194395589, 0.7531886577130373] ≈ [0.5331964447622937, 0.6840490894483715, 0.2992315961195635, 0.2788357425851684, 0.8245955857174441, 0.34661593647558275, 0.13788131297975648, 0.4132599933839103, 0.10438664295039812, 0.6052680657151797  …  0.01005720357114237, 0.40938335588275665, 0.13120408445874276, 0.21412778340666128, 0.23683502279509216, 0.4887433118091513, 0.43142024877557206, 0.4821280787877209, 0.5761057194395589, 0.7531886577130373]
  [1] macro expansion
    @ ./REPL[8]:7 [inlined]
  [2] top-level scope
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1226 [inlined]
  [3] top-level scope
    @ ./REPL[8]:0
  [4] eval
    @ ./boot.jl:360 [inlined]
  [5] eval_user_input(ast::Any, backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
  [6] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
  [7] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
  [8] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
  [9] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
 [10] (::Base.var"#874#876"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:387
 [11] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [12] invokelatest
    @ ./essentials.jl:706 [inlined]
 [13] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:372
 [14] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:302
 [15] _start()
    @ Base ./client.jl:485
Test Summary: | Fail  Total
N=15          |    1      1
Test Summary: | Fail  Total
N=15          |    1      1
ERROR: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.

caused by: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.

jzxia commented 3 years ago

What I've tried so far:


Possibly related issue: https://github.com/JuliaLang/julia/issues/14857

Roger-luo commented 3 years ago

Can we try to reduce the MWE by copying out one piece of code generated by codegen that segfaults? The current code is too complicated to address this issue.
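
As a possible starting point for such a reduction, here is a hand-written sketch of the same operation (Pauli X on qubit `loc` of a state vector), with a serial reference and a `Threads.@threads` version. It is not the code emitted by src/codegen/broutine.jl; the names `apply_x_serial!`, `apply_x_threaded!` and `mismatch` are made up for illustration. Bounds checks are left on so that an out-of-range index would raise an error instead of segfaulting.

```julia
using LinearAlgebra

# Serial reference: X on qubit `loc` swaps st[i] and st[i + 2^(loc-1)].
function apply_x_serial!(st::AbstractVector, loc::Integer)
    step = 1 << (loc - 1)
    for base in 0:(2 * step):(length(st) - 1)   # start of each 2*step block
        for k in 0:(step - 1)
            i = base + k + 1
            st[i], st[i + step] = st[i + step], st[i]
        end
    end
    return st
end

# Threaded version: one iteration per amplitude pair, so no two iterations
# touch the same element.
function apply_x_threaded!(st::AbstractVector, loc::Integer)
    step = 1 << (loc - 1)
    npairs = length(st) >> 1
    Threads.@threads for p in 0:(npairs - 1)
        low = p & (step - 1)                 # bits below the target qubit
        high = (p >> (loc - 1)) << loc       # bits above it, shifted up by one
        i = (high | low) + 1                 # 1-based index with target bit 0
        st[i], st[i + step] = st[i + step], st[i]
    end
    return st
end

# Norm of the difference between the two versions for a random state.
function mismatch(N::Integer, loc::Integer)
    st = rand(Float64, 1 << N)
    return norm(apply_x_serial!(copy(st), loc) - apply_x_threaded!(copy(st), loc))
end
```

If `mismatch(15, 1)` stays at zero with 128 threads while the generated code still fails, that would point at the generated loop or the Polyester/StrideArraysCore path rather than at plain threading.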