Open jzxia opened 3 years ago
What I've tried so far:
- Disabling `fastmath` and `inbounds` (changing both options to `false`): the same error occurs.
- Adding `@assert 0 <= $m < (1 << $(ctx.hoisted_vars.nqubits))` before calling `kernel`: the same error occurs, without triggering the assert.
  https://github.com/QuantumBFS/BQCESubroutine.jl/blob/4fef470be3df0170d7c6358c95fce2c0f0fbb721/src/codegen/broutine.jl#L538-L540
- Testing with `nthreads=1` to `64` (https://github.com/QuantumBFS/BQCESubroutine.jl/pull/35/commits/929304eb43542208ee484c0519b1bd0072b588ef): all tests passed.
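
For readers unfamiliar with the range check described above, here is a minimal standalone sketch of what it tests. `check_index` is a hypothetical helper written for illustration, not a function from BQCESubroutine; in the package the assert is spliced into generated code.

```julia
# Hypothetical helper (not part of BQCESubroutine): verifies that a 0-based
# basis-state index `m` fits in a register of `nqubits` qubits before it is
# used for raw memory access.
function check_index(m::Integer, nqubits::Integer)
    @assert 0 <= m < (1 << nqubits) "index $m out of range for $nqubits qubits"
    return m
end

check_index(5, 3)      # passes, since 0 <= 5 < 8
# check_index(8, 3)    # would throw an AssertionError
```

Since the assert never fires, the indices themselves appear to be in range, which points away from a plain out-of-bounds index as the cause.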
Possibly related issue: https://github.com/JuliaLang/julia/issues/14857
Can we try to reduce the MWE by copying out one piece of segfaulting code generated by the codegen? The current code is too complicated to address this issue.
The code generated by `src/codegen/broutine.jl` crashes or produces wrong answers with 128 threads. Note that these errors have not occurred so far with <= 64 threads.

In particular, I tested on a computer running Ubuntu 20.04.2 LTS whose hardware topology is as follows:
The following code (or a slight variation of it) is used to perform the test:
I did the test for the following cases:
- `loc=1`, old codegen using `Threads.@threads`
- `loc=N`, old codegen using `Threads.@threads`
- `loc=N`, new codegen using `@batch` from Polyester
- `loc=N`, new codegen using `Threads.@threads`
where "old codegen" refers to the case where the following lines of `src/codegen/broutine.jl` are commented out (so that `bsubspace` is used), while "new codegen" refers to the case where the following lines are retained (so that `threaded_subspace_loop_2x2_nontrivial` is called).
https://github.com/QuantumBFS/BQCESubroutine.jl/blob/4fef470be3df0170d7c6358c95fce2c0f0fbb721/src/codegen/broutine.jl#L484-L487

The test results are as follows. The errors occur in about 1/3 of all trials. Also, I haven't seen any errors so far with <= 64 threads.
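
As a possible starting point for a smaller reproducer, here is a minimal sketch of a serial-versus-threaded comparison, assuming the test compares a reference state against the threaded result (as the `st0 ≈ st1` expression in the output suggests). Every name here (`apply_serial!`, `apply_threaded!`, `count_mismatches`) is hypothetical, and nothing in it calls BQCESubroutine or its generated kernels; it only applies a 2x2 matrix to one qubit of a state vector both ways and counts disagreements.

```julia
using LinearAlgebra

# Hypothetical reference kernel: applies the 2x2 matrix U to qubit `loc`
# (1-based) of the state vector `st`, in place, on a single thread.
function apply_serial!(st, U, loc)
    step = 1 << (loc - 1)
    for base in 0:(2 * step):(length(st) - 1)
        for k in 0:(step - 1)
            i = base + k + 1      # amplitude with qubit `loc` equal to 0
            j = i + step          # amplitude with qubit `loc` equal to 1
            a, b = st[i], st[j]
            st[i] = U[1, 1] * a + U[1, 2] * b
            st[j] = U[2, 1] * a + U[2, 2] * b
        end
    end
    return st
end

# Hypothetical threaded kernel: one iteration per amplitude pair. Each pair
# (i, j) is touched by exactly one iteration, so iterations do not overlap.
function apply_threaded!(st, U, loc)
    step = 1 << (loc - 1)
    Threads.@threads for p in 0:(length(st) ÷ 2 - 1)
        lo = p & (step - 1)              # bits below qubit `loc`
        hi = (p >> (loc - 1)) << loc     # bits above, shifted past qubit `loc`
        i = (hi | lo) + 1
        j = i + step
        a, b = st[i], st[j]
        st[i] = U[1, 1] * a + U[1, 2] * b
        st[j] = U[2, 1] * a + U[2, 2] * b
    end
    return st
end

# Repeats the comparison and returns how many trials disagreed.
function count_mismatches(N, loc; trials = 20)
    U = rand(ComplexF64, 2, 2)
    failures = 0
    for _ in 1:trials
        st0 = rand(ComplexF64, 1 << N)
        st1 = copy(st0)
        apply_serial!(st0, U, loc)
        apply_threaded!(st1, U, loc)
        st0 ≈ st1 || (failures += 1)
    end
    return failures
end

count_mismatches(15, 1)
```

Starting Julia with `julia -t 128` and running `count_mismatches` for several `loc` values would show whether a plain `Threads.@threads` loop without generated code is affected at all. A nonzero count here would suggest the problem lies outside the codegen.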
`loc=1`, old codegen using `Threads.@threads`:
julia> using Test
julia> using BenchmarkTools
julia> using LinearAlgebra
julia> using BQCESubroutine
julia> using YaoLocations
julia> using BQCESubroutine: threaded_basic_broutine!
julia> Threads.nthreads()
128
julia> @testset "N=$N" for N in [15, 20]
           @testset "i=$i" for i in 1:N
|err| = 9.591304821338616
N=15: Test Failed at REPL[8]:11
  Expression: st0 ≈ st1
   Evaluated: [0.9892420967764597, 0.26037900123707414, 0.614994982713237, 0.20759717205479533, 0.3126703177619974, 0.18078785290089638, 0.7422001386059047, 0.7726755538057188, 0.3277775066108153, 0.5181144753668747 … 0.5338442075110978, 0.8575211492346384, 0.9954840790239925, 0.6424407507078336, 0.7940770595462205, 0.053890792175115054, 0.9595014083141846, 0.8423338613101816, 0.5532812445454995, 0.42973496521957366] ≈ [0.9892420967764597, 0.26037900123707414, 0.614994982713237, 0.20759717205479533, 0.3126703177619974, 0.18078785290089638, 0.7422001386059047, 0.7726755538057188, 0.3277775066108153, 0.5181144753668747 … 0.5338442075110978, 0.8575211492346384, 0.9954840790239925, 0.6424407507078336, 0.7940770595462205, 0.053890792175115054, 0.9595014083141846, 0.8423338613101816, 0.5532812445454995, 0.42973496521957366]
Stacktrace:
  [1] macro expansion
    @ ./REPL[8]:11 [inlined]
  [2] top-level scope
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1226 [inlined]
  [3] top-level scope
    @ ./REPL[8]:0
  [4] eval
    @ ./boot.jl:360 [inlined]
  [5] eval_user_input(ast::Any, backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
  [6] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
  [7] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
  [8] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
  [9] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
 [10] (::Base.var"#874#876"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:387
 [11] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [12] invokelatest
    @ ./essentials.jl:706 [inlined]
 [13] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:372
 [14] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:302
 [15] _start()
    @ Base ./client.jl:485
Test Summary: | Fail  Total
N=15          |    1      1
Test Summary: | Fail  Total
N=15          |    1      1
ERROR: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.
caused by: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.
julia>
...
signal (11): Segmentation fault
in expression starting at REPL[8]:1
unsafe_load at ./pointer.jl:105 [inlined]
unsafe_load at ./pointer.jl:105 [inlined]
...