Closed EthanAnderes closed 5 years ago
Please try #31709 and see if that fixes it.
Probably the same as I reported here: https://github.com/JuliaLang/julia/pull/30899 which I believed was the PR that caused this in the first place. Great to see that a fix is around the corner.
Ok, so jb/fix31702 fixed that snippet for me, but my original code had the same seg fault.
Took me a while, but I found I can reliably trigger it again by adding using PyPlot to the snippet below.
Threads.nthreads() # <--- 4
using PyCall
using PyPlot # <--- need to add this to get seg fault on jb/fix31702/6dbd5ec44f
abstract type Flat end
struct X{P<:Flat, T<:Real}
    t::Array{T,1}
end
function foo(x::X{P,T}) where {P<:Flat, T<:Real}
    v = zeros(T, size(x.t))
    Threads.@threads for i=1:length(x.t)
        v[i] = x.t[i]^2 + sin(x.t[i]) - cos(x.t[i])
    end
    return v
end
T = Float64
x = X{Flat,T}(rand(T,1000))
@time foo(x) # <--- segmentation fault
Just to record what I'm still seeing, I'll post it here. If no one else can reproduce it, I guess something is weird on my end. Anyway, thanks for looking into it so quickly.
julia> versioninfo()
Julia Version 1.3.0-DEV.22
Commit c4841ca6e0* (2019-04-13 18:26 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.5.0)
CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
JULIA_FFTW_PROVIDER = MKL
julia> Threads.nthreads() # <--- 4
4
julia> using PyCall
[ Info: Recompiling stale cache file /Users/ethananderes/.julia/compiled/v1.3/PyCall/GkzkC.ji for PyCall [438e738f-606a-5dbb-bf0a-cddfbfd45ab0]
julia> using PyPlot # <--- need to add this to get seg fault on jb/fix31702/6dbd5ec44f
[ Info: Recompiling stale cache file /Users/ethananderes/.julia/compiled/v1.3/PyPlot/oatAj.ji for PyPlot [d330b81b-6aea-500a-939a-2ce795aea3ee]
julia> abstract type Flat end
julia> struct X{P<:Flat, T<:Real}
t::Array{T,1}
end
julia> function foo(x::X{P,T}) where {P<:Flat, T<:Real}
v = zeros(T, size(x.t))
Threads.@threads for i=1:length(x.t)
v[i] = x.t[i]^2 + sin(x.t[i]) - cos(x.t[i])
end
return v
end
foo (generic function with 1 method)
julia> T = Float64
Float64
julia> x = X{Flat,T}(rand(T,1000))
X{Flat,Float64}([0.7245885961981229, 0.3383104857554391, 0.676555825916445, 0.558426289965362, 0.5530476330156351, 0.37882649143300196, 0.897409874921808, 0.8964334927209676, 0.29800859872573726, 0.5404695098549857 … 0.8914687582775509, 0.6738835181174165, 0.903603579069667, 0.44743595337482334, 0.2609432922325392, 0.26049829407694514, 0.8903264779207407, 0.8244344549248652, 0.2042527657723534, 0.7499361443047876])
julia> @time foo(x) # <--- segmentation fault
fatal: error thrown and no exception handler available.
ErrorException("concurrency violation detected")
rec_backtrace at /Users/ethananderes/Software/juliaMaster/src/stackwalk.c:94
record_backtrace at /Users/ethananderes/Software/juliaMaster/src/task.c:210
jl_throw at /Users/ethananderes/Software/juliaMaster/src/task.c:417
error at ./error.jl:33
assert_havelock at ./condition.jl:20 [inlined]
assert_havelock at ./condition.jl:43 [inlined]
assert_havelock at ./condition.jl:67 [inlined]
notify at ./condition.jl:118
#notify#463 at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
uv_timercb at ./asyncevent.jl:148
jlcapi_uv_timercb_4215 at /Users/ethananderes/Software/juliaMaster/usr/lib/julia/sys.dylib (unknown line)
uv__run_timers at /workspace/srcdir/libuv/src/timer.c:162
uv_run at /workspace/srcdir/libuv/src/unix/core.c:352
jl_task_get_next at /Users/ethananderes/Software/juliaMaster/src/partr.c:303
poptaskref at ./task.jl:564
0.249855wait at ./task.jl:591
task_done_hook at ./task.jl:327
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_finish_task at /Users/ethananderes/Software/juliaMaster/src/task.c:167
secondsstart_task at /Users/ethananderes/Software/juliaMaster/src/task.c:593
(529.47 k allocations: 27.174 MiB, 6.03% gc time)
Segmentation fault: 11
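The assert_havelock frames in the backtrace above suggest that notify was called by a thread or task that did not own the condition's lock. The same kind of ownership check can be seen in isolation with a Threads.Condition. This is only a minimal sketch of the check, assuming Julia 1.2 or later; it is not the code path in the trace, and the exception type and message differ between Julia versions.

```julia
# Minimal sketch (assumes Julia 1.2 or later, where Threads.Condition exists).
condition = Threads.Condition()

# Calling notify without holding the condition's lock fails the ownership check.
violation = try
    notify(condition)
    nothing
catch err
    err
end
println("notify without the lock raised: ", typeof(violation))

# Taking the lock first satisfies the check; with no waiters, nothing is woken.
lock(condition)
woken = try
    notify(condition)
finally
    unlock(condition)
end
println("tasks woken with the lock held: ", woken)
```

In a normal session the error is an ordinary exception that can be caught, as here. In the traces in this thread it is thrown inside a libuv callback with no exception handler, which is why the process aborts instead.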
Ethan, is this with Jeff's fix?
Yea, I'm able to get the seg fault even with Jeff's fix, but needed to add using PyPlot as above to see it semi-regularly.
Here is a reduced test case that still crashes often.
function foo(x::Vector{T}) where T
    v = fill(T(0), length(x))
    Threads.@threads for i=1:length(x)
        v[i] = exp(x[i])
    end
    v
end
x = rand(1000)
import PyPlot
foo(x)
It is quite finicky and seems to depend on loading PyPlot before the first execution of foo. I was worried it had something to do with my system Anaconda distribution of Python, but I can also trigger the crash using the default Miniconda install of PyPlot.jl. With Miniconda, however, I get a little more printout of the error:
julia> foo(x)
fatal: error thrown and no exception handler available.
ErrorException("concurrency violation detected")
rec_backtrace at /Users/ethananderes/Software/juliaMaster/src/stackwalk.c:94
record_backtrace at /Users/ethananderes/Software/juliaMaster/src/task.c:210
jl_throw at /Users/ethananderes/Software/juliaMaster/src/task.c:417
error at ./error.jl:33
assert_havelock at ./condition.jl:20 [inlined]
assert_havelock at ./condition.jl:43 [inlined]
assert_havelock at ./condition.jl:67 [inlined]
notify at ./condition.jl:118
#notify#463 at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
fatal: error thrown and no exception handler available.
uv_timercb at ./asyncevent.jl:148
ErrorException("concurrency violation detected")
jlcapi_uv_timercb_4210 at /Users/ethananderes/Software/juliaMaster/usr/lib/julia/sys.dylib (unknown line)
rec_backtrace at /Users/ethananderes/Software/juliaMaster/src/stackwalk.c:94
uv__run_timers at /workspace/srcdir/libuv/src/timer.c:162
record_backtrace at /Users/ethananderes/Software/juliaMaster/src/task.c:210
uv_run at /workspace/srcdir/libuv/src/unix/core.c:352
jl_throw at /Users/ethananderes/Software/juliaMaster/src/task.c:417
jl_task_get_next at /Users/ethananderes/Software/juliaMaster/src/partr.c:303
error at ./error.jl:33
poptaskref at ./task.jl:564
assert_havelock at ./condition.jl:20 [inlined]
assert_havelock at ./condition.jl:43 [inlined]
assert_havelock at ./condition.jl:67 [inlined]
notify at ./condition.jl:118
wait at ./task.jl:591
#notify#463 at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
uv_timercb at ./asyncevent.jl:148
task_done_hook at ./task.jl:327
jlcapi_uv_timercb_4210 at /Users/ethananderes/Software/juliaMaster/usr/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_finish_task at /Users/ethananderes/Software/juliaMaster/src/task.c:167
uv__run_timers at /workspace/srcdir/libuv/src/timer.c:162
jl_threadfun at /Users/ethananderes/Software/juliaMaster/src/partr.c:217
uv_run at /workspace/srcdir/libuv/src/unix/core.c:375
jl_task_get_next at /Users/ethananderes/Software/juliaMaster/src/partr.c:303
_pthread_body at /usr/lib/system/libsystem_pthread.dylib (unknown line)
_pthread_start at /usr/lib/s_pthread_start at /usr/lib/system/libsystem_pthread.dylib (unknown line)
wait at ./task.jl:591
task_done_hook at ./task.jl:327
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_finish_task at /Users/ethananderes/Software/juliaMaster/src/task.c:167
start_task at /Users/ethananderes/Software/juliaMaster/src/task.c:593
signal (11): Segmentation fault: 11
in expression starting at REPL[4]:1
_PyMethodDef_RawFastCallKeywords at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
_PyMethodDescr_FastCallKeywords at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
call_function at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
_PyEval_EvalFrameDefault at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
function_code_fastcall at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
call_function at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
_PyEval_EvalFrameDefault at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
function_code_fastcall at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
call_function at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
_PyEval_EvalFrameDefault at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
function_code_fastcall at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
atexit_callfuncs at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
Py_FinalizeEx at /Users/ethananderes/.julia/conda/3/lib/libpython3.7m.dylib (unknown line)
Py_Finalize at /Users/ethananderes/.julia/packages/PyCall/ttONZ/src/pyinit.jl:125
_atexit at ./initdefs.jl:309
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_atexit_hook at /Users/ethananderes/Software/juliaMaster/src/init.c:243
jl_exit at /Users/ethananderes/Software/juliaMaster/src/jl_uv.c:613
jl_no_exc_handler at /Users/ethananderes/Software/juliaMaster/src/task.c:369
jl_finish_task at /Users/ethananderes/Software/juliaMaster/src/task.c:170
jl_threadfun at /Users/ethananderes/Software/juliaMaster/src/partr.c:217
_pthread_body at /usr/lib/system/libsystem_pthread.dylib (unknown line)
_pthread_start at /usr/lib/system/libsystem_pthread.dylib (unknown line)
Allocations: 17932081 (Pool: 17928490; Big: 3591); GC: 40
Segmentation fault: 11
Not sure if that helps. Here is my versioninfo
julia> versioninfo()
Julia Version 1.3.0-DEV.23
Commit 1c88c0e2be* (2019-04-13 20:05 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.5.0)
CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
JULIA_FFTW_PROVIDER = FFTW
(v1.3) pkg> st PyPlot
Status `~/.julia/environments/v1.3/Project.toml`
[438e738f] PyCall v1.91.2
[d330b81b] PyPlot v2.8.1
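Because the crash is intermittent, one way to compare builds or commits is to run a snippet many times in fresh Julia processes and count how often the process fails. The following is a rough sketch under stated assumptions: count_failures, trials, and threads are hypothetical names introduced here, and any non-zero exit status or terminating signal is counted as a failure.

```julia
# Hypothetical harness: run `code` in fresh Julia processes and count failed runs.
function count_failures(code::AbstractString; trials::Integer = 20, threads::Integer = 4)
    failures = 0
    for _ in 1:trials
        # Use the same Julia binary as the current session and skip startup.jl.
        cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
        ok = withenv("JULIA_NUM_THREADS" => string(threads)) do
            # success() returns false on a non-zero exit code or a fatal signal.
            success(pipeline(cmd; stdout = devnull, stderr = devnull))
        end
        ok || (failures += 1)
    end
    return failures
end

# Sanity checks with snippets whose outcome is known in advance.
println(count_failures("exit(0)"; trials = 2, threads = 1))
println(count_failures("exit(1)"; trials = 2, threads = 1))
```

Passing the reduced test case as the code string would give a rough crash rate for a given build, which makes it easier to tell a real fix from a lucky run.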
I'm trying to do a bit of work to help pin down which commit triggers the error, by pattern matching between the error and the commit messages and by tracing back to about the time I started seeing the seg faults. Here is what I've got so far; hope this helps.
git checkout 1be1deb10c90b612a57d3541d3bf1aae1a42e104
git checkout a86aab3777d07b00404659f67f35ff2b3615ed29
Sysimage built. Summary:
Total ─────── 84.667477 seconds
Base: ─────── 28.926627 seconds 34.165%
Stdlibs: ──── 55.738737 seconds 65.8325%
JULIA usr/lib/julia/sys-o.a
Internal error: encountered unexpected error in runtime:
UndefVarError(var=:li)
rec_backtrace at /Users/ethananderes/Software/juliaMaster/src/stackwalk.c:94
record_backtrace at /Users/ethananderes/Software/juliaMaster/src/task.c:210
jl_throw at /Users/ethananderes/Software/juliaMaster/src/task.c:417
jl_undefined_var_error at /Users/ethananderes/Software/juliaMaster/src/rtutils.c:130
jl_get_binding_or_error at /Users/ethananderes/Software/juliaMaster/src/module.c:290
_uncompressed_ast at ./reflection.jl:906
typeinf_ext at ./compiler/typeinfer.jl:560
typeinf_ext at ./compiler/typeinfer.jl:599
unknown function (ip: 0x10d7c0416)
jl_apply_generic at /Users/ethananderes/Software/juliaMaster/src/gf.c:2191 [inlined]
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_type_infer at /Users/ethananderes/Software/juliaMaster/src/gf.c:207
jl_compile_method_internal at /Users/ethananderes/Software/juliaMaster/src/gf.c:1773
jl_apply_generic at /Users/ethananderes/Software/juliaMaster/src/gf.c:2196
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_finish_task at /Users/ethananderes/Software/juliaMaster/src/task.c:167
jl_threadfun at /Users/ethananderes/Software/juliaMaster/src/partr.c:217
_pthread_body at /usr/lib/system/libsystem_pthread.dylib (unknown line)
_pthread_start at /usr/lib/system/libsystem_pthread.dylib (unknown line)
Generating precompile statements... 913 generated in 240.093567 seconds (overhead 212.618210 seconds)
LINK usr/lib/julia/sys.dylib
git checkout f31b6e5fec63ab17d99d9cf2b27cd3020eae904b
error
(#27900)
git checkout b3b6d030860a55ffd329f24aeafac01988e560fb
git checkout 0136fa10d5ba6782465ea68de37acefcb548bd9c
julia> versioninfo()
Julia Version 1.2.0-DEV.646
Commit 0136fa10d5* (2019-04-04 03:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.5.0)
CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
JULIA_FFTW_PROVIDER = MKL
julia> function foo(x::Vector{T}) where T
v = fill(T(0), length(x))
Threads.@threads for i=1:length(x)
v[i] = exp(x[i])
end
v
end;
julia> x = rand(1000);
julia> import PyPlot
julia> foo(x);
fatal: error thrown and no exception handler available.
ErrorException("concurrency violation detected")
rec_backtrace at /Users/ethananderes/Software/juliaMaster/src/stackwalk.c:94
record_backtrace at /Users/ethananderes/Software/juliaMaster/src/task.c:210
jl_throw at /Users/ethananderes/Software/juliaMaster/src/task.c:417
error at ./error.jl:33
assert_havelock at ./condition.jl:20 [inlined]
assert_havelock at ./condition.jl:43 [inlined]
assert_havelock at ./condition.jl:67 [inlined]
notify at ./condition.jl:118
#notify#463 at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
uv_timercb at ./asyncevent.jl:148
jlcapi_uv_timercb_4198 at /Users/ethananderes/Software/juliaMaster/usr/lib/julia/sys.dylib (unknown line)
uv__run_timers at /workspace/srcdir/libuv/src/timer.c:162
uv_run at /workspace/srcdir/libuv/src/unix/core.c:352
jl_task_get_next at /Users/ethananderes/Software/juliaMaster/src/partr.c:303
poptaskref at ./task.jl:564
wait at ./task.jl:591
task_done_hook at ./task.jl:327
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_finish_task at /Users/ethananderes/Software/juliaMaster/src/task.c:167
jl_threadfun at /Users/ethananderes/Software/juliaMaster/src/partr.c:217
fatal: error thrown and no exception handler available.
ErrorException("concurrency violation detected")
rec_backtrace at /Users/ethananderes/Software/juliaMaster/src/stackwalk.c:94
record_backtrace at /Users/ethananderes/Software/juliaMaster/src/task.c:210
jl_throw at /Users/ethananderes/Software/juliaMaster/src/task.c:417
_pthread_body at /usr/lib/system/libsystem_pthread.dylib (unknown line)
_pthread_start at /usr/lib/system/libsystem_pthread.dylib (unknown line)
error at ./error.jl:33
assert_havelock at ./condition.jl:20 [inlined]
assert_havelock at ./condition.jl:43 [inlined]
assert_havelock at ./condition.jl:67 [inlined]
notify at ./condition.jl:118
#notify#463 at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
uv_timercb at ./asyncevent.jl:148
jlcapi_uv_timercb_4198 at /Users/ethananderes/Software/juliaMaster/usr/lib/julia/sys.dylib (unknown line)
uv__run_timers at /workspace/srcdir/libuv/src/timer.c:162
uv_run at /workspace/srcdir/libuv/src/unix/core.c:375
jl_task_get_next at /Users/ethananderes/Software/juliaMaster/src/partr.c:303
poptaskref at ./task.jl:564
wait at ./task.jl:591
task_done_hook at ./task.jl:327
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_finish_task at /Users/ethananderes/Software/juliaMaster/src/task.c:167
start_task at /Users/ethananderes/Software/juliaMaster/src/task.c:593
fatal: error thrown and no exception handler available.
ErrorException("concurrency violation detected")
rec_backtrace at /Users/ethananderes/Software/juliaMaster/src/stackwalk.c:94
record_backtrace at /Users/ethananderes/Software/juliaMaster/src/task.c:210
jl_throw at /Users/ethananderes/Software/juliaMaster/src/task.c:417
error at ./error.jl:33
assert_havelock at ./condition.jl:20 [inlined]
assert_havelock at ./condition.jl:43 [inlined]
assert_havelock at ./condition.jl:67 [inlined]
notify at ./condition.jl:118
notify_error at ./condition.jl:129 [inlined]
_uv_hook_close at ./asyncevent.jl:132
jl_apply at /Users/ethananderes/Software/juliaMaster/src/./julia.h:1604 [inlined]
jl_uv_call_close_callback at /Users/ethananderes/Software/juliaMaster/src/jl_uv.c:81 [inlined]
jl_uv_closeHandle at /Users/ethananderes/Software/juliaMaster/src/jl_uv.c:100
Segmentation fault: 11
I believe I have had this same issue; I hope this can help. At first I thought it could be related to SDL, to using BenchmarkTools, or to running Julia from within Emacs, but I have been able to reproduce it from the Julia command line without using any libraries, just by running the following code:
using Base.Threads
function fun()
    for f in 1:1024
        @threads for j in 1:2
        end
    end
    nothing
end
fun()
You just need to include it a few times before it happens. So I guess it depends on some sort of race condition that just turns out to happen more frequently if you load any libraries, etc.
julia> while true
@show include("thtest.jl")
end
include("thtest.jl") = nothing
include("thtest.jl") = nothing
include("thtest.jl") = nothing
include("thtest.jl") = nothing
fatal: error thrown and no exception handler available.
ErrorException("concurrency violation detected")
rec_backtrace at /home/user/src/julia/src/stackwalk.c:94
record_backtrace at /home/user/src/julia/src/task.c:210 [inlined]
jl_throw at /home/user/src/julia/src/task.c:417
error at ./error.jl:33
assert_havelock at ./condition.jl:20 [inlined]
assert_havelock at ./condition.jl:43 [inlined]
assert_havelock at ./condition.jl:67 [inlined]
notify at ./condition.jl:118
#notify#463 at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
notify at ./condition.jl:116 [inlined]
readcb_specialized at ./stream.jl:558
jfptr_readcb_specialized_4895 at /home/user/src/julia/usr/lib/julia/sys.so (unknown line)
jl_apply_generic at /home/user/src/julia/src/gf.c:2191
uv_readcb at ./stream.jl:599
jlcapi_uv_readcb_4104 at /home/user/src/julia/usr/lib/julia/sys.so (unknown line)
uv__read at /workspace/srcdir/libuv/src/unix/stream.c:1179
uv__stream_io at /workspace/srcdir/libuv/src/unix/stream.c:1339
uv__io_poll at /workspace/srcdir/libuv/src/unix/linux-core.c:361
uv_run at /workspace/srcdir/libuv/src/unix/core.c:361
jl_task_get_next at /home/user/src/julia/src/partr.c:303
poptaskref at ./task.jl:564
wait at ./task.jl:591
task_done_hook at ./task.jl:327
jl_apply_generic at /home/user/src/julia/src/gf.c:2191
jl_apply at /home/user/src/julia/src/julia.h:1604 [inlined]
jl_finish_task at /home/user/src/julia/src/task.c:167
start_task at /home/user/src/julia/src/task.c:593
unknown function (ip: 0xffffffffffffffff)
And this is with the jl_wake_libuv(); fix (or at least I think it is). Also, the outer loop seems to be necessary, or at least helps trigger it.
EDIT3: everything works fine for me with 1be1deb10c90b612a57d3541d3bf1aae1a42e104
I just ran into this segfault
julia> @time chains, tuned_sampler = NUTS_init_tune_mcmc(ada, 1000);
MCMC, adapting ϵ (75 steps)
5.5 s/step ...done
MCMC, adapting ϵ (25 steps)
5.7 s/step ...done
MCMC, adapting ϵ (50 steps)
52.0 s/step ...done
MCMC, adapting ϵ (100 steps)
step 100 (of 100), 52.0 s/step
52.0 s/step ...done
MCMC, adapting ϵ (200 steps)
step 100 (of 200), 52.0 s/step
step 200 (of 200), 52.0 s/step
52.0 s/step ...done
MCMC, adapting ϵ (400 steps)
step 100 (of 400), 52.0 s/step
step 200 (of 400), 52.0 s/step
step 300 (of 400), 52.0 s/step
step 400 (of 400), 52.0 s/step
52.0 s/step ...done
MCMC, adapting ϵ (50 steps)
53.0 s/step ...done
MCMC (1000 steps)
step 100 (of 1000), 53.0 s/step
fatal: error thrown and no exception handler available.
ErrorException("concurrency violation detected")
rec_backtrace at /home/chriselrod/Documents/languages/julia/src/stackwalk.c:94
record_backtrace at /home/chriselrod/Documents/languages/julia/src/task.c:210 [inlined]
jl_throw at /home/chriselrod/Documents/languages/julia/src/task.c:417
error at ./error.jl:33
concurrency_violation at ./condition.jl:8
assert_havelock at ./condition.jl:26 [inlined]
assert_havelock at ./condition.jl:49 [inlined]
assert_havelock at ./condition.jl:73 [inlined]
notify at ./condition.jl:124
#notify#468 at ./condition.jl:122 [inlined]
notify at ./condition.jl:122 [inlined]
uv_fseventscb_file at /home/chriselrod/Documents/languages/julia/usr/share/julia/stdlib/v1.3/FileWatching/src/FileWatching.jl:318
unknown function (ip: 0x7f9da2588e5a)
jlcapi_uv_fseventscb_file_24324_gfthunk at /home/chriselrod/Documents/languages/julia/usr/lib/julia/sys.so (unknown line)
jlcapi_uv_fseventscb_file_24324 at /home/chriselrod/Documents/languages/julia/usr/lib/julia/sys.so (unknown line)
uv__inotify_read at /home/chriselrod/Documents/languages/julia/deps/srccache/libuv-2348256acf5759a544e5ca7935f638d2bc091d60/src/unix/linux-inotify.c:193
uv__io_poll at /home/chriselrod/Documents/languages/julia/deps/srccache/libuv-2348256acf5759a544e5ca7935f638d2bc091d60/src/unix/linux-core.c:361
uv_run at /home/chriselrod/Documents/languages/julia/deps/srccache/libuv-2348256acf5759a544e5ca7935f638d2bc091d60/src/unix/core.c:361
jl_task_get_next at /home/chriselrod/Documents/languages/julia/src/partr.c:306
poptaskref at ./task.jl:564
wait at ./task.jl:591
task_done_hook at ./task.jl:327
jl_apply at /home/chriselrod/Documents/languages/julia/src/julia.h:1610 [inlined]
jl_finish_task at /home/chriselrod/Documents/languages/julia/src/task.c:167
start_task at /home/chriselrod/Documents/languages/julia/src/task.c:593
unknown function (ip: 0xffffffffffffffff)
This ran for about 13 hours before segfaulting (52 seconds/step × 900 steps ÷ 60^2 seconds/hour = 13 hours).
The code has an @threads loop, along with a global const logdensity_lock = Threads.SpinLock() that each thread locks and unlocks around its updates to the shared results. The specific part of the function:
nthreads = Threads.nthreads()
Threads.@threads for thread ∈ 1:nthreads
    thread_target = 0.0
    thread_∂invσ = 0.0
    thread_∂β = 0.0
    sat_start = 1 + (thread - 1)*S ÷ nthreads
    sat_end = thread*S ÷ nthreads
    gradp = @MVector zeros(4)
    gradm = @MVector zeros(7)
    obsstart = thread == 1 ? 1 : SatInds[sat_start - 1] + 1
    @inbounds for sat ∈ sat_start:sat_end
        obsend = SatInds[sat]
        sat_μ = b_sat[sat]
        for obsind ∈ obsstart:obsend
            p, m = PropMethods[obsind]
            l2n = log2Ns[obsind]
            ad = SatADs[obsind]
            μ = p < Pp1 ? b_prop[p] + sat_μ : sat_μ
            μ = m < Mp1 ? b_method[m] + μ : μ
            t, ∂t∂μ, ∂t∂invσ = ∂normal_lcdf(ad, μ + l2n * β, invσ)
            thread_target += t
            thread_∂invσ += ∂t∂invσ
            thread_∂β += ∂t∂μ * l2n
            grad[sat] += ∂t∂μ
            if p < Pp1
                gradp[p] += ∂t∂μ
            end
            if m < Mp1
                gradm[m] += ∂t∂μ
            end
        end
        obsstart = obsend + 1
    end
    Threads.lock(logdensity_lock)
    common_target[] += thread_target
    common_∂invσ[] += thread_∂invσ
    common_∂β[] += thread_∂β
    for m ∈ 1:M
        grad[S+m] = gradm[m]
    end
    for p ∈ 1:P
        grad[S+M+p] = gradp[p]
    end
    Threads.unlock(logdensity_lock)
end
This example is far from minimal. Given it took 13 hours to get here, it may take me some time to get something smaller that reproduces the problem.
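The structure of the loop above (each thread accumulates into local variables over its own chunk, then merges into shared state while holding a lock) can be sketched in isolation. This is a minimal sketch, not the original function: sum_lock, threaded_sum, and the plain summation are hypothetical stand-ins for the log-density and gradient code, and the chunk bounds reuse the same integer-division formula as sat_start and sat_end.

```julia
using Base.Threads

# Hypothetical lock, playing the role of logdensity_lock in the code above.
const sum_lock = SpinLock()

function threaded_sum(x::Vector{Float64})
    total = Ref(0.0)               # shared accumulator, guarded by sum_lock
    n = nthreads()
    @threads for t in 1:n
        # Contiguous chunk for this iteration (same formula as sat_start/sat_end).
        lo = 1 + (t - 1) * length(x) ÷ n
        hi = t * length(x) ÷ n
        partial = 0.0              # local to this iteration, so no lock needed
        @inbounds for i in lo:hi
            partial += x[i]
        end
        # Merge into the shared accumulator while holding the lock.
        lock(sum_lock)
        try
            total[] += partial
        finally
            unlock(sum_lock)
        end
    end
    return total[]
end

x = collect(1.0:100.0)
println(threaded_sum(x))
```

Wrapping the critical section in try/finally releases the lock even if the merge throws. The sketch is not expected to reproduce the crash; it only isolates the locking pattern.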
Commenting to say that the issue still seems to be here. That was with:
julia> versioninfo()
Julia Version 1.3.0-DEV.185
Commit c9777b0ab4* (2019-05-08 14:34 UTC)
Platform Info:
OS: Linux (x86_64-generic-linux)
CPU: Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-8.0.0 (ORCJIT, skylake)
That's what I get for running slow code on master for no good reason, ha ha.
The error is from a FileWatching event. Are you using Revise?
Yes. Next time I start a long running threaded program, I'll start julia with --startup-file=no (using Revise is in my startup.jl).
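The FileWatching frames in the backtrace come from a libuv file-watcher callback, which can fire whenever the event loop runs, including while unrelated threaded code is executing. A minimal sketch of such a watcher, using only the FileWatching standard library and a temporary file (the path and timings are arbitrary choices for illustration):

```julia
using FileWatching

# Create a temporary file to watch.
path = tempname()
write(path, "initial")

# Watch the file in a background task; give up after 5 seconds.
watcher = @async watch_file(path, 5)
sleep(0.5)                 # give the watcher time to register with the event loop
write(path, "changed")     # modifying the file should wake the watcher

event = fetch(watcher)
println("changed = ", event.changed, ", timedout = ", event.timedout)
rm(path; force = true)
```

The event is delivered through a callback run by the event loop rather than by the code that modified the file, which is consistent with the watcher appearing in the backtrace of an otherwise unrelated threaded loop.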
Totally confused by the segmentation fault I'm getting on dev branch. Started seeing it perhaps a couple weeks back but had a hard time getting a small test case that would reliably trigger the error.
The following snippet works fine if threads are off, and works on v1.1 with threads on or off. For some reason importing PyCall seems important, but I have no idea why (BTW: I have PyCall v1.91.2).
Can anybody else reproduce this?
Here is the error I get