Closed nalimilan closed 7 years ago
Also seen in openSUSE Leap 42.2.
Does this happen in a default source build, or only when you're building against some system dependencies? Scroll up to where the ambiguity test ran and it should be printing the list of ambiguous methods.
Not sure what you mean by "list of ambiguous methods." In:
Julia Version 0.6.0-dev.1698
Commit 4793e88* (2016-12-25 07:38 UTC)
Platform Info:
OS: Linux (x86_64-suse-linux)
CPU: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
terminal output that mentions "ambiguous" (selectively) is:
threads (2) | 38.54 | 1.34 | 3.5 | 83.29 | 2158.79
error (2) | 0.64 | 0.00 | 0.0 | 2.63 | 2158.79
WARNING: Method definition ambig(Any, Integer) in module TestMain_ambiguous at /home/colin/Downloads/julia6/test/ambiguous.jl:7 overwritten at /home/colin/Downloads/julia6/test/ambiguous.jl:84.
boundscheck (5) | 2.45 | 0.00 | 0.0 | 4.55 | 1654.04
cartesian (5) | 0.04 | 0.00 | 0.0 | 0.10 | 1656.18
From worker 2: Skipping Base.<|
From worker 2: Skipping Base.>:
From worker 2: Skipping Base.active_repl
From worker 2: Skipping Base.active_repl_backend
asmvariant (5) | 0.05 | 0.00 | 0.0 | 1.33 | 1658.32
osutils (5) | 0.02 | 0.00 | 0.0 | 0.24 | 1660.92
ambiguous (2) | 1.01 | 0.05 | 5.1 | 33.64 | 2158.79
channels (5) | 8.77 | 0.75 | 8.6 | 446.92 | 1877.34
dsp (5) | 6.67 | 0.00 | 0.0 | 319.38 | 1877.34
fft (2) | 16.24 | 0.79 | 4.9 | 767.73 | 2158.79
examples (5) | 12.14 | 0.00 | 0.0 | 679.32 | 2485.93
parallel (3) | 62.43 | 0.00 | 0.0 | 1.43 | 1100.48
cmdlineargs (4) | 102.76 | 0.05 | 0.1 | 50.24 | 1121.77
From worker 1: Worker 3 failed running test misc:
Some tests did not pass: 1279452 passed, 0 failed, 2 errored, 0 broken.
in finish at ./test.jl:524
in macro expansion; at ./util.jl:288 [inlined]
in macro expansion; at /home/colin/Downloads/julia6/test/testdefs.jl:0 [inlined]
in anonymous at ./<missing>:?
in eval at ./boot.jl:236
in runtests at /home/colin/Downloads/julia6/test/testdefs.jl:16
in runtests at /home/colin/Downloads/julia6/test/testdefs.jl:4
in #586 at ./multi.jl:1455
in run_work_thunk at ./multi.jl:1004
in macro expansion at ./multi.jl:1455 [inlined]
in #585 at ./event.jl:66misc: Error During Test
Test threw an exception of type StackOverflowError
Expression: $(Expr(:escape, :(summarysize(Core)))) $(Expr(:escape, :>)) $(Expr(:escape, :(summarysize(Core.Inference)))) $(Expr(:escape, :>)) $(Expr(:escape, :(Core.sizeof(Core))))
StackOverflowError:
misc: Error During Test
Test threw an exception of type StackOverflowError
Expression: summarysize(Base) > 10000 * sizeof(Int)
StackOverflowError:
Test Summary: | Pass Error Broken Total
Overall | 20030733 2 29 20030764
followed by:
osutils | 21 21
ambiguous | 42 42
channels | 32 32
Let me know where I should be looking.
Looks like you only get the stack overflow then, not the Expression evaluated to non-Boolean
from the ambiguity test. Does the stack overflow happen if you just run make test-misc
on its own?
I have seen the non-Boolean error as well.
Both make test-misc
and make test-ambiguous
run successfully without error when run by themselves.
Ah, the ambiguity is reproducible when running math
and ambiguous
on the same worker, it was caused by https://github.com/JuliaLang/julia/pull/19627 and probably just needs a *(::MockQuantity, ::Bool)
method to be defined. I can't reproduce the misc failure on ubuntu with make testall1
so we'll need to keep debugging that. What does Base.summarysize(Base)
return for you from a normal REPL? If you add @show summarysize(Base)
around line 189 of test/misc.jl
, what value does it output for test-misc
by itself vs a failing run?
julia> Base.summarysize(Base)
41035484
I inserted the @show statement after line with import Base.summarysize
> make test-misc
JULIA test/misc
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
summarysize(Base) = 41465530
misc (1) | 8.42 | 0.86 | 10.3 | 1177.03 | 363.44
Test Summary: | Pass Total
Overall | 1279454 1279454
SUCCESS
Strangely, I can't see the string "summarysize" anywhere in the output from make testall
which consistently fails with the same output.
probably because the show statement is triggering the stack overflow too, but only if some subset of tests have run previously on the same worker. running JULIA_CPU_CORES=1 ./julia test/runtests.jl foo bar etc
with the list of tests that ran previously on the same worker as misc should reproduce it too? And possibly allow some reducing of the test case to a smaller subset?
OK here is my progress: thinking that the error might have been triggered by something recent in the misc worker I ran
> JULIA_CPU_CORES=1 julia6 test/runtests.jl cmdlineargs enums markdown nullable grisu llvmcall repl show broadcast goto reflection mpfr pollfd resolve version read spawn math copy blas vecelement simdloop intfuncs random offsetarray staged iobuffer hashing dict triplequote char printf sparse/sparse
which, disappointingly, run successfully. I may be shooting myself in the foot by testing in roughly the reverse order of testall, I don't know. I'll keep expanding the list.
it should include misc at the end to reproduce the stack overflow, and may be order dependent
Right, thanks for noticing the missing misc - rather important. Adding the full complement of tests which ran on the same worker and misc at the end triggers the failure. Order does not seem to be important since my tests ran in reverse order. I will now go back through previous tests and report back.
Ok this is getting complicated.
> JULIA_CPU_CORES=1 julia6 test/runtests.jl math copy blas vecelement simdloop intfuncs random offsetarray staged iobuffer hashing dict triplequote char printf sparse/sparse linalg/matmul linalg/svd linalg/tridiag linalg/pinv linalg/givens linalg/arnoldi linalg/cholesky linalg/hessenberg misc
produces the error but
> JULIA_CPU_CORES=1 julia6 test/runtests.jl cmdlineargs enums markdown nullable grisu llvmcall repl show broadcast goto reflection mpfr pollfd resolve version read spawn math misc
does not.
So clearly I have to split the first (which failed), and I did this on the sparse test. Then ran both sides and both ran to success.
Please correct my thinking, which is that maybe there are two or more tests, at least one from each of both sides of the sparse test, which in combination with misc triggers the failure.
What I usually do with this kind of thing is (write a script to) remove one entry from the list at a time, if reproducing the failure happens without it, leave it off and try another. If there's no failure without one of the entries, mark it as required and don't try removing it again. Stop when removing any one more entry doesn't reproduce the failure any more.
Just in case it is meaningful, while running these tests there is a quantity of output which does not occur when running make testall
, here is a sample (this appears to be repeated over and over and fills my terminal buffer scrolling other information out of view):
in summarysize(::Any, ::ObjectIdDict, ::Type{T}) at ./interactiveutil.jl:741
in summarysize(::Array{Any,1}, ::ObjectIdDict, ::Type{T}) at ./interactiveutil.jl:768
in _summarysize(::Any, ::ObjectIdDict, ::Type{T}) at ./interactiveutil.jl:752
in summarysize(::Any, ::ObjectIdDict, ::Type{T}) at ./interactiveutil.
There's something odd about triplequote.
> JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/pinv linalg/matmul offsetarray linalg/tridiag linalg/givens linalg/arnoldi triplequote math linalg/cholesky sparse/sparse linalg/svd misc
fails, but if I remove triplequote then it succeeds. Mean anything?
After an update to Version 0.6.0-dev.1701 (2016-12-26 06:25 UTC), make testall
still fails but no longer mentions errors related to stack overflow.
parallel (5) | 62.45 | 0.00 | 0.0 | 5.32 | 1579.21
cmdlineargs (2) | 102.13 | 0.10 | 0.1 | 53.52 | 3090.21
From worker 1: Worker 3 failed running test ambiguous:
Some tests did not pass: 41 passed, 0 failed, 1 errored, 0 broken.
in finish at ./test.jl:524
in macro expansion; at ./util.jl:288 [inlined]
in macro expansion; at /home/colin/Downloads/julia6/test/testdefs.jl:0 [inlined]
in anonymous at ./<missing>:?
in eval at ./boot.jl:236
in runtests at /home/colin/Downloads/julia6/test/testdefs.jl:16
in runtests at /home/colin/Downloads/julia6/test/testdefs.jl:4
in #586 at ./multi.jl:1455
in run_work_thunk at ./multi.jl:1004
in macro expansion at ./multi.jl:1455 [inlined]
in #585 at ./event.jl:66ambiguous: Error During Test
Expression evaluated to non-Boolean
Expression: ((x->begin
isempty(x) || println(x)
end))(detect_ambiguities(Core,Base; imported=true))
Value: nothing
Test Summary: | Pass Error Broken Total
Overall | 20027342 1 29 20027372
...
cmdlineargs | 121 121
compile | 61 61
FAILURE
Error in testset ambiguous:
Error During Test
Expression evaluated to non-Boolean
Expression: ((x->begin
isempty(x) || println(x)
end))(detect_ambiguities(Core,Base; imported=true))
Value: nothing
ERROR: LoadError:
in (::##33#41)() at /home/colin/Downloads/julia6/test/runtests.jl:195
...
FYI here is the new set of tests that produce the failure:
> JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/qr linalg/schur linalg/special linalg/bunchkaufman linalg/bidiag linalg/lu linalg/lq sparse/sparsevector dates/rounding dates/types dates/io dates/arithmetic core arrayops copy math spawn read resolve broadcast lineedit replcompletions replutil sets goto grisu meta stacktraces profile docs markdown base64 serialize threads libdl int intset floatfuncs inline boundscheck error cartesian asmvariant osutils channels fft ambiguous
FWIW for me the above test set reduces to
JULIA_CPU_CORES=1 julia6 test/runtests.jl math ambiguous
which fails.
Remove math
and it runs successfully, make test-math
is also successful.
Ah, the ambiguity is reproducible when running
math
andambiguous
on the same worker, it was caused by #19627 and probably just needs a*(::MockQuantity, ::Bool)
method to be defined.
That one should be simple to fix. The stack overflow is the more interesting part of this report. make testall
won't be exactly deterministic in terms of which tests run on which worker, since it's timing-dependent. If you run JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/pinv linalg/matmul offsetarray linalg/tridiag linalg/givens linalg/arnoldi triplequote math linalg/cholesky sparse/sparse linalg/svd misc
again on latest master does the stack overflow still happen? Can anything else on that list be removed and still trigger the stack overflow? I'm guessing maybe something is interfering with inference of an element type somewhere and resulting in an Array{Any,1}
that's actually supposed to have a more specific element type, but I don't know what would cause that.
The command:
JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/pinv linalg/matmul offsetarray linalg/tridiag linalg/givens linalg/arnoldi triplequote math linalg/cholesky sparse/sparse linalg/svd misc
now runs successfully on v. 1701.
What version did it last fail at? Did it happen reliably every time, or is it maybe non-deterministic?
Last seen in v 1698 as reported above. Failed every time that I could see, but of course the sample size is very small.
So that was 4793e88, and there's only been one commit to master since then, changing gmp and curl versions.
FWIW, I still get the stack overflow with latest master. Thanks for investigating this @colbec!
@nalimilan Yes the latest master still fails with overflow when using the full test set.
Here is my latest single core set which causes fail:
> JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/matmul linalg/bunchkaufman linalg/bidiag linalg/symmetric linalg/lq sparse/sparsevector dates/ranges dates/arithmetic inference worlds keywordargs printf char triplequote dict hashing iobuffer staged offsetarray tuple random intfuncs simdloop blas copy math loading bigint spawn read mmap broadcast mod2pi show sets llvmcall grisu meta stacktraces profile docs markdown threads inline error cartesian asmvariant osutils channels dsp examples misc
The voluminous output from interactiveutil.jl is inconvenient in testing since it scrolls off useful information regarding testing times. I am working on the idea that if the testing reports that x test took the most time then I target that one for next removal, which in theory should speed up the next test.
Here's another observation: if I select the linalg tests from the above set and run them followed by misc then it succeeds (this is consistently repeatable). The set of not(linalg/*) entries followed by misc also succeeds (this is consistently repeatable). Now taking the not(linalg/*) set and adding back linalg/matmul causes fail (repeatable).
Edit: but adding back linalg/bunchkaufman but not matmul sometimes succeeds, sometimes fails, about 50-50 from a small sample. Edit 2: when adding back a linalg test which subsequently fails, the testing process does not end with "FAILURE" message, but continues to flood output and stops at
ERROR: LoadError:
in (::##33#41)() at /home/colin/Downloads/julia6/test/runtests.jl:195
in cd(::##33#41, ::String) at ./file.jl:69
in include_from_node1(::String) at ./loading.jl:532
in include(::String) at ./sysimg.jl:14
in process_options(::Base.JLOptions) at ./client.jl:293
in _start() at ./client.jl:357
while loading /home/colin/Downloads/julia6/test/runtests.jl, in expression starting on line 38
after the usual flood of interactiveutil.jl stuff.
Update: I am now on version 1781 after significant update overnight and make testall still fails. Excerpts:
From worker 1: Worker 1 failed running test broadcast:
UndefVarError: TestMain_broadcast not defined
in deserialize(::Base.ClusterSerializer{TCPSocket}, ::Type{Module}) at ./serialize.jl:601
in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:580
... and repeats ... ...
then
in #555 at ./event.jl:66misc: Error During Test
Test threw an exception of type StackOverflowError
Expression: $(Expr(:escape, :(summarysize(Core)))) $(Expr(:escape, :>)) $(Expr(:escape, :(summarysize(Core.Inference)))) $(Expr(:escape, :>)) $(Expr(:escape, :(Core.sizeof(Core))))
StackOverflowError:
misc: Error During Test
Test threw an exception of type StackOverflowError
Expression: summarysize(Base) > 10000 * sizeof(Int)
the broadcast issue is unrelated and will be fixed by https://github.com/JuliaLang/julia/pull/19745
@nalimilan @tkelman make testall
succeeds in v 0.6 1838 on openSUSE. Close issue now?
I saw this a couple of times on win64 yesterday. It's intermittent, but I think it's still there.
Yes, it is, I just ran the same version again and it failed. Pity I did not get a copy of the output, it might have been useful for comparison. I'll try again.
I ran three more make testall
with the same version of v 0.6, there were 2 fails and the last succeeded.
FWIW here is the list of tests that ran on the same worker as misc
on the successful run:
linalg/triangular sparse/higherorderfns sparse/spqr strings/io unicode/utf8 dates/query dates/periods inference worlds numbers abstractarray operators ccall bigfloat sorting version resolve pollfd mpfr broadcast combinatorics rounding euler lineedit replcompletions repl goto llvmcall grisu nullable base64 misc inline boundscheck cartesian asmvariant osutils channels dsp
I'm open to suggestions how this info might be useful in comparison with previous observed fails.
the dispatch loop you posted in https://github.com/JuliaLang/julia/issues/19707#issuecomment-269121746 is the best lead so far. Maybe we should try to find what inputs are causing that loop?
I'm also seeing this issue on a bot running testall1
, but cannot reproduce with the sets of tests reported by @colbec in here. I'm running a bisect script now, lets hope it yields something useful over the weekend.
I too am having troubles repeating some of my own tests, including the short test that generated the loop mentioned by @tkelman . One that does fail pretty consistently including the loop is:
JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/matmul linalg/bunchkaufman linalg/bidiag linalg/symmetric linalg/lq sparse/sparsevector dates/ranges dates/arithmetic inference worlds keywordargs printf char triplequote dict hashing iobuffer staged offsetarray tuple random intfuncs simdloop blas copy math loading bigint spawn read mmap broadcast mod2pi show sets llvmcall grisu meta stacktraces profile docs markdown threads inline error cartesian asmvariant osutils channels dsp examples misc > miscfail.txt
WARNING: Method definition f265a(Any) in module TestMain_worlds at /home/colin/Downloads/julia6/test/worlds.jl:12 overwritten at /home/colin/Downloads/julia6/test/worlds.jl:17.
Warning: threaded loop executed in order
ERROR: LoadError:
Stacktrace:
[1] (::##33#41)() at /home/colin/Downloads/julia6/test/runtests.jl:195
[2] cd(::##33#41, ::String) at ./file.jl:69
[3] include_from_node1(::String) at ./loading.jl:532
[4] include(::String) at ./sysimg.jl:14
[5] process_options(::Base.JLOptions) at ./client.jl:308
[6] _start() at ./client.jl:374
while loading /home/colin/Downloads/julia6/test/runtests.jl, in expression starting on line 38
I have saved the entire output of the fail to a file (size 12 MB) and can upload this if it is of use. It is extremely repetitive. One new detail I had not seen before is
...
dsp (1) | 6.38 | 0.55 | 8.6 | 302.97 | 1006.70
examples (1) | 11.88 | 0.51 | 4.3 | 672.37 | 1006.70
misc: Error During Test
Test threw an exception of type StackOverflowError
Expression: $(Expr(:escape, :(summarysize(Core)))) $(Expr(:escape, :>)) $(Expr(:escape, :(summarysize(
StackOverflowError:
Stacktrace:
[1] setindex!(::ObjectIdDict, ::Any, ::Any) at ./associative.jl:310
[2] _summarysize(::Any, ::ObjectIdDict, ::Type{T}) at ./interactiveutil.jl:745
[3] summarysize(::Any, ::ObjectIdDict, ::Type{T}) at ./interactiveutil.jl:741
...
with the reference to associative.jl.
Oh yeah that set reproduces it. I'll use it as a starting point for my bisect, let's hope I can shave some more tests of.
I get this on Arch linux too, using the following Make.user:
USE_SYSTEM_LLVM:=1
USE_SYSTEM_LIBUNWIND:=1
USE_SYSTEM_PCRE:=1
USE_SYSTEM_LIBM:=1
USE_SYSTEM_OPENLIBM:=1
USE_SYSTEM_OPENSPECFUN:=1
USE_SYSTEM_DSFMT:=0
USE_SYSTEM_BLAS:=0
USE_SYSTEM_LAPACK:=0
USE_SYSTEM_FFTW:=1
USE_SYSTEM_GMP:=1
USE_SYSTEM_MPFR:=1
USE_SYSTEM_ARPACK:=1
USE_SYSTEM_SUITESPARSE:=1
USE_SYSTEM_LIBUV:=0
USE_SYSTEM_UTF8PROC:=0
USE_SYSTEM_LIBGIT2:=1
USE_SYSTEM_PATCHELF:=1
USE_INTEL_MKL:=0
USE_BLAS64:=0
USE_LLVM_SHLIB:=0
StackOverflowErrors are often annoying. If it's easier than trapping a difficult-to-catch error...I have noticed that many arise from constructs like this:
foo{T}(x::T, y::T) = # the real implementation of foo
foo(x, y) = foo(promote(x, y)...)
The problem arises because promote(x, y)
has a default of returning (x, y)
when they can't be promoted to a common type, leading that second line to call itself recursively. I've recently become a big fan of "defensive programming" like this:
foo(x, y) = _foo(promote(x, y)...) # note the underscore
_foo{T}(x::T, y::T) = foo(x, y)
_foo(x, y) = throw_promoteerror(x, y)
@noinline function throw_promoteerror(x, y)
throw(ArgumentError("$x::$(typeof(x)) and $y::$(typeof(y)) cannot be promoted to a common type"))
end
I'd argue we should do this systematically throughout Base. This may be one of those cases where it would be faster to find the bug by changing any possibly-problematic lines to the more defensive form, since the error message will tell you a lot about what's going on.
It seems to be nondeterministic even with 1 worker... The set below definitely does make it crash, but takes many attempts to reproduce (I'm testing on cbc6670dd612ca5893f1f64e5fc5ef88b5727587):
$ JULIA_CPU_CORES=1 ./julia test/runtests.jl linalg/matmul linalg/bidiag linalg/symmetric sparse/sparsevector inference worlds printf dict offsetarray random math spawn broadcast stacktraces profile markdown dsp misc
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
linalg/matmul (1) | 67.16 | 0.81 | 1.2 | 2154.20 | 293.20
linalg/bidiag (1) | 37.77 | 0.73 | 1.9 | 1197.66 | 362.35
linalg/symmetric (1) | 15.78 | 0.25 | 1.6 | 469.39 | 374.47
sparse/sparsevector (1) | 81.53 | 1.19 | 1.5 | 2017.42 | 486.43
inference (1) | 1.16 | 0.01 | 1.3 | 32.51 | 490.12
WARNING: Method definition f265a(Any) in module TestMain_worlds at /tmp/julia/test/worlds.jl:12 overwritten at /tmp/julia/test/worlds.jl:17.
worlds (1) | 0.91 | 0.02 | 1.7 | 19.14 | 495.35
printf (1) | 15.55 | 0.32 | 2.1 | 248.24 | 497.46
dict (1) | 9.33 | 0.23 | 2.5 | 343.33 | 512.73
offsetarray (1) | 39.76 | 0.96 | 2.4 | 1079.01 | 668.10
random (1) | 9.13 | 0.20 | 2.2 | 273.29 | 685.34
math (1) | 21.47 | 0.53 | 2.5 | 374.49 | 690.46
[stdio passthrough ok]
spawn (1) | 19.43 | 0.19 | 1.0 | 206.89 | 709.43
broadcast (1) | 21.40 | 1.18 | 5.5 | 683.05 | 717.90
stacktraces (1) | 1.42 | 0.04 | 2.9 | 31.47 | 721.82
profile (1) | 3.08 | 0.06 | 2.0 | 59.57 | 723.25
markdown (1) | 3.63 | 0.08 | 2.3 | 87.71 | 724.30
dsp (1) | 7.04 | 0.28 | 4.0 | 313.35 | 728.03
misc: Error During Test
Test threw an exception of type StackOverflowError
There also seem to be 2 slightly different errors:
misc: Error During Test
Test threw an exception of type StackOverflowError
Expression: $(Expr(:escape, :(summarysize(Core)))) $(Expr(:escape, :>)) $(Expr(:escape, :(summarysize(Core.Inference)))) $(Expr(:escape, :>)) $(Expr(:escape, :(Core.sizeof(Core))))
and
misc: Error During Test
Test threw an exception of type StackOverflowError
Expression: summarysize(Base) > 10000 * sizeof(Int)
both when executing the misc
test.
I guess I'll try and catch one with rr
to see if I can gather some more details.
I played with this too. You were much more successful than I in trimming it down. (I think I was able to get rid of printf
but I had many others, perhaps it's not a specific combination.) My one other observation is that it's coupled to the summarysize
method for TypeMapEntry
, and inserting
@show typeof(m)
flush(STDOUT)
sometimes results in
typeof(m) = ERROR...
so it seems to be some kind of object corruption.
Fools rush in: to try to force an early error I tried multiple testing misc in the following way (edit3 this is a red herring, removing the references to warnonce and testonce from misc leads to success, but the overall failure persists):
JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/matmul misc linalg/bunchkaufman misc linalg/bidiag linalg/symmetric linalg/lq misc
Edit: in fact to get this I just have to say JULIA_CPU_CORES=1 julia6 test/runtests.jl misc misc
Edit2: JULIA_CPU_CORES=1 julia6 test/runtests.jl dsp dsp
is successful, as is doubling linalg/matmul
I switch off line 5 in runtests.jl (which ensures that the tests array set is unique), but on second run of misc it is very unhappy and outputs:
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
linalg/matmul (1) | 59.26 | 0.68 | 1.1 | 2193.33 | 296.49
OK to here
misc (1) | 9.81 | 0.82 | 8.4 | 1219.46 | 406.40
linalg/bunchkaufman (1) | 20.26 | 0.22 | 1.1 | 699.76 | 429.01
WARNING: replacing module TestMain_misc
OK to here
misc: Test Failed
Expression: contains(sprint((io->begin
Base.warn_once(io,"test")
end)),"WARNING: test")
Stacktrace:
[1] include_from_node1(::String) at ./loading.jl:532
[2] include(::String) at ./sysimg.jl:14
[3] macro expansion; at /home/colin/Downloads/julia6/test/testdefs.jl:13 [inlined]
...
Test Summary: | Pass Fail Total
Overall | 6403639 12 6403651
linalg/matmul | 527 527
misc | 1279505 1279505
linalg/bunchkaufman | 2219 2219
misc | 1279502 3 1279505
linalg/bidiag | 1232 1232
misc | 1279502 3 1279505
linalg/symmetric | 1328 1328
misc | 1279502 3 1279505
linalg/lq | 820 820
misc | 1279502 3 1279505
FAILURE
In Version 0.6.0-dev.1997 I believe I see a consistent pattern of success/fail with the following set:
JULIA_CPU_CORES=1 julia6 test/runtests.jl linalg/matmul linalg/bunchkaufman linalg/bidiag linalg/symmetric linalg/lq sparse/sparsevector dates/ranges dates/arithmetic inference worlds keywordargs printf char triplequote dict hashing iobuffer staged offsetarray tuple random intfuncs simdloop blas copy math loading bigint spawn misc
This set fails, but if misc runs before spawn then it succeeds.
I will try digging into spawn
to see if I can isolate a section which will allow a succeed.
In my previous comment I noted an issue with the order of spawn and misc.
Here is my testing process for examining spawn: 1. place an end
statement at the end of the file and then place an if false
statement above it in a convenient location to wrap the end of the file in an if block to ensure it will not be executed, and 2. slide the if false
up and down to try to find a sensitive spot.
As we know this is an intermittent error which maybe explains my bizarre results. At about line 80 in the spawn tests there is a section which begins @test success(truecmd)
; I can place the sliding if false before this line and the tests mostly pass. If I place the if false before the STDIN redirection
a few lines later it mostly fails with the stackoverflow error. This is about as close as I can come so far.
I tried commenting out the @test_broken lines and it still failed. However I remain suspicious that there is maybe a problem in the first 80+ lines of this file after the cmd definitions. The individual commands all seem to pass in the REPL. Of course it could be something else entirely.
I wonder if it could be related to / fixed by https://github.com/JuliaLang/julia/pull/19590
@vtjnash I've just tried, and I got the same error.
Confirmed it's fixed on latest master.
I see new failures when building Fedora/RHEL RPM nightlies. These have bee, introduced since d47f24b638e8b0dfc1ec73d1406d7a66ae4b4a50 (from Dec. 22).
https://copr-be.cloud.fedoraproject.org/results/nalimilan/julia-nightlies/fedora-25-x86_64/00492486-julia/build.log.gz