IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License

Intermittent segfault in opt-flow #16

Closed: lkuper closed this issue 8 years ago

lkuper commented 8 years ago

I'm intermittently seeing this error:

julia> include("examples/opt-flow/opt-flow.jl")
nframes = 2
filenames = UTF8String["small_001.dat","small_002.dat"]
checksums = Float32[80751.555f0,80818.22f0]
Image size: 584x388
SELFPRIMED 26.040181867

signal (11): Segmentation fault
writeFlo at /home/lkuper/.julia/v0.4/ParallelAccelerator/examples/opt-flow/image.jl:132
main at /home/lkuper/.julia/v0.4/ParallelAccelerator/examples/opt-flow/opt-flow.jl:223
jlcall_main_21594 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: 0x7fd479744293)
unknown function (ip: 0x7fd4797436d1)
unknown function (ip: 0x7fd479758758)
unknown function (ip: 0x7fd479759449)
jl_load at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
include at ./boot.jl:261
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
include_from_node1 at ./loading.jl:304
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: 0x7fd479744293)
unknown function (ip: 0x7fd4797436d1)
unknown function (ip: 0x7fd479758758)
jl_toplevel_eval_in at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
eval_user_input at REPL.jl:62
jlcall_eval_user_input_21252 at  (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
anonymous at REPL.jl:92
unknown function (ip: 0x7fd47974ae18)
unknown function (ip: (nil))
Segmentation fault (core dumped)

Line 132 of opt-flow/image.jl is a call to Base.write(). This is with the Julia 0.4.0 release on Ubuntu. I see memory usage increase while the example is running, but I don't run out of memory before it crashes.

lkuper commented 8 years ago

It also seems to be happening on Travis: https://travis-ci.org/IntelLabs/ParallelAccelerator.jl/jobs/89117763#L221

This couldn't have anything to do with updating to version 0.5.0 of Images.jl, could it? opt-flow/image.jl is just a small library of image manipulation functions with no dependencies.

timholy commented 8 years ago

I don't see what that code has to do with Images.jl: you're calling a Base I/O method on types defined in Base (two Matrix{Float32} and a string). So Images.jl seems unlikely to be the culprit. But do let me know if you discover evidence to the contrary.
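
For readers who have not looked at the example, the kind of call being described can be sketched roughly as follows. This is only an illustration under stated assumptions: `write_flow_sketch` is a hypothetical name, and the file layout (two Int32 dimensions followed by the raw matrix data) is invented for the sketch, not taken from the actual `writeFlo` in opt-flow/image.jl. The point it shows is that such a function depends only on Base (`open`, `write`, `Matrix{Float32}`, a string path) and not on Images.jl.

```julia
# Hypothetical helper, NOT the actual writeFlo from opt-flow/image.jl.
# It writes two Float32 matrices to a file as raw binary using only Base I/O.
function write_flow_sketch(path::AbstractString,
                           u::Matrix{Float32}, v::Matrix{Float32})
    size(u) == size(v) || error("u and v must have the same dimensions")
    open(path, "w") do io
        # Assumed layout: two Int32 dimensions, then the raw matrix data.
        write(io, Int32(size(u, 1)))
        write(io, Int32(size(u, 2)))
        write(io, u)   # Base.write on a Matrix{Float32}: raw column-major bytes
        write(io, v)
    end
    return path
end

u = rand(Float32, 4, 3)
v = rand(Float32, 4, 3)
path = write_flow_sketch(tempname(), u, v)
```

If a function of this shape segfaults, the arrays passed in (for example, memory handed back from generated native code) are a more likely suspect than the I/O call itself, which is consistent with where the discussion below ends up.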

The "big change" in Images 0.5 is just a code reorganization, moving I/O into separate repositories. Images.jl itself no longer makes any ccalls; it's now a pure-Julia package. So any segfaults are either a bug in Julia itself, a bug in a package used by Images, or a bug here.

lkuper commented 8 years ago

Nothing new to report, just noting that this is still happening and it worries me. I'll try to investigate later. https://travis-ci.org/IntelLabs/ParallelAccelerator.jl/jobs/89505881#L287

ninegua commented 8 years ago

We've confirmed that the cause is not the change in the Images library, but the recent commit by @DrTodd13 that changed the j2c-array C++ class. We'll have a solution soon.

lkuper commented 8 years ago

@ninegua Thanks for investigating!

timholy commented 8 years ago

Nice job finding the source.

lkuper commented 8 years ago

It looks as though the segfaults have been fixed now (although we're still getting failures on Travis due to an unrelated issue). I think we can close this. Thanks, @ninegua and @DrTodd13.