JuliaIO / CodecZlib.jl

zlib codecs for TranscodingStreams.jl.

Segmentation fault when using transcode #36

Closed janfrancu closed 5 years ago

janfrancu commented 5 years ago

I have been running some data processing in which I compress the output array of bytes (FlatBuffers-built bytes) using the transcode API; however, sometimes some of the workers die due to this crash inside zlib.

From worker 2:
From worker 2:  signal (11): Segmentation fault
From worker 2:  in expression starting at no file:0
From worker 2:  unknown function (ip: 0x7f85e663d04d)
From worker 2:  read_buf at /root/.julia/packages/CodecZlib/DAjXH/deps/usr/lib/libz.so (unknown line)
From worker 2:  fill_window at /root/.julia/packages/CodecZlib/DAjXH/deps/usr/lib/libz.so (unknown line)
From worker 2:  deflate_slow at /root/.julia/packages/CodecZlib/DAjXH/deps/usr/lib/libz.so (unknown line)
From worker 2:  deflate at /root/.julia/packages/CodecZlib/DAjXH/deps/usr/lib/libz.so (unknown line)
From worker 2:  deflate! at /root/.julia/packages/CodecZlib/DAjXH/src/libz.jl:77 [inlined]
From worker 2:  process at /root/.julia/packages/CodecZlib/DAjXH/src/compression.jl:175 [inlined]
From worker 2:  callprocess at /root/.julia/packages/TranscodingStreams/SaPZ8/src/stream.jl:603
From worker 2:  flushbuffer at /root/.julia/packages/TranscodingStreams/SaPZ8/src/stream.jl:560
From worker 2:  flushbufferall at /root/.julia/packages/TranscodingStreams/SaPZ8/src/stream.jl:567 [inlined]
From worker 2:  write at /root/.julia/packages/TranscodingStreams/SaPZ8/src/stream.jl:450 [inlined]
From worker 2:  transcode at /root/.julia/packages/TranscodingStreams/SaPZ8/src/transcode.jl:85
From worker 2:  transcode at /root/.julia/packages/TranscodingStreams/SaPZ8/src/transcode.jl:34
From worker 2:  loadPartAndStore at ./REPL[24]:6
From worker 2:  unknown function (ip: 0x7f85d42848d7)
From worker 2:  jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
From worker 2:  #28 at ./REPL[27]:7
From worker 2:  jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
From worker 2:  jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1538 [inlined]
From worker 2:  jl_f__apply at /buildworker/worker/package_linux64/build/src/builtins.c:563
From worker 2:  #112 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Distributed/src/process_messages.jl:269
From worker 2:  run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Distributed/src/process_messages.jl:56
From worker 2:  macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Distributed/src/process_messages.jl:269 [inlined]
From worker 2:  #111 at ./task.jl:262
From worker 2:  jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
From worker 2:  jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1538 [inlined]
From worker 2:  start_task at /buildworker/worker/package_linux64/build/src/task.c:268
From worker 2:  unknown function (ip: 0xffffffffffffffff)
From worker 2:  Allocations: 2455881495 (Pool: 2455857282; Big: 24213); GC: 5298
Worker 2 terminated.

I could not find a reproducible example; it just happens every so often. The code I have been using is similar to this:

using CodecZlib
using AWSS3

files = # files to process

@everywhere function loadPartAndStore(file)
    part = # download file
    outputBytes = #process part
    AWSS3.s3_put(..., CodecZlib.transcode(CodecZlib.GzipCompressor, outputBytes))
end

pmap(loadPartAndStore, files)

I am aware that this is more of a zlib issue, but it is possible that the way I use it is somewhat non-standard. For example, I know that the code above allocates the working space of the GzipCompressor on every iteration; that is mainly because I am not sure how to handle the preallocation (described here) in a distributed environment.
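For what it's worth, the preallocation pattern from the TranscodingStreams docs (`initialize` a codec instance once, then pass it to `transcode` repeatedly) can be combined with `@everywhere` so that each worker allocates its working space only once. This is just a sketch; `compress_part` is a hypothetical helper name, and `finalize` / error handling are omitted:

```julia
using Distributed
@everywhere using CodecZlib, TranscodingStreams

@everywhere begin
    # One compressor per worker process, allocated once at load time
    # and reused for every call, so zlib's working space is not
    # re-allocated on each iteration.
    const COMPRESSOR = GzipCompressor()
    TranscodingStreams.initialize(COMPRESSOR)

    # Hypothetical helper: compress one chunk with the worker-local codec.
    compress_part(bytes::Vector{UInt8}) = transcode(COMPRESSOR, bytes)
end
```

Each `pmap` task on a worker would then call `compress_part` instead of `transcode(GzipCompressor, ...)`, reusing that worker's `COMPRESSOR`.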

Julia v0.7.0
CodecZlib v0.5.1
TranscodingStreams v0.8.1
janfrancu commented 5 years ago

I have done further investigation and found an MWE that reproduces the error; however, the issue seems to be in FlatBuffers.bytes(FlatBuffers.build!(...)), which produces 'corrupted' bytes. More precisely, the length of the array is around 76M, but part of it resides in a read-only segment, as indicated by the ERROR: ReadOnlyMemoryError() raised just by showing the array.
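One way to catch this earlier: touching every byte in Julia code before handing the array to zlib turns the bad memory into a catchable ReadOnlyMemoryError (or at least a crash with a Julia frame on top) instead of a segfault deep inside libz. A hedged sketch; `validate_bytes` is a hypothetical defensive helper, not part of any package:

```julia
# Force a read of every element of the array. If part of the buffer
# lives in an unreadable/read-only segment, the fault happens here,
# in Julia code, rather than inside zlib's read_buf.
function validate_bytes(bytes::Vector{UInt8})
    s = UInt64(0)
    for b in bytes
        s += b
    end
    return s  # throwaway checksum; only the full traversal matters
end
```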

I will close this issue and report it to the FlatBuffers.jl repository.