Open tpapp opened 6 years ago
So is it very slow, or non-terminating? An MWE would definitely be useful.
I didn't particularly write the BSON code to be fast, so there's probably a lot of low-hanging fruit for performance. Loading large isbits arrays should certainly have similar performance with any serialiser.
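For anyone wanting a starting point for an MWE, here is a minimal sketch that writes a large isbits array with BSON and times loading it back. The helper name `roundtrip_seconds`, the `:data` key and the array size are made up for illustration; only `BSON.bson` and `BSON.load` come from the package.

```julia
using BSON

# Hypothetical helper (not part of BSON.jl): write an n-element Float64
# vector to a temporary .bson file and time how long BSON.load takes.
function roundtrip_seconds(n::Integer; path::AbstractString = tempname() * ".bson")
    data = rand(Float64, n)
    BSON.bson(path, Dict(:data => data))

    t0 = time_ns()
    loaded = BSON.load(path)
    elapsed = (time_ns() - t0) / 1e9

    rm(path; force = true)
    # Sanity check: the array should survive the round trip unchanged.
    @assert loaded[:data] == data
    return elapsed
end

roundtrip_seconds(10)  # warm-up call so compilation is not included in the timing
println("BSON.load time in seconds: ", roundtrip_seconds(1_000_000))
```

Running the same helper against another serialiser on the same array would give a like-for-like comparison; a data set with many nested structs may behave very differently from a plain array.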
I have pretty much the same issue: I can't even load my data set (246M in BSON, 319M in JLD2) because loading crashes with no error message; I assume it hits some limit. I am surprised that the BSON file is smaller, considering that a type description is stored with each instance of a struct (and my data set includes a lot of nested structs), but then I have no idea what JLD2 is doing.
Anyway, I opened PR #23 with a benchmark and a 3-4x speed improvement so far, probably entirely from improving array processing.
The data set now loads in about 30 seconds. Done independently, parsing takes about 10 seconds and raising to my types about 15 seconds.
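A rough way to get this kind of split without calling any internal functions is to time `BSON.parse` (which only reads the raw document) and `BSON.load` (parse plus raise) on the same file, and take the difference as an estimate of the raising cost. This is only a sketch: the helper name `load_phases` and the demo file are hypothetical, and the difference is an estimate, not a direct measurement.

```julia
using BSON

# Hypothetical helper (not part of BSON.jl): estimate how the load time
# splits between parsing the raw document and raising it to Julia types.
function load_phases(path::AbstractString)
    t0 = time_ns()
    BSON.parse(path)   # raw document only, nothing is raised
    t1 = time_ns()
    BSON.load(path)    # parse followed by raise
    t2 = time_ns()

    parse_s = (t1 - t0) / 1e9
    total_s = (t2 - t1) / 1e9
    return (parse = parse_s, total = total_s, raise_estimate = total_s - parse_s)
end

# Demo on a small throwaway file; replace `demo_path` with a real data set.
demo_path = tempname() * ".bson"
BSON.bson(demo_path, Dict(:x => rand(1_000)))
load_phases(demo_path)            # warm-up so compilation is not timed
println(load_phases(demo_path))
rm(demo_path; force = true)
```

On small files the estimate is dominated by noise and can even come out negative; it only becomes meaningful on data sets that take seconds to load.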
It turns out the crash was happening during raising and is possibly a bug in Julia:
julia> BSON.raise_recursive(res)
signal (11): Segmentation fault
in expression starting at no file:0
has_free_typevars at /buildworker/worker/package_linux64/build/src/jltypes.c:151
has_free_typevars at /buildworker/worker/package_linux64/build/src/jltypes.c:155 [inlined]
jl_has_free_typevars at /buildworker/worker/package_linux64/build/src/jltypes.c:180
inst_type_w_ at /buildworker/worker/package_linux64/build/src/jltypes.c:1504
jl_instantiate_unionall at /buildworker/worker/package_linux64/build/src/jltypes.c:940
arg_type_tuple at /buildworker/worker/package_linux64/build/src/gf.c:1648
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2151
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1537 [inlined]
jl_f__apply at /buildworker/worker/package_linux64/build/src/builtins.c:556
newstruct at /home/rich/.julia/dev/BSON/src/extensions.jl:103
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1831
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1537 [inlined]
jl_f__apply at /buildworker/worker/package_linux64/build/src/builtins.c:556
#43 at /home/rich/.julia/dev/BSON/src/extensions.jl:115
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1831
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
_raise_recursive at /home/rich/.julia/dev/BSON/src/read.jl:79
#45 at /home/rich/.julia/dev/BSON/src/extensions.jl:124
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
raise_recursive at /home/rich/.julia/dev/BSON/src/read.jl:88
...
The crash happens when newstruct tries to call newstruct! on extensions.jl:102 (the line number differs in the stack trace because I put initstruct on a separate line). Copying the contents of newstruct! inline avoids the crash.
I tried BSON.jl with a large dataset. Cf. JLD2, whereas
BSON.load("/tmp/test.bson")
never terminates (I interrupted after 5 minutes). The file itself is around 600M (slightly smaller than the JLD2 file which is around 700M). I can make an MWE if the issue is not known.