JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

88-99.2% compilation time, why? #416

Closed Cvikli closed 1 year ago

Cvikli commented 2 years ago

Hey Julianners, I don't understand what could be the problem with our JLD2 load times. The share of compilation time keeps dropping with more and more runs (99.2%... 99.0%... 96%... 92%... 88%), but it is still significant. Also, I don't think we use any complex structures.

Can someone check this simple test code and guess what the problem is under the hood?

using JLD2

test = randn(1000,10,10)
@time @save "test.jld2" test
@time @load "test.jld2" test

I think JLD2 is the best package for serializing and deserializing objects in Julia, and this compilation time seems pretty big. Isn't it possible to add some type safety or something similar to avoid these 10-100x slowdowns?

Thank you for your great work!

JonasIsensee commented 2 years ago

Hi @Cvikli,

I've actually put in quite a bit of work trying to reduce compile times. As a matter of fact, I tested this just last week.

JLD2 is type-stable where useful and does not introduce any invalidations. There are invalidations introduced by dependencies that require (re-)compilation.

As tested in Julia v1.8, just about 25% of the whole "compile time" is used for inference. More importantly, some 75% of the time appears to be used for LLVM code gen, which I cannot optimize. (The latter also can't be cached during precompilation.)

Cvikli commented 2 years ago

25% is a serious improvement, actually!

Is this something that comes with the 4.23, then?

JonasIsensee commented 2 years ago

No, I did not say anything about recent speed improvements.

The "compile time" that you measure is made up of different parts (here mostly inference and LLVM code gen).

Cvikli commented 2 years ago

I see. For me this is pretty interesting... Maybe it has to generate the whole code again and again to make sure the recipient struct can handle the incoming dataset? So you cannot reuse the same function again, right?

JonasIsensee commented 2 years ago

Are you talking about consecutive runs within the same Julia session? In that case, please consider using load, save, or jldsave.

Usage of @save is discouraged and could be connected to this.
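
For illustration, a minimal sketch of the function-based calls named above, applied to the test array from the original post. The file name and variable names are simply carried over from that post, and no particular timings are implied. The repeated load is included only so the first and later calls in the same session can be compared.

```julia
using JLD2

test = randn(1000, 10, 10)

# Function-based API in place of the @save / @load macros.
# `jldsave` stores each keyword argument under its own name.
@time jldsave("test.jld2"; test)

# `load` with a dataset name returns just that dataset.
@time loaded = load("test.jld2", "test")

# Repeat the call in the same session to compare against the first call.
@time loaded = load("test.jld2", "test")
```

Calling load("test.jld2") without a dataset name returns a dictionary of everything stored in the file instead.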

Cvikli commented 2 years ago

Indeed! (I don't know why I didn't apply it when I read it many times in the readme! :D )

I am just curious why @save and @load generate code that is like dic = save("fname"); args[2:end] = tuple(dic[key] for key in keys(dic))

JonasIsensee commented 1 year ago

I am not sure I fully understand.

The code for @load has not been touched in 5 years. @load and @save are no longer documented / recommended because they have weird scoping behaviour that I can't fix. In the past, this led to a lot of confused questions.