Open abieler opened 8 years ago
I forgot to mention: I am on Linux and Julia v0.4.5.
Sounds like this is the issue here as well. With how long it's been around, it seems like they have marked it as "Do not fix" in JLD. Quite a shame, really. Sounds like you'll have to figure out how to use the HDF5 format instead as well.
I now convert my dates to Unix time and save them as h5, then load them and convert back to dates with Dates.unix2datetime().
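A minimal sketch of that round trip, assuming a current Julia where Dates is a standard library (on 0.4 it lived in Base.Dates). The sample dates are made up for illustration; only Dates.datetime2unix and Dates.unix2datetime are the real functions involved.

```julia
using Dates

# Hypothetical sample data standing in for the real date array.
dates = [DateTime(2016, 1, 1), DateTime(2016, 6, 15, 12, 30, 0)]

# DateTime -> seconds since the Unix epoch (Float64), which HDF5 can store directly.
stamps = map(Dates.datetime2unix, dates)

# Float64 seconds -> DateTime again after loading.
restored = map(Dates.unix2datetime, stamps)

@assert restored == dates
```

The stored values are plain floats, so the file no longer contains any DateTime objects that JLD would have to reconstruct.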
I attached timings for two versions of loading the data: the first with h5read(), the second by opening the file for reading with fid = h5open() and then loading the data with read(fid, ...).
Not surprisingly, the last version is the fastest. For the first 1k loops the timing differences seem almost constant, but after ~10k iterations the JLD version is about 2 orders of magnitude slower. If I get to it I'll do some profiling.
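For reference, the two HDF5 loading variants compared above might look roughly like this. It is only a sketch: the file path and the dataset name "unixtime" are hypothetical, and the file is written first so the snippet is self-contained.

```julia
using HDF5

# Hypothetical file and dataset name, created here just for the example.
path = joinpath(mktempdir(), "dates.h5")
h5write(path, "unixtime", [1.4e9, 1.4e9 + 60.0])

# Variant 1: h5read opens the file, reads one dataset, and closes it again.
a = h5read(path, "unixtime")

# Variant 2: open the file once, read from the handle, then close it explicitly.
fid = h5open(path, "r")
b = read(fid, "unixtime")
close(fid)

@assert a == b
```

Variant 2 pays the open/close cost once per file rather than once per dataset, which matters most when several datasets are read from the same file.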
So most time is spent in h5f_get_obj_ids() in HDF5/src/plain.jl at lines 2182 and 2186, which are ccalls to (:H5Fget_obj_count, libhdf5) and (:H5Fget_obj_ids, libhdf5) respectively.
So I'm not sure anything can be done about this.
cheers andre
Bless you, @abieler, for digging into this! So it's definitely the C library, not any of the julia code.
Try the trick in the last post of that issue, https://github.com/JuliaLang/HDF5.jl/issues/170#issuecomment-209399736?
Not sure it is the same problem. This one is about loading content from a small file many times; the other is about creating a file with lots of entries. I'll try anyway of course ;)
Oh, I see (I didn't read carefully enough). You might consider using the "dictionary interface," https://github.com/JuliaLang/JLD.jl/blob/master/doc/jld.md#usage, so it doesn't waste time opening/closing the file frequently.
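A rough sketch of what the dictionary interface looks like, following the usage section of the JLD documentation linked above. The file path and the variable names "x" and "y" are hypothetical, and the file is written first so the snippet is self-contained.

```julia
using JLD

# Hypothetical file with two variables, created here just for the example.
path = joinpath(mktempdir(), "example.jld")
save(path, "x", [1.0, 2.0, 3.0], "y", [4.0, 5.0])

# Open the file once and read several variables from the same handle;
# the do-block closes the file when it finishes.
x, y = jldopen(path, "r") do file
    (read(file, "x"), read(file, "y"))
end

@assert x == [1.0, 2.0, 3.0]
@assert y == [4.0, 5.0]
```

This only saves work when several reads hit the same file; it does not change anything when each load comes from a different file.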
Also appears similar to https://github.com/JuliaLang/julia/issues/17554
jldTimings.zip
When loading an array of DateTimes with the load() function, the loading times increase over time. The same does not happen when loading an array of floats.
Attached are a julia script and two data files to reproduce this behavior. Run the script with
julia timeJLD.jl N
where N is the number of iterations.
myDates.jld has the array with datetimes
date
myArray.jld has the array with floatsyy
I ran with N = 5k to 10k.
In real life I load the content from different files of course... Cheers Andre