Open simonbyrne opened 5 years ago
Thank you for your insightful comments!
Indeed, Statistics.mean (and friends) are not overloaded, so they fetch every element individually from the netCDF file, which is slow. I added a section in the documentation about performance as suggested.
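To make the cost concrete, here is a minimal sketch in plain Julia. CountingVar is a hypothetical stand-in for a netCDF variable, not the type used by the package: it only supports scalar indexing and counts how many elements are fetched, so the generic fallback used by mean becomes visible.

```julia
using Statistics

# Hypothetical stand-in for a netCDF variable (not the package's own type):
# it only supports scalar indexing and counts every element fetch.
struct CountingVar{T,N} <: AbstractArray{T,N}
    data::Array{T,N}              # plays the role of the file contents
    reads::Base.RefValue{Int}     # number of element-wise fetches so far
end
CountingVar(data::Array) = CountingVar(data, Ref(0))

Base.size(v::CountingVar) = size(v.data)
function Base.getindex(v::CountingVar{T,N}, I::Vararg{Int,N}) where {T,N}
    v.reads[] += 1                # in the real case: one trip to the file
    return v.data[I...]
end

v = CountingVar(rand(100, 100))
m_lazy = mean(v)                  # generic fallback: visits the elements one by one
reads_for_mean = v.reads[]        # at least length(v) scalar fetches

A = copy(v.data)                  # stand-in for a single bulk read into RAM
m_ram = mean(A)                   # same value, no further "file" access
```

With a real file, each of those scalar fetches would be a separate library call, which is where the time goes.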
It never occurred to me to load an array into RAM with the Array constructor, but it makes sense, and I added a method for this, which gives a speed-up as expected. If myvar is a 100x100 Float64 matrix, before we had:
julia> @btime Array(ds["myvar"])
115.359 ms (170042 allocations: 8.93 MiB)
Now after my recent commit:
julia> @btime Array(ds["myvar"]);
69.404 μs (49 allocations: 170.43 KiB)
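The kind of change involved can be sketched as follows. This is an illustration under assumptions, not the actual commit: BulkVar and its two counters are hypothetical, and the bulk read is simulated with copy. The point is that a dedicated Array method bypasses the element-by-element fallback.

```julia
# Hypothetical variable type with separate counters for scalar and bulk reads.
struct BulkVar{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    scalar_reads::Base.RefValue{Int}
    bulk_reads::Base.RefValue{Int}
end
BulkVar(data::Array) = BulkVar(data, Ref(0), Ref(0))

Base.size(v::BulkVar) = size(v.data)
function Base.getindex(v::BulkVar{T,N}, I::Vararg{Int,N}) where {T,N}
    v.scalar_reads[] += 1         # slow path: one fetch per element
    return v.data[I...]
end

# Specialised constructor method: fetch everything in one request
# instead of falling back to element-wise copying.
function Base.Array(v::BulkVar)
    v.bulk_reads[] += 1
    return copy(v.data)           # stand-in for one whole-variable read
end

v = BulkVar(rand(100, 100))
A = Array(v)                      # dispatches to the method above
```

After this call the scalar counter is still zero and the bulk counter is one, which mirrors the drop in allocations seen in the timings above.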
It is true that ncvar[:] is not consistent with the behaviour of regular Julia arrays, but it was a price that I was willing to pay for a concise syntax. In fact, the first NetCDF library that I used was the netcdf MATLAB toolbox (now abandoned), which implemented exactly the same (slightly inconsistent) behaviour in MATLAB.
What alternatives could be implemented?
ds["myvar"][:,:] # does not work so well if the dimensionality of myvar is not known a priori
Array(ds["myvar"]) # works now, but it feels a bit wordy to me
ds["myvar"][] # like references, but it looks a bit obscure to most people I guess
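For the first alternative, the unknown dimensionality can be handled by building one colon per dimension from ndims. The sketch below uses an ordinary Julia array for illustration; load_all is a hypothetical helper name, and whether the variable type accepts a tuple of colons in the same way is an assumption.

```julia
A = reshape(collect(1.0:24.0), 2, 3, 4)   # ordinary 3-d Julia array

flat = A[:]                               # regular arrays: a flat Vector

# One colon per dimension, so the rank need not be known in advance.
load_all(v::AbstractArray) = v[ntuple(_ -> Colon(), ndims(v))...]

full = load_all(A)                        # a copy that keeps the shape (2, 3, 4)
```

For a zero-dimensional array the tuple is empty, so the same helper reduces to v[], the third form listed above.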
Great, thanks!
I don't think there's anything wrong with Array(ds["myvar"]). If you want something that would work for scalars as well, it would probably work to overload copy?
I had a look at HDF5.jl (https://github.com/JuliaIO/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data) and they use read, for example:
variable = read(ds["myvar"])
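A read-style method that also covers scalars could look roughly like this. It is only a sketch: FileVar is a hypothetical wrapper whose data field stands in for the file contents, and neither method is taken from an existing package.

```julia
# Hypothetical wrapper; `data` stands in for what is stored in the file.
struct FileVar{T,N}
    data::Array{T,N}
end

# Array-valued variables: return an in-memory copy of the whole variable.
Base.read(v::FileVar) = copy(v.data)

# Zero-dimensional (scalar) variables: return the value itself.
Base.read(v::FileVar{T,0}) where {T} = v.data[]

m = read(FileVar(ones(3, 4)))     # a 3x4 Matrix{Float64}
s = read(FileVar(fill(3.0)))      # a plain Float64
```

Having a single verb for both cases avoids the question of what an indexing expression should return for a scalar variable.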
Calling functions which iterate over array elements (e.g. Statistics.mean) on netCDF datasets can be very slow. It would be useful to have some performance tips, e.g. to first convert a dataset to an Array.
On that note, I noticed that simply calling Array(dataset) is also slow. I take it from the examples in the manual that the suggested way to convert is to call dataset[:]. However, this differs from ordinary Julia multidimensional arrays, where [:] returns a vector: