JuliaGeo / NetCDF.jl

NetCDF support for the julia programming language
http://juliageo.org/NetCDF.jl/
MIT License

Reading data along chunked dimension does not scale linearly with amount of data #116

Open ali-ramadhan opened 4 years ago

ali-ramadhan commented 4 years ago

Super cool work on integrating DiskArrays.jl with NetCDF.jl! Looking forward to ditching xarray in favor of a pure Julia solution.

@visr helped me get up and running, but we noticed that grabbing 2x as much data seems to take ~4x longer (about 5.3 s for 101 time steps vs. 19.1 s for 201 time steps in the session below), whereas I expected it to scale linearly. Unfortunately, I am interested in grabbing data along the dimension with chunk size 1...

julia> using NetCDF

julia> ds = NetCDF.open("/home/alir/cnhlab004/bsose_i122/bsose_i122_2013to2017_1day_Theta.nc", "THETA")
Disk Array with size 2160 x 588 x 52 x 1826

julia> NetCDF.getchunksize(ds)
(2160, 588, 19, 1)

julia> @time ds[100, 200, :, 300]
  0.012066 seconds (48 allocations: 2.500 KiB)

julia> @time ds[100, 200, :, 320:330]
  0.010111 seconds (55 allocations: 4.750 KiB)

julia> @time ds[100, 200, :, 300:400]
  5.256234 seconds (56 allocations: 23.016 KiB)

julia> @time ds[100, 200, :, 600:800]
 19.074392 seconds (56 allocations: 43.328 KiB)
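
For reference, the number of chunks each of the requests above overlaps can be worked out from the chunk size alone. The sketch below is only arithmetic on the indices; `chunks_along` and `chunks_touched` are hypothetical helpers, not part of NetCDF.jl or DiskArrays.jl, and it assumes the `:` in the third dimension covers all 52 levels.

```julia
# Number of chunks a 1-based index (or index range) overlaps along one dimension
# with chunk length c. Hypothetical helpers, not a NetCDF.jl or DiskArrays.jl API.
chunks_along(r::AbstractUnitRange, c::Integer) = cld(last(r), c) - cld(first(r), c) + 1
chunks_along(i::Integer, c::Integer) = 1

# Total chunks overlapped by a request: product over all dimensions.
chunks_touched(inds, chunksize) =
    prod(chunks_along(i, c) for (i, c) in zip(inds, chunksize))

chunksize = (2160, 588, 19, 1)    # from NetCDF.getchunksize(ds) above

requests = [
    (100, 200, 1:52, 300),        # ds[100, 200, :, 300]
    (100, 200, 1:52, 320:330),    # ds[100, 200, :, 320:330]
    (100, 200, 1:52, 300:400),    # ds[100, 200, :, 300:400]
    (100, 200, 1:52, 600:800),    # ds[100, 200, :, 600:800]
]

for inds in requests
    println(inds, " => ", chunks_touched(inds, chunksize), " chunks")
end

println("elements per chunk: ", prod(chunksize))
```

By this count the last two requests overlap 303 and 603 chunks, so the chunk count itself grows linearly with the time range; whatever makes the time grow faster is not visible from the chunk layout alone. Each chunk holds 2160 * 588 * 19 elements, so if whole chunks have to be read to extract one column, far more data is touched than is returned.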

visr commented 4 years ago

It's great to have an example of such a large NetCDF file. At the moment I cannot tell whether this time is spent in the NetCDF C library or in the Julia wrapper code, though I think running the slower calls under a profiler should be able to give that information.
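
A minimal sketch of such a profiling run, using Julia's `Profile` standard library. `slow_read` is a hypothetical stand-in so the snippet runs without the file; in practice it would wrap one of the slow calls, e.g. `ds[100, 200, :, 600:800]`.

```julia
using Profile

# Hypothetical stand-in for the slow read; replace the body with
# something like ds[100, 200, :, 600:800] when the dataset is available.
slow_read() = sum(sqrt(i) for i in 1:10^7)

slow_read()        # run once first so that compilation is not profiled
Profile.clear()    # drop any samples collected earlier
@profile slow_read()

# C = true also reports frames from C code, which should help separate
# time spent inside a C library from time spent in Julia wrapper code.
Profile.print(format = :flat, sortedby = :count, C = true)
```

Note that running the call once beforehand means the profiled call may benefit from any caching, so the absolute numbers can differ from a cold read.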

meggart commented 4 years ago

I agree with @visr that it is hard to say where the time is spent. Please note also that the NetCDF C library does some internal caching, so I guess your 3rd call was profiting from the previous reads. I found it very difficult to debug these kinds of problems. Ideally you would restart your Julia session after every data access to make sure NetCDF did not cache anything, but then you include compilation time in your timings...
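
One possible way around this trade-off, sketched under assumptions: run each timed access in a fresh Julia process, and trigger compilation there with a small read from a different region of the file before timing the real one. `time_in_fresh_process` is a hypothetical helper, and the example uses an in-memory stand-in instead of the dataset so that it runs anywhere.

```julia
# Run `setup`, then `warmup` (to trigger compilation), then time `expr`,
# all in a brand-new Julia process. Hypothetical helper, not a NetCDF.jl API.
function time_in_fresh_process(setup::String, warmup::String, expr::String)
    script = """
    $setup
    $warmup
    t = @elapsed begin
        $expr
    end
    print(t)
    """
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $script`
    return parse(Float64, read(cmd, String))   # elapsed seconds
end

# Stand-in example. For the dataset in this issue the strings might instead be
#   setup  = "using NetCDF; ds = NetCDF.open(path, \"THETA\")"  (path is a placeholder)
#   warmup = "ds[100, 200, :, 1:2]"       # same index types, different chunks
#   expr   = "ds[100, 200, :, 600:800]"
t = time_in_fresh_process("x = rand(1000)", "sum(x[1:10])", "sum(x[1:1000])")
println("elapsed: ", t, " s")
```

This only avoids caching inside the Julia process; the operating system's file cache can still keep recently read parts of the file in memory between runs.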

bjarthur commented 7 months ago

I cannot reproduce this with my dataset, which is of similar size but has only three dimensions. @ali-ramadhan, is this still a problem for you?