Significant performance gap between NetCDF.jl and NCDatasets.jl

alex-s-gardner commented 1 year ago

Thanks a ton for the great package. It's really intuitive and powerful.

While trying the package in production-scale workflows, I found a significant performance gap between NetCDF.jl and NCDatasets.jl.

using Downloads, NetCDF, NCDatasets
path2file = "https://its-live-data.s3.amazonaws.com/Test/N52W175.nc";
Downloads.download(path2file, "N52W175.nc");

# using NetCDF
@time nc = copy(ncread("N52W175.nc", "z"));
14.245910 seconds (265.26 k allocations: 1.499 GiB, 0.47% gc time, 4.77% compilation time)

# using NCDatasets
@time nc2 = copy(Dataset("N52W175.nc", "r")["z"]);
10s of minutes... I had to kill the job

I know I've pestered you about this elsewhere, but is there any value in having NetCDF.jl as the backend to NCDatasets.jl? NCDatasets.jl has such a great design, and NetCDF.jl is really performant. This would also help reduce duplicate code maintenance. Regardless, thanks for all of the contributions!

Alexander-Barth commented 1 year ago

What about using Array or indexing instead:

julia> using Downloads, NetCDF, NCDatasets

julia> @time nc = copy(ncread("N52W175.nc", "z"));
 11.110762 seconds (259.33 k allocations: 1.498 GiB, 0.38% gc time, 6.92% compilation time)

julia> @time nc2 = Array(Dataset("N52W175.nc", "r")["z"]);
 10.935116 seconds (140.02 k allocations: 1.676 GiB, 0.60% gc time, 2.10% compilation time)

julia> @time nc2 = Dataset("N52W175.nc", "r")["z"][:,:,:];
 10.392991 seconds (399 allocations: 1.666 GiB, 0.57% gc time)

(@v1.9) pkg> status NCDatasets
Status `~/.julia/environments/v1.9/Project.toml`
  [85f8d34a] NCDatasets v0.12.17 `~/.julia/dev/NCDatasets`

(@v1.9) pkg> status NetCDF
Status `~/.julia/environments/v1.9/Project.toml`
  [30363a11] NetCDF v0.11.7

In NCDatasets, we do not overload Base.copy, so copy loads every element individually, which is slow for data on disk (see https://alexander-barth.github.io/NCDatasets.jl/stable/performance/).
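
To illustrate the difference between element-wise loading and a single block read, here is a minimal sketch in plain Julia. SlowVar is a hypothetical stand-in for a disk-backed variable, not an NCDatasets type, and the reads counter only simulates disk accesses. It assumes that copy on a custom AbstractArray without its own copy method ends up in scalar getindex, which is the behaviour described above.

```julia
# Hypothetical stand-in for a disk-backed variable. This is NOT an NCDatasets
# type; it only counts how often the underlying storage is accessed.
struct SlowVar <: AbstractArray{Float64,3}
    data::Array{Float64,3}
    reads::Base.RefValue{Int}   # number of simulated disk reads
end
SlowVar(data::Array{Float64,3}) = SlowVar(data, Ref(0))

Base.size(v::SlowVar) = size(v.data)

# Scalar access: every element costs one simulated read.
function Base.getindex(v::SlowVar, i::Int, j::Int, k::Int)
    v.reads[] += 1
    return v.data[i, j, k]
end

# Whole-array access: the full block is fetched with one simulated read.
function Base.getindex(v::SlowVar, ::Colon, ::Colon, ::Colon)
    v.reads[] += 1
    return copy(v.data)
end

v = SlowVar(rand(2, 3, 4))

a = copy(v)              # no specialised copy method, so elements are read one by one
reads_copy = v.reads[]

v.reads[] = 0
b = v[:, :, :]           # explicit indexing hits the block method
reads_block = v.reads[]

println("copy:     ", reads_copy, " reads")
println("v[:,:,:]: ", reads_block, " reads")
```

In this toy model the counter should equal length(v) after copy and 1 after v[:, :, :]. With a real file each read is far more expensive than an in-memory access, which is consistent with the timings reported above.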

I don't think you need copy with NetCDF.jl either, since ncread already returns an Array.

For reference, CUDA.copy creates a copy of an array on the GPU (and does not transfer it to CPU memory). If we were ever to overload copy in NCDatasets, I am not sure that a transfer from disk to CPU memory would be the correct thing to do.

Alexander-Barth commented 1 year ago

@alex-s-gardner Can this be closed?

alex-s-gardner commented 1 year ago

Thanks for the explanation. Closing this issue.

Alexander-Barth / NCDatasets.jl #218