Closed — alex-s-gardner closed this issue 1 year ago
What about using `Array` or indexing instead:
```julia
julia> using Downloads, NetCDF, NCDatasets

julia> @time nc = copy(ncread("N52W175.nc", "z"));
 11.110762 seconds (259.33 k allocations: 1.498 GiB, 0.38% gc time, 6.92% compilation time)

julia> @time nc2 = Array(Dataset("N52W175.nc", "r")["z"]);
 10.935116 seconds (140.02 k allocations: 1.676 GiB, 0.60% gc time, 2.10% compilation time)

julia> @time nc2 = Dataset("N52W175.nc", "r")["z"][:,:,:];
 10.392991 seconds (399 allocations: 1.666 GiB, 0.57% gc time)
```

```julia
(@v1.9) pkg> status NCDatasets
Status `~/.julia/environments/v1.9/Project.toml`
  [85f8d34a] NCDatasets v0.12.17 `~/.julia/dev/NCDatasets`

(@v1.9) pkg> status NetCDF
Status `~/.julia/environments/v1.9/Project.toml`
  [30363a11] NetCDF v0.11.7
```
In NCDatasets, we do not overload `Base.copy`, so `copy` loads every element individually (https://alexander-barth.github.io/NCDatasets.jl/stable/performance/). I don't think you need the `copy` function with NetCDF.jl either; `ncread` already returns an `Array` directly.
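To make the difference concrete, a single indexing call issues one bulk read, whereas `copy` goes through the generic element-by-element `AbstractArray` path. A minimal self-contained sketch (the file name `example.nc` and the tiny `z` variable are made up for illustration, not from the benchmarks above):

```julia
using NCDatasets

# Create a tiny example file so the snippet runs on its own
# (hypothetical file name and data, for illustration only).
Dataset("example.nc", "c") do ds
    defDim(ds, "lon", 3)
    defDim(ds, "lat", 2)
    v = defVar(ds, "z", Float64, ("lon", "lat"))
    v[:, :] = reshape(1.0:6.0, 3, 2)
end

# Fast: one indexing call reads the whole variable in a single request
# to the underlying netCDF library.
ds = Dataset("example.nc", "r")
z = ds["z"][:, :]
close(ds)

# Slow: `copy(ds["z"])` would fall back to element-by-element reads,
# since `Base.copy` is not overloaded for NCDatasets variables.
```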
For reference, `CUDA.copy` creates a copy of an array on the GPU (and does not transfer it to CPU memory). If we were ever to overload `copy` in NCDatasets, I am not sure that a transfer from disk to CPU memory would be the correct thing to do.
@alex-s-gardner Can this be closed?
Thanks for the explanation. Closing this issue.
Thanks a ton for the great package... it's really intuitive and powerful.
Playing around with the package for production-scale workflows, I find a significant performance gap between NetCDF.jl and NCDatasets.jl.
I know I've pestered about this elsewhere, but is there any value in having NetCDF.jl as the backend to NCDatasets.jl? NCDatasets.jl has such a great design, and NetCDF.jl is really performant. This would also help reduce duplicate code maintenance. Regardless, thanks for all of the contributions!