OceanBioME / OceanBioME.jl

🌊 🦠 🌿 A fast and flexible modelling environment written in Julia for modelling the coupled interactions between ocean biogeochemistry, carbonate chemistry, and physics
https://oceanbiome.github.io/OceanBioME.jl/
MIT License
40 stars 20 forks source link

Use `Oceananigans.OutputReaders.FieldTimeSeries` instead of manually reading output from `.jld2` files #95

Closed navidcy closed 1 year ago

navidcy commented 1 year ago

Using FieldTimeSeries would cleanup this bit a lot. (And similarly for other examples.)

https://github.com/OceanBioME/OceanBioME.jl/blob/505914e9d49ac177062ad25e5205805dcd9bba1d/examples/eady.jl#L120-L136

navidcy commented 1 year ago

E.g., https://clima.github.io/OceananigansDocumentation/stable/generated/horizontal_convection/#Load-saved-output,-process,-visualize

jagoosw commented 1 year ago

The reason I tend not to use FieldTimeSeries is that it seems to be much slower than manually reading the files, do you know if anyone else has experienced that?

glwagner commented 1 year ago

I haven't experienced it, but I think it's important to fix if there's a performance issue

jagoosw commented 1 year ago

I made an MWE:

julia> using Oceananigans, JLD2, BenchmarkTools
julia> grid = RectilinearGrid(size = (256, 256, 256), extent = (1000, 1000, 1000))
julia> model = NonhydrostaticModel(; grid)
julia> uáµ¢(x, y, z) = rand()
julia> set!(model, u = uáµ¢)
julia> simulation = Simulation(model, Δt = 2.0, stop_iteration = 10) # 256 ^ 3 * 10 * 4 $\approx$ 1GB per variable
julia> simulation.output_writers[:u] = JLD2OutputWriter(model, model.velocities, filename = "output_writer_test.jld2", > schedule = IterationInterval(1), overwrite_existing=true)
julia> run!(simulation)
julia> @btime FieldTimeSeries("output_writer_test.jld2", "u");
  426.597 ms (13465 allocations: 2.75 GiB)
julia> function load_u(path)
           file = jldopen(path)
           iterations = keys(file["timeseries/t"])

           results = ones(size(grid)..., length(iterations))

           @inbounds for (idx, it) in enumerate(iterations)
               results[:, :, :, idx] = file["timeseries/u/$it"][1:grid.Nx, 1:grid.Ny, 1:grid.Nz]
           end

           close(file)

           return results
       end
julia> @btime load_u("output_writer_test.jld2")
  721.824 ms (631 allocations: 4.13 GiB)
julia> @btime begin 
        FieldTimeSeries("output_writer_test.jld2", "u");
        FieldTimeSeries("output_writer_test.jld2", "v");
        FieldTimeSeries("output_writer_test.jld2", "w");
end
  2.007 s (40395 allocations: 8.26 GiB)
julia> function load_uvw(path)
           file = jldopen(path)
           iterations = keys(file["timeseries/t"])

           results = ones(3, size(grid)..., length(iterations))

           @inbounds for (idx, it) in enumerate(iterations)
               results[1, :, :, :, idx] = file["timeseries/u/$it"][1:grid.Nx, 1:grid.Ny, 1:grid.Nz]
               results[2, :, :, :, idx] = file["timeseries/v/$it"][1:grid.Nx, 1:grid.Ny, 1:grid.Nz]
               results[3, :, :, :, idx] = file["timeseries/v/$it"][1:grid.Nx, 1:grid.Ny, 1:grid.Nz]
           end

           close(file)

           return results
       end
julia> @btime load_uvw("output_writer_test.jld2");
  786.793 ms (631 allocations: 4.13 GiB)

For reading a single variable FieldTimeSeries seems to be faster, but for multiple variables, it is slower. I'm not sure how this will scale with file size too, I suspect that for less iterations FieldTimeSeries might become slower because it must have a constant fixed overhead loading the grid etc.

glwagner commented 1 year ago

The difference is only a factor of 2 though, and I guess comes from using a 5D array rather than 3 4D arrays. Is it really important to save that? We expect to encounter these costs rarely (ie once per analysis), they aren't part of some hot inner loop.

jagoosw commented 1 year ago

Yeah, it doesn't seem as much of a problem as I thought it was (I don't really remember when I started doing it like this). After I've sorted the big file problem I can change the examples.

glwagner commented 1 year ago

As a side note, it could be possible to develop a FieldDataset abstraction that combines all the data into a single array. We have something like that but the approach we're taking currently is different (but FieldDataset is not really developed right now)

https://github.com/CliMA/Oceananigans.jl/blob/main/src/OutputReaders/field_dataset.jl

glwagner commented 1 year ago

Note that load_u is basically identical to what FieldTimeSeries already does

navidcy commented 1 year ago

Closing this after #100