Gripes about LagrangianParticles: construction, analysis, useability

glwagner commented 1 week ago

I'm wondering if anyone thinks that there are some big improvements that can be made to LagrangianParticles. Here is a list of issues that come to mind:

it is inconvenient to initialize the locations x, y, z independently to instantiate LagrangianParticles(; x, y, z). This is error-prone; for example x, y, z have to all be the same length. I think I would rather have a utility that initializes some number of particles, and then set! the coordinates of all or selected particles.
I shouldn't have to provide the locations of particles in Flat directions -- it makes no sense. But right now I do have to write something like z = zeros(Np) for Np particles in a two-dimensional simulation...
Output is not friendly. First of all there is no utility like FieldTimeSeries. Second of all, the first thing I think of to do is extract a particular trajectory. But the way the output is constructed, its a little convoluted to extract the trajectory of a single particle. All of the positions at a particular moment in time are saved in a NamedTuple of arrays. So when I load one snapshot, I get all of the particles. To construct the entire trajectory of just one particle, I have to load all of the data! This seems problematic for huge datasets especially. Note that analysis is typically done per-trajectory. So if trying to batch an analysis of a huge dataset of trajectories, I would want to be able to load some subset of trajectories at a time. This is precluded by the current design.

Does anyone else have these issues or am I the only one? Are there issues that I missed?

glwagner commented 1 week ago

Here's an example of the kind of convoluted code required to get a trajectory (2D):

trajectoriesfilename = "trajectories.jld2"
file = jldopen(trajectoriesfilename)
saveiters = parse.(Int, keys(file["timeseries/t"]))
data = [file["timeseries/particles/$iter"] for iter in saveiters]
close(file)

Nt = length(data)
function trajectory(data, p)
    Nt = length(data)
    x = [data[n].x[p] for n = 1:Nt]
    y = [data[n].y[p] for n = 1:Nt]
    return x, y
end

xp1, yp1 = trajectory(data, 1)

Maybe there is a better way though.

ali-ramadhan commented 1 week ago

You're not alone!

I think I would rather have a utility that initializes some number of particles, and then set! the coordinates of all or selected particles.

I like this idea. It would be nice if it was also easy to add and remove particles. Right now it's actually easy to add particles with append! but removing particles on the GPU is non-trivial. Maybe it'll always require a copy of some kind but here's an example callback I use:

function remove_particles!(simulation)
    particles = simulation.model.particles
    properties = particles.properties

    z_mixed_layer = -50.0
    p_to_keep = properties.z .>= z_mixed_layer

    # Get the underlying arrays
    arrays = components(properties)

    # Compact each array
    for (key, arr) in pairs(arrays)
        kept_values = arr[p_to_keep]
        CUDA.resize!(arr, length(kept_values))
        copyto!(arr, kept_values)
    end

    return nothing
end

I shouldn't have to provide the locations of particles in Flat directions

True. I don't find it too annoying but it is not consistent with the rest of the user interface. Also not the most efficient use of memory.

Output is not friendly.

Also agree here.

I think NetCDF output for particles is easier to work with since it's a multi-dimensional array, so pulling out a trajectory is just indexing. Not sure how it's chunked (if at all) though so huge data might be a problem.

But it shouldn't be hard to add something nice for JLD2. ParticleTimeSeries?

ali-ramadhan commented 1 week ago

On that note, OceanBioME.jl has a set! for particles: https://github.com/OceanBioME/OceanBioME.jl/blob/main/src/Particles/set.jl

jagoosw commented 1 week ago

I agree that particles aren't the most usable. As well as set! that Ali mentioned, OceanBioME particles also default initialise themselves to have x, y, and z set to zero(n) and the user writes ~~BiogeochemicalParticles(n, grid; ...)~~ BiogeochemicalParticles(n; grid, ...) (will change all of the ;grids to grid; in OceanBioME soon).

So you just write e.g. :

n = 10

particles = BiogeochemicalParticles(n; grid, biogeochemistry = SomeParticleBGC())

set!(particles, x = [i for i in 1:n])

For output reading, I agree there is room for a much better interface. It might be less storage efficient but it might be easier to comprehend files if they were structured:

particle1/
    timeseries/
        x
        y
        z
        some_field
particle2/
    timeseries/
        x
        y
        z
        some_field
etc.

And then we could do something like FieldDataSet(path, particle_number)?

glwagner commented 1 week ago

So if I just use particles = BiogeochemicalParticles(n; grid) with no biogeochemistry then this is better than Oceananigans' LagrangianParticles?

jagoosw commented 1 week ago

So if I just use particles = BiogeochemicalParticles(n; grid) with no biogeochemistry then this is better than Oceananigans' LagrangianParticles?

I think that wouldn't work although I've never tried

ali-ramadhan commented 1 week ago

Something I just thought about but I can see there being use cases for advecting multiple groups of particles (each for a different purpose).

One group of particles could be inactive Lagrangian particles just to visualize the flow, while another group could be biogeochemical particles from OceanBioME.jl (assuming https://github.com/OceanBioME/OceanBioME.jl/issues/231 happens).

glwagner commented 1 week ago

I feel like the fact that its worth it to implement a completely independent BiogeochemicalParticles with its own user interface indicates there are even deeper problems with LagrangianParticles. Why can't we always use LagrangianParticles?

jagoosw commented 1 week ago

Something I just thought about but I can see there being use cases for advecting multiple groups of particles (each for a different purpose).

One group of particles could be inactive Lagrangian particles just to visualize the flow, while another group could be biogeochemical particles from OceanBioME.jl (assuming OceanBioME/OceanBioME.jl#231 happens).

I have thought before that the current particle interface (in both Oceananigans and OceanBioME) does make it quite challenging to have multiple different types of particles which I could also see being useful.

jagoosw commented 1 week ago

I feel like the fact that its worth it to implement a completely independent BiogeochemicalParticles with its own user interface indicates there are even deeper problems with LagrangianParticles. Why can't we always use LagrangianParticles?

I think the reason I initially did this was to try and make the coupling between the particle and tracer bgcs work, but I think there are definitely other ways it could be done and the BiogeochemicalParticles go in the normal particles slot

glwagner commented 1 week ago

The issue in the long run from an engineering perspective is that it encourages fragmentation of the code. So if some other package besides OceanBioME needs a similar kind of particle, they may have to reimplement it. Then if that project gets a bunch of funding and software developers, they could advance their own implementation, and OceanBioME doesn't reap the benefits of that. That to me is the idea of open source is to allow people to collaborate where there is a common interest. It does require some degree of centralization of design to support the collaboration though. And it can require more investment up front, vs getting something working quick-and-dirty.

That's actually the approach we took with biogeochemistry overall. It is possible to implement biogeochemistry as a forcing, and there's no need to have a property attatched to the models. But the purpose of adding that slot was to create an environment of collaboration between different implementations of biogeochemistry. So considering all of that work, the generation of an independent particles functionality indicates that this effort was only partially a success --- we did not succeed in motivating all parties to continue along the path of building shared capabilities in Oceananigans.

I'm wondering if we need to write something about this, or find a way to make sure that the vision is more clear. Do we think there is a need for this? Or perhaps it is too ambitious, and things like this are bound to happen no matter how much effort we put into trying to prevent it?

I think there is also scope to pursue quick prototyping rather than taking a more linear comprehensive path from the beginning, which is supported by Julia's design. But it seems like the danger of this is that the second step (comprehensive future-proof design) doesn't actually occur, because when the prototype is demonstrated, work does on, and technical debt is inherited. So a desire to do quick prototypes also must be coupled with strong discipline to also implement the long-term plan with the prototype has finished serving its purpose (ideally I think prototypes should be worked on for no more than a week or so).

CliMA / Oceananigans.jl

Gripes about LagrangianParticles: construction, analysis, useability #3919