Open glwagner opened 1 week ago
Here's an example of the kind of convoluted code required to get a trajectory (2D):
trajectoriesfilename = "trajectories.jld2"
file = jldopen(trajectoriesfilename)
saveiters = parse.(Int, keys(file["timeseries/t"]))
data = [file["timeseries/particles/$iter"] for iter in saveiters]
close(file)
Nt = length(data)
function trajectory(data, p)
Nt = length(data)
x = [data[n].x[p] for n = 1:Nt]
y = [data[n].y[p] for n = 1:Nt]
return x, y
end
xp1, yp1 = trajectory(data, 1)
Maybe there is a better way though.
You're not alone!
I think I would rather have a utility that initializes some number of particles, and then
set!
the coordinates of all or selected particles.
I like this idea. It would be nice if it was also easy to add and remove particles. Right now it's actually easy to add particles with append!
but removing particles on the GPU is non-trivial. Maybe it'll always require a copy of some kind but here's an example callback I use:
function remove_particles!(simulation)
particles = simulation.model.particles
properties = particles.properties
z_mixed_layer = -50.0
p_to_keep = properties.z .>= z_mixed_layer
# Get the underlying arrays
arrays = components(properties)
# Compact each array
for (key, arr) in pairs(arrays)
kept_values = arr[p_to_keep]
CUDA.resize!(arr, length(kept_values))
copyto!(arr, kept_values)
end
return nothing
end
I shouldn't have to provide the locations of particles in
Flat
directions
True. I don't find it too annoying but it is not consistent with the rest of the user interface. Also not the most efficient use of memory.
Output is not friendly.
Also agree here.
I think NetCDF output for particles is easier to work with since it's a multi-dimensional array, so pulling out a trajectory is just indexing. Not sure how it's chunked (if at all) though so huge data might be a problem.
But it shouldn't be hard to add something nice for JLD2. ParticleTimeSeries
?
On that note, OceanBioME.jl has a set!
for particles: https://github.com/OceanBioME/OceanBioME.jl/blob/main/src/Particles/set.jl
I agree that particles aren't the most usable. As well as set!
that Ali mentioned, OceanBioME particles also default initialise themselves to have x
, y
, and z
set to zero(n)
and the user writes BiogeochemicalParticles(n, grid; ...)
BiogeochemicalParticles(n; grid, ...)
(will change all of the ;grid
s to grid;
in OceanBioME soon).
So you just write e.g. :
n = 10
particles = BiogeochemicalParticles(n; grid, biogeochemistry = SomeParticleBGC())
set!(particles, x = [i for i in 1:n])
For output reading, I agree there is room for a much better interface. It might be less storage efficient but it might be easier to comprehend files if they were structured:
particle1/
timeseries/
x
y
z
some_field
particle2/
timeseries/
x
y
z
some_field
etc.
And then we could do something like FieldDataSet(path, particle_number)
?
So if I just use particles = BiogeochemicalParticles(n; grid)
with no biogeochemistry then this is better than Oceananigans' LagrangianParticles?
So if I just use
particles = BiogeochemicalParticles(n; grid)
with no biogeochemistry then this is better than Oceananigans' LagrangianParticles?
I think that wouldn't work although I've never tried
Something I just thought about but I can see there being use cases for advecting multiple groups of particles (each for a different purpose).
One group of particles could be inactive Lagrangian particles just to visualize the flow, while another group could be biogeochemical particles from OceanBioME.jl (assuming https://github.com/OceanBioME/OceanBioME.jl/issues/231 happens).
I feel like the fact that its worth it to implement a completely independent BiogeochemicalParticles
with its own user interface indicates there are even deeper problems with LagrangianParticles
. Why can't we always use LagrangianParticles
?
Something I just thought about but I can see there being use cases for advecting multiple groups of particles (each for a different purpose).
One group of particles could be inactive Lagrangian particles just to visualize the flow, while another group could be biogeochemical particles from OceanBioME.jl (assuming OceanBioME/OceanBioME.jl#231 happens).
I have thought before that the current particle interface (in both Oceananigans and OceanBioME) does make it quite challenging to have multiple different types of particles which I could also see being useful.
I feel like the fact that its worth it to implement a completely independent
BiogeochemicalParticles
with its own user interface indicates there are even deeper problems withLagrangianParticles
. Why can't we always useLagrangianParticles
?
I think the reason I initially did this was to try and make the coupling between the particle and tracer bgcs work, but I think there are definitely other ways it could be done and the BiogeochemicalParticles
go in the normal particles
slot
The issue in the long run from an engineering perspective is that it encourages fragmentation of the code. So if some other package besides OceanBioME needs a similar kind of particle, they may have to reimplement it. Then if that project gets a bunch of funding and software developers, they could advance their own implementation, and OceanBioME doesn't reap the benefits of that. That to me is the idea of open source is to allow people to collaborate where there is a common interest. It does require some degree of centralization of design to support the collaboration though. And it can require more investment up front, vs getting something working quick-and-dirty.
That's actually the approach we took with biogeochemistry overall. It is possible to implement biogeochemistry as a forcing, and there's no need to have a property attatched to the models. But the purpose of adding that slot was to create an environment of collaboration between different implementations of biogeochemistry. So considering all of that work, the generation of an independent particles functionality indicates that this effort was only partially a success --- we did not succeed in motivating all parties to continue along the path of building shared capabilities in Oceananigans.
I'm wondering if we need to write something about this, or find a way to make sure that the vision is more clear. Do we think there is a need for this? Or perhaps it is too ambitious, and things like this are bound to happen no matter how much effort we put into trying to prevent it?
I think there is also scope to pursue quick prototyping rather than taking a more linear comprehensive path from the beginning, which is supported by Julia's design. But it seems like the danger of this is that the second step (comprehensive future-proof design) doesn't actually occur, because when the prototype is demonstrated, work does on, and technical debt is inherited. So a desire to do quick prototypes also must be coupled with strong discipline to also implement the long-term plan with the prototype has finished serving its purpose (ideally I think prototypes should be worked on for no more than a week or so).
I'm wondering if anyone thinks that there are some big improvements that can be made to
LagrangianParticles
. Here is a list of issues that come to mind:it is inconvenient to initialize the locations
x, y, z
independently to instantiateLagrangianParticles(; x, y, z)
. This is error-prone; for examplex, y, z
have to all be the same length. I think I would rather have a utility that initializes some number of particles, and thenset!
the coordinates of all or selected particles.I shouldn't have to provide the locations of particles in
Flat
directions -- it makes no sense. But right now I do have to write something likez = zeros(Np)
forNp
particles in a two-dimensional simulation...Output is not friendly. First of all there is no utility like
FieldTimeSeries
. Second of all, the first thing I think of to do is extract a particular trajectory. But the way the output is constructed, its a little convoluted to extract the trajectory of a single particle. All of the positions at a particular moment in time are saved in aNamedTuple
of arrays. So when I load one snapshot, I get all of the particles. To construct the entire trajectory of just one particle, I have to load all of the data! This seems problematic for huge datasets especially. Note that analysis is typically done per-trajectory. So if trying to batch an analysis of a huge dataset of trajectories, I would want to be able to load some subset of trajectories at a time. This is precluded by the current design.Does anyone else have these issues or am I the only one? Are there issues that I missed?