SebLague / Fluid-Sim

A simple 2D and 3D fluid simulation
https://www.youtube.com/watch?v=rSKMYc1CQHE
MIT License

Increasing particle count increases the total energy at every iteration #5

Open stndbye opened 9 months ago

stndbye commented 9 months ago

Thanks for this, I was looking to play around with SPH for ages and this makes it really convenient!

My plan is to create a self-gravitating gas cloud with as high a resolution as possible, but when I increased the particle count by adjusting Num particles per axis, the system just exploded. It seems like energy is injected at every iteration.

Changing viscosity in either direction from the default 0.001 further increases the energy and results in weird patterns if I scale time down. Setting the pressure multiplier to 1 makes it a bit better, but velocities still go crazy in any overdense region.

Is this the correct way to increase the particle count: Simulation inspector/Spawner 3D/Num particles per axis? Where should I look to fix the issue? Are there any constants or calculations that need to be adjusted along with the count?

stndbye commented 9 months ago

I found part of the issue: in FluidSim3D.compute, this line

    Velocities[id.x] += viscosityForce * viscosityStrength * deltaTime;

needs to go outside of the loop.
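
The shape of the fix, as a rough sketch (the real code is an HLSL compute shader; this is CUDA-style pseudocode with illustrative names, and a brute-force neighbour loop standing in for the spatial-hash lookup): accumulate the viscosity force over all neighbours first, then apply it to the velocity once.

```cuda
#include "helper_math.h"   // float3 operators from the CUDA samples (assumed available)

// Toy, unnormalised smoothing falloff; stands in for the sim's actual kernel function.
__device__ float SmoothingWeight(float3 a, float3 b, float radius)
{
    float3 d  = a - b;
    float  r2 = dot(d, d);
    float  h2 = radius * radius;
    return r2 < h2 ? (h2 - r2) * (h2 - r2) : 0.0f;
}

__global__ void ApplyViscosity(float3* velocities, const float3* positions, int numParticles,
                               float smoothingRadius, float viscosityStrength, float deltaTime)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParticles) return;

    float3 viscosityForce = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < numParticles; j++)      // neighbour loop (cell-based in the real sim)
    {
        if (j == i) continue;
        float w = SmoothingWeight(positions[i], positions[j], smoothingRadius);
        viscosityForce += (velocities[j] - velocities[i]) * w;
    }

    // Applied once per particle, not once per neighbour:
    velocities[i] += viscosityForce * viscosityStrength * deltaTime;
}
```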

Even after this, the fluid periodically explodes with higher viscosity multipliers: above 0.4 in the original setup, above 0.04 with 55 particles per axis.

stndbye commented 9 months ago

Had a little time to figure out how to make a PR for this. https://github.com/SebLague/Fluid-Sim/pull/7

Let me know if I missed something, but it seems to make viscosity a bit more tame. I think the rest of the problems I see here are related to the lightweight boundary checks and to the pressure scaling depending on particle count or timestep.

Still, at the best parameter values the velocities of all the particles slightly oscillate at a high (shared) frequency, which quickly turns into boiling with higher collision strength or viscosity, or with lower target pressure. Is there a way to improve that without reducing performance much? I'm really enjoying the 60 fps with 270k+ particles.

And what's the best way to fix it? Is deltaTime too high, so the solution would be a dynamic timestep depending on density or something similar? Should the parameters be limited, or scaled with each other somehow, so that they can't be given totally unphysical values? Or does the boundary need some real/virtual particles, or just a more sophisticated adjustment?

SebLague commented 9 months ago

Hey, I appreciate the PR; I've merged it in. I wonder if using a more advanced integration scheme like RK4 might help to improve the stability. I've never tried it before, so I'm not sure how big an impact it would have or how involved it is to implement, but it's definitely something I'd like to test. And I agree that boundary particles would be good to try too.

I think a dynamic timestep is usually done based on the maximum particle velocity, which could be good to try too. As for parameter scaling -- I have no idea! It would be great to be able to change the number of particles at least without having to tweak the other parameters, but it's certainly very finicky at the moment and I'm not sure why.
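
For reference, a CFL-style rule is a common way to do this; a minimal sketch (illustrative names, not from the repo; maxSpeed and maxAccel would come from max-reductions over the particle buffers):

```cuda
#include <math.h>

// CFL-style adaptive timestep sketch: limit dt so the fastest particle moves only a
// fraction of the smoothing radius per step, and similarly for the strongest force.
float AdaptiveTimestep(float smoothingRadius, float maxSpeed, float maxAccel)
{
    const float safety = 0.25f;                                      // typical CFL factor
    float dtVelocity = safety * smoothingRadius / fmaxf(maxSpeed, 1e-6f);
    float dtForce    = safety * sqrtf(smoothingRadius / fmaxf(maxAccel, 1e-6f));
    return fminf(dtVelocity, dtForce);
}
```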

stndbye commented 9 months ago

Thanks, I will see if I can implement RK4 and create a PR, unless it's something you want to do any time soon.

Otherwise I'll have a go at self-gravity. Calculating the center of mass for cells efficiently is giving me a headache. If you happen to have any pointers for my troubles below, please drop a comment.

I can't find a good parallel solution to group by and aggregate cells without race conditions. I was thinking of averaging every same-cell particle and the one next to it in the sorted array into a new array, keeping the ones without a pair, and repeating log2(n) times. But I have near-zero compute experience and don't know how I can shrink the arrays between these steps without killing performance, so that I end up with just the cell list holding center-of-mass position and particle count. Or maybe there is a better way to do this.

SebLague commented 9 months ago

I do want to learn about RK4 and implement it myself at some point, but it may be a while before I get to it so don't wait around for me if you want to try it soon :)

As for gravity, Barnes-Hut is the typical approach I think -- I have this paper bookmarked for a potential future video (haven't read it yet though): https://iss.oden.utexas.edu/Publications/Papers/burtscher11.pdf Good luck!

stndbye commented 9 months ago

Just what I need. Thanks!

stndbye commented 8 months ago

There is another issue with collision handling: the damping is only applied per velocity component, so it makes the vector more parallel to the collision surface. That tends to bunch particles up in edges and corners.

I found a possible cheap "solution" for the boundary problem: heavily damp the reflected velocities of particles colliding with the surface, and don't render particles close to the boundary. This lets the kernel smooth the collision as if there were virtual particles, and it heavily reduces the unphysical velocities that occur near the boundary. It looks a bit strange as particles slowly pop back into view, but it seems much more stable. Also, the damping is no longer parameterized; it only depends on the fluid's properties.
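
Roughly what I mean, as a sketch (CUDA-style with illustrative names; the repo's actual collision handling lives in the HLSL compute shader and works differently):

```cuda
#include "helper_math.h"   // float3 operators and dot() from the CUDA samples (assumed available)

// Reflect the velocity about the wall normal, heavily damp the reflected velocity,
// and skip rendering particles that sit within one smoothing radius of the boundary.
__device__ float3 ReflectAndDamp(float3 velocity, float3 wallNormal, float heavyDamping)
{
    float vn = dot(velocity, wallNormal);                 // signed speed into the wall
    if (vn < 0.0f)                                        // moving into the wall
    {
        float3 reflected = velocity - 2.0f * vn * wallNormal;
        velocity = reflected * heavyDamping;              // e.g. heavyDamping around 0.1
    }
    return velocity;
}

__device__ bool ShouldRender(float distToBoundary, float smoothingRadius)
{
    return distToBoundary > smoothingRadius;              // hide the "virtual particle" band
}
```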

If it's of any interest I can create a PR for it.

I checked that treecode paper (and realized I had read it a few years ago), and decided it's not worth the effort for the number of particles I can deal with in real time. I could implement center-of-mass calculations for the existing SPH cells, but memory access would diverge a lot, so I have doubts it's worth it either.

I just popped the GPU Gems n-body code into the kernel instead, fixed the mass-calculation bug in it, and can still run around 200k at 30 fps. I added a couple of starting conditions, like the galaxy collision from Dubinski, Sellwood's bar galaxy, a 64k Plummer model, a milli-Millennium export, plus the ones from GPU Gems.

I can make a PR of that too, though I suspect it's too far from the original scope. In a physical sense it's really bad though; I still have no idea how to parameterize it well for vacuum. I suspect the SPH kernels need to change a bit, viscosity seems off in space, the density/pressure relations should also probably change, and I should implement a gravity kernel for close n-body interactions instead of the pair calculation. Anyhow, it's fun to watch. Thanks again for the opportunity!

Maxwellfire commented 8 months ago

@SebLague I hope I'm not jumping in here too unbidden, but hopefully I can provide some help.

Many of the standard integration schemes are not energy neutral. That is, the process of updating the time derivatives adds or subtracts energy from the system. The amount of energy that is added or removed is generally dependent on the timestep.

Forward Euler

This is readily apparent from the forward Euler method that you used, which (I believe) always adds energy to the system. If you were to take large enough timesteps with a forward Euler approach, the system would become completely unstable, gaining arbitrary amounts of energy. Intuitively, this can be understood as follows.

Imagine two particles interacting, with some force between them that depends on their separation. Consider a moment in time, $t_1$. At this time, the particles have some force pushing them apart; call it $F_1$. Now, with the forward Euler method, we imagine that force acting for an entire timestep $\Delta t$. Over that timestep, the particles will have moved some distance, call it $\Delta x$. Thus, the kinetic energy of a particular particle will have changed by force times distance, or $F_1 \Delta x$. Unfortunately, in reality the force between the particles drops as they move apart, so a smaller force acts over a smaller distance for most of the timestep. This difference between our imagined update and reality adds energy to the system overall. This is true even when you ensure that each particle has an equal and opposite reaction.

Backwards Euler

In the video you also mentioned using the predicted fields to compute the forces instead of the current fields. I believe that this is roughly equivalent to the backwards Euler integrator (or at least it would be if you iterated to find a solution; I think what you're using is one iteration of an implicit Euler scheme). Like forward Euler, backwards Euler also changes the energy of the system. Instead of adding energy, however, it always removes energy. This can be understood by similar logic:

Now, instead of using $F_1$ to compute the velocity, we instead use $F_2$, the force experienced at the position where the particle will end up. This underpredicts the force: for most of the timestep, the particles experience a larger force than this and travel a larger distance. Thus, we underpredict the energy gained.

So using backwards Euler adds a damping effect, which causes the oscillations to die out. Ideally, the damping would be controlled by something like your viscosity, not by the integrator you use, and you would also like to be able to use a larger timestep.
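
To make the two effects concrete, here is a tiny standalone demo (my own illustrative example, not from the repo): a 1-D spring with unit mass. For this linear spring, forward Euler multiplies the energy by $(1 + \omega^2 \Delta t^2)$ every step, while backwards Euler divides by the same factor, and the implicit update can be solved in closed form.

```cuda
#include <cstdio>

int main()
{
    const float w2 = 1.0f, dt = 0.1f;  // omega^2 and timestep
    float xf = 1.0f, vf = 0.0f;        // forward Euler state
    float xb = 1.0f, vb = 0.0f;        // backward Euler state

    for (int step = 0; step < 1000; ++step)
    {
        // Forward Euler: force evaluated at the *current* position.
        float af = -w2 * xf;
        xf += vf * dt;
        vf += af * dt;

        // Backward Euler: force evaluated at the *new* position (closed form for a linear spring).
        vb = (vb - dt * w2 * xb) / (1.0f + dt * dt * w2);
        xb += vb * dt;
    }

    // Both trajectories started with energy 0.5; forward Euler blows up, backward Euler decays.
    printf("forward Euler energy:  %g\n", 0.5f * (vf * vf + w2 * xf * xf));
    printf("backward Euler energy: %g\n", 0.5f * (vb * vb + w2 * xb * xb));
    return 0;
}
```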

Symplectic Methods

The way to resolve this is to use a scheme that preserves energy. There is some potential energy in the interactions between particles, and each particle has some kinetic energy. Using a clever stepping approach, we can ensure that however much potential energy is lost, we gain the same amount of kinetic energy.

The methods for doing this are still approximate: they don't guarantee that the calculated trajectories will be exactly correct. But they do guarantee that the energy before and after a timestep will be approximately the same (minus explicitly added dissipation/damping), regardless of how large that timestep is. The term for integrators with this property is 'symplectic'. (Note that technically this doesn't exactly imply energy conservation, just something similar to it.) The simplest of these is called Verlet integration. To start, you probably want the 'velocity Verlet' approach. The scheme for that is as follows (copied from Wikipedia; a code sketch follows the list):

  1. Calculate $\mathbf{v}\left(t + \tfrac{1}{2}\Delta t\right) = \mathbf{v}(t) + \tfrac{1}{2}\mathbf{a}(t)\,\Delta t$
  2. Calculate $\mathbf{x}(t + \Delta t) = \mathbf{x}(t) + \mathbf{v}\left(t + \tfrac{1}{2}\Delta t\right)\Delta t$
  3. Derive $\mathbf{a}(t + \Delta t)$ using the just-calculated $\mathbf{x}(t + \Delta t)$ and your force as a function of position
  4. Calculate $\mathbf{v}(t + \Delta t) = \mathbf{v}\left(t + \tfrac{1}{2}\Delta t\right) + \tfrac{1}{2}\mathbf{a}(t + \Delta t)\,\Delta t$
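
In code, a sketch of how this maps onto a GPU particle sim (my own illustrative CUDA-style pseudocode, not the repo's HLSL; the existing density/force passes would run between the two kernels, using the updated positions):

```cuda
#include "helper_math.h"   // float3 operators from the CUDA samples (assumed available)

// Velocity Verlet split into two passes around the force computation.
__global__ void KickDrift(float3* pos, float3* vel, const float3* accel, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    vel[i] += 0.5f * dt * accel[i];     // step 1: half kick, v(t + dt/2)
    pos[i] += dt * vel[i];              // step 2: drift,     x(t + dt)
}

// ... recompute accel[] from the new positions here (step 3) ...

__global__ void Kick(float3* vel, const float3* accel, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    vel[i] += 0.5f * dt * accel[i];     // step 4: second half kick, v(t + dt)
}
```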

I don't think that switching to an RK scheme is going to help the way that you want. A higher-order scheme might allow you to take larger timesteps, but it's not going to change the fundamental problem of using a non-energy-conserving method.

Boundaries

In terms of the boundary, one approach to try is to not explicitly encode the collision detection. Instead, make the boundary a steep potential that gets added to the forces acting on the particle. Then you get walls of all shapes for free and you simplify the code a bunch. With non-symplectic methods, this steepness would cause all kinds of problems, but it should be okay with a symplectic approach.
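
A minimal sketch of that idea (again illustrative CUDA-style code; signedDist would come from whatever signed-distance function describes the walls):

```cuda
#include "helper_math.h"   // float3 operators from the CUDA samples (assumed available)

// Instead of a collision branch, add a strong short-range repulsive force that ramps up
// as the signed distance to the boundary approaches zero. Any wall shape with a
// signed-distance function then works the same way.
__device__ float3 BoundaryForce(float signedDist, float3 boundaryNormal,
                                float range, float stiffness)
{
    if (signedDist >= range) return make_float3(0.0f, 0.0f, 0.0f);
    float overlap = (range - signedDist) / range;             // 0 at the edge of the range, 1 at the wall
    return stiffness * overlap * overlap * boundaryNormal;    // quadratic ramp; steeper exponents give a harder wall
}
```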

Conclusion

If you wanted me to take a stab at adding this to the code, I might be able to do so. But I know zero C# or shader languages.

If you do another episode on your fluid simulation, or other particle simulations, there might be some cool visuals that you could create explaining why your integrators behaved the way that they did.

On a final note, I wanted to say that I really love your videos. They have a wonderful mix of gorgeous visualization, whimsy, and exploration.

All the best, ~Max

stndbye commented 8 months ago

I did some reading (got a bit obsessed with the topic, don't know why).

I found one well-explained method for a viscosity that can reproduce shocks. The paper also says that energy-conserving integrators work best; they employ a predictor/corrector leapfrog (I thought those were two different methods, but that's what the paper calls it; it also preserves orbits better).

To make some room for the performance drop this calculation causes (per-particle smoothing lengths and other additional parameters evolved with the simulation), the divergent for loops in the kernels could perhaps be replaced with threads that each work on a single point pair, either using dynamic parallelism or by creating one huge list of interaction pairs and running a single kernel on all pairs (sketched below). That means far fewer kernel launches, but the physics kernel would have to wait until all pairs are produced.
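
Roughly what the pair kernel would look like (an illustrative sketch only; the pair-list construction isn't shown, and the softened inverse-square force is a toy stand-in for the real SPH terms):

```cuda
#include "helper_math.h"   // float3 operators and dot() from the CUDA samples (assumed available)

struct Pair { int a, b; };

// One thread per interaction pair: no divergent neighbour loop, but both particles of a
// pair are shared with other pairs, so the force accumulation needs atomics.
__global__ void AccumulatePairForces(const Pair* pairs, int numPairs,
                                     const float3* pos, float3* force)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= numPairs) return;

    int i = pairs[k].a;
    int j = pairs[k].b;
    float3 d  = pos[j] - pos[i];
    float  r2 = dot(d, d) + 1e-4f;                 // softening avoids the singularity
    float3 f  = d / (r2 * sqrtf(r2));              // toy pair force acting on particle i

    atomicAdd(&force[i].x,  f.x); atomicAdd(&force[i].y,  f.y); atomicAdd(&force[i].z,  f.z);
    atomicAdd(&force[j].x, -f.x); atomicAdd(&force[j].y, -f.y); atomicAdd(&force[j].z, -f.z);
}
```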

Hashing by Morton code could help eliminate some of the unnecessary distance checks and reduce the problems arising from the changing smoothing lengths versus cell sizes. I found some routines with exact range queries for the Z-curve (LITMAX, BIGMIN), but those don't seem to like GPU-style parallelism.
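
For reference, the standard bit-interleaving trick for 3-D Morton (Z-curve) keys, 10 bits per axis (this is the widely used expand-bits version from GPU tree builders, not something taken from the repo):

```cuda
// Spreads the lowest 10 bits of v so that there are two zero bits between each.
__device__ unsigned int ExpandBits(unsigned int v)
{
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

__device__ unsigned int MortonCode3D(unsigned int x, unsigned int y, unsigned int z)
{
    // x, y, z are integer cell coordinates in [0, 1023].
    return (ExpandBits(x) << 2) | (ExpandBits(y) << 1) | ExpandBits(z);
}
```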

The per-particle smoothing length should limit the number of neighbours closer to the ideal resolution dictated by the equations (h is a function of the signal/sound speed and density; the sound speed is also evolved). So maybe the current implementation would be better, because every particle will have a similar execution path and memory access. I wish I had enough time to try all the options...

The physics: https://users.monash.edu.au/~dprice/ndspmhd/ There is a nice paper plus Fortran source and a lecture series on YouTube. It works with both dust and gas, so it should be enough for disc evolution like in galaxies or star systems. It also has MHD, which sounds intriguing.

Dynamic parallelism: https://developer.nvidia.com/blog/introduction-cuda-dynamic-parallelism/

stndbye commented 7 months ago

Just wanted to come back to the tree topic. All the code and papers I found looked terrible: very high complexity once you factor in "proper" GPU usage (shared-memory conflicts, divergence and whatnot). And on top of that they rely on binary search (usually with powers of 2, making bank conflicts even more likely) and other highly divergent and disgusting stuff.

Finally I checked the CUB library, and the reduction / group-aggregate operations I was afraid of are blazing fast. On 18M points, the min-index-per-cell and center-of-mass-per-cell reductions cost about 0.08 and 0.1 ms. I can just reduce once for each 3-bit level and have the full tree in about 1.5 ms. ReduceByKey makes bottom-up oct/quadtree building sound trivial, and these functions can also be launched from within a kernel using DP without much hassle.
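
The per-cell center-of-mass step boils down to something like this (a sketch with illustrative names; it assumes the particles are already sorted by cell/Morton key and that each input value carries the mass-weighted position sum plus the mass):

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

struct CellAggregate
{
    float wx, wy, wz;   // sum of mass * position
    float mass;         // sum of mass
};

struct AddAggregates
{
    __host__ __device__ CellAggregate operator()(const CellAggregate& a,
                                                 const CellAggregate& b) const
    {
        return { a.wx + b.wx, a.wy + b.wy, a.wz + b.wz, a.mass + b.mass };
    }
};

// Collapses runs of equal keys into one aggregate per cell.
void CellCentersOfMass(const unsigned int* d_sortedKeys, const CellAggregate* d_values,
                       unsigned int* d_uniqueKeys, CellAggregate* d_cellSums,
                       int* d_numCells, int numParticles)
{
    void*  d_temp    = nullptr;
    size_t tempBytes = 0;
    // First call only sizes the temporary storage; the second call does the reduction.
    cub::DeviceReduce::ReduceByKey(d_temp, tempBytes, d_sortedKeys, d_uniqueKeys,
                                   d_values, d_cellSums, d_numCells,
                                   AddAggregates(), numParticles);
    cudaMalloc(&d_temp, tempBytes);
    cub::DeviceReduce::ReduceByKey(d_temp, tempBytes, d_sortedKeys, d_uniqueKeys,
                                   d_values, d_cellSums, d_numCells,
                                   AddAggregates(), numParticles);
    cudaFree(d_temp);
    // Center of mass per cell is then (wx, wy, wz) / mass.
}
```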

Sorting is slow, 16 ms, but it sorts the data array as well, not just the keys. I expect that to help a lot with memory-access speed, since accesses should be better aligned due to the Z-curve.

I wish there were an easy way to keep Unity. Managing the visualization and UI from plain C++ code is way more painful, and I didn't find anything good on how to run CUDA code from Unity. Thrust could also be nice; I think that works on all cards and uses CUB where possible.

Anyway, just wanted to drop this here in case it helps you in a future project. And hello trees, here I come :D