gandalfcode / gandalf

GANDALF (Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids)
GNU General Public License v2.0

Stock pruned #100

Closed giovanni-rosotti closed 7 years ago

giovanni-rosotti commented 7 years ago

This isn't finished, so it's too early to merge; it seems to work but I haven't tested it extensively, and it's only implemented for SPH at the moment. But I need to discuss a few things, see below.

In the SPH loop, I've put the stocking of the pruned trees just before we look for which particles to export. We don't use the pruned trees before this point, so I believe there is no need to stock them before the h calculation. But we do need to update hmax in the main tree before we stock the pruned trees, or the pruned trees will inherit stale hmax values. Am I right? This means that, when using MPI, I should remove the hmax update from the compute-force routines. If we are on a tree-rebuild step, all of this is a bit of overkill, because there is no need to fully restock the pruned trees. In fact, since the pruned trees are not needed before the h calculation, we could build them after it, in the same place where we currently stock them!
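To make the proposed ordering concrete, here is a rough sketch of the two step types (pseudocode; the routine names are illustrative, not the actual GANDALF API):

```
// Rebuild step:                    // Normal step:
rebuild local tree                  restock local tree
compute h (densities)               compute h (densities)
update hmax in main tree            update hmax in main tree
build pruned trees                  restock pruned trees
find particles to export            find particles to export
compute forces                      compute forces
```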

I also restock the pruned trees before the load balancing - currently we were rebuilding them, which is unnecessary. But there's a problem: we are not restocking the local tree between moving the particles and doing the load balancing. So what is the point of rebuilding/restocking the pruned trees? They will just stay the same, because each is a copy of the local tree. I think we either need to stock both of them or neither; stocking only one does not make sense. I can't decide whether the stocking is worth doing - skipping it just means that the load balancing is suboptimal, not badly wrong.

Another thing was the UpdateBoundingBoxes call - I was not sure whether it's needed on a non-rebuild step; for now I've added it. I'm not sure where we actually use the bounding boxes, which would be good to know...

rbooth200 commented 7 years ago

I agree with most of this. My argument as to why is as follows:

But we do need to update hmax in the main tree before we stock the pruned tree, or the pruned trees will have the wrong ones.

Surely the correct place to call update all hmax is immediately after updating h, i.e. once the densities have been computed? That way it's always valid. I don't buy any argument about skipping the hmax update to save work, since we shouldn't be computing h if we don't need it.

But there's a problem: we are not restocking the local tree between moving the particles and doing the load balancing.

I think this is another case in point: the correct time to rebuild/restock the tree is immediately after the particles have been 'moved' (this also means masses updated in the meshless, etc.), since this is when the tree has been invalidated. I appreciate that other things will also need to trigger a tree rebuild (e.g. sinks & load balancing), but I think the safest thing is to rebuild after moving particles by default, with extra rebuilds where necessary.

giovanni-rosotti commented 7 years ago

Surely the correct place to call update all hmax is immediately after updating h, i.e. once the densities have been computed? That way it's always valid. I don't buy any argument about skipping the hmax update to save work, since we shouldn't be computing h if we don't need it.

This is actually a very good point for the main tree, but it does not apply to the ghost and MPI ghost tree. At least, not for SPH - they do need to be updated in the meshless. I'll see how to do something general for both hydro schemes.

I think this is another case in point: the correct time to rebuild/restock the tree is immediately after the particles have been 'moved' (this also means masses updated in the meshless, etc.), since this is when the tree has been invalidated. I appreciate that other things will also need to trigger a tree rebuild (e.g. sinks & load balancing), but I think the safest thing is to rebuild after moving particles by default, with extra rebuilds where necessary.

Here I am not sure I agree. Load balancing is done soon after the particles have moved, so it's a bit pointless to move the particles, rebuild the tree, do the load balancing, and then rebuild the tree again... I think what we have now is probably already the best solution. The problem is just that we are doing the load balancing with pruned trees that are one step out of date, because the main tree is. Do we care?

giovanni-rosotti commented 7 years ago

Another update: when rebuilding the tree very infrequently with periodic boundaries, we might be getting wrong results. This is for two reasons:

a) In SPH, we check whether a particle has crossed the boundary only on a tree-rebuild step. This is different from the meshless, which checks at every step. The ghost-finding routines should still work correctly in the SPH case, provided that no particle has moved more than half the size of the box. But we have no way to guarantee that this is the case; we should have a way of forcing a rebuild if the particles have moved too much. Or just don't bother, and wrap them at every step like in the meshless.

b) We only look for new ghosts when rebuilding the tree. So if we have not rebuilt the tree for long enough, we will be missing some ghosts. Again, we should have a way of forcing a tree rebuild, or at least of forcing a new search for ghosts.
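Point a) could be addressed with a cheap drift check; a minimal 1-D sketch (the function name and the idea of caching positions at the last rebuild are my assumptions, not existing GANDALF code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: force a tree rebuild if any particle has drifted
// more than half the periodic box size since the last rebuild, which is
// the point at which the ghost-finding routines can no longer be trusted.
bool NeedForcedRebuild(const std::vector<double>& xAtLastRebuild,
                       const std::vector<double>& xNow,
                       double boxSize) {
  for (std::size_t i = 0; i < xNow.size(); ++i) {
    if (std::fabs(xNow[i] - xAtLastRebuild[i]) > 0.5*boxSize) return true;
  }
  return false;
}
```

The alternative, wrapping at every step as in the meshless, avoids the cached-position bookkeeping entirely.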

I don't know if we should care about optimizing periodic simulations. After all, how many people are using them in production? Perhaps we should not bother, and just go for the more conservative but more expensive approach.

giovanni-rosotti commented 7 years ago

Thinking about it, issue b) also affects MPI ghosts, even without periodic boundaries. So we need either a function that checks whether we need to create more MPI ghosts, or we should just look for MPI ghosts afresh at every step, using a stocked local tree (rather than a fully rebuilt one). To be honest, it is not clear to me that the cost of the first option would be smaller than that of the second. Thoughts?

rbooth200 commented 7 years ago

My thoughts on b)

I think it should be possible to check whether the ghosts are consistent during a tree walk. All we need to do is ensure that the hbox of the particles we are walking the tree for does not overlap regions of space that we don't have particles for. We know this has happened when we find a box overlap but the cell is not a leaf cell, so that should signify a problem.

There are a few different places we could do this. The most optimal would be 'on the fly', i.e. when we walk the ghost tree / pruned tree. We would then throw an exception, which could be caught and iterated on. Here I mean a real exception, rather than the exception handler.

I suspect that the easiest solution would be something more proactive, which is likely to be more expensive or less reliable. A sensible proactive option that would enforce consistency would be to check during CopyDataToGhosts, but this option might be overly expensive in tree walks.

Perhaps the best place to check is actually when updating hmax in the tree. This function could flag when a new ghost-tree rebuild is needed. The argument as to why updating hmax is a good place follows from my point above: we should be using it to ensure that the local tree is valid, so it makes sense to use it to automatically flag when we need to update the ghosts.

Note that if we do detect an error this way, it probably means we need to re-compute the density. A sensible option would be a two-fold check: one part checks whether the tree is consistent, and if not we iterate. But we could probably avoid essentially all iterations by also checking the safety factors at the same time. E.g. we must have particles out to 1 hrange to be correct, we should probably rebuild the tree once we only have particles out to 1.2 hrange, and when rebuilding we ensure that we have particles out to 1.5-2 hrange.
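The two-fold check with the safety factors above might look something like this (a sketch with hypothetical names; the thresholds are the ones quoted in the comment):

```cpp
#include <cassert>

// Hypothetical three-way verdict from the hmax-update pass:
//   Invalid     - ghost coverage below 1 hrange: results wrong, must iterate.
//   RebuildSoon - below the 1.2 hrange safety factor: rebuild the ghost tree
//                 now, before any iteration becomes necessary.
//   Ok          - coverage is comfortable, nothing to do.
enum class GhostStatus { Ok, RebuildSoon, Invalid };

GhostStatus CheckGhostCoverage(double coverageInHrange) {
  if (coverageInHrange < 1.0) return GhostStatus::Invalid;
  if (coverageInHrange < 1.2) return GhostStatus::RebuildSoon;
  return GhostStatus::Ok;
}

// When (re)building, search out to 1.5-2 hrange so the safety margin
// is restored (value here is an illustrative choice within that range).
const double kGhostSearchRange = 1.5;
```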

a) This one is simpler. I don't mind if we do the periodic box wrapping more frequently, since it's a cheap function, but I think we should do it the same way in SPH and the meshless. We could talk about optimizing this for the case where only a small number of particles are active (e.g. by only wrapping active particles and using safer periodicity checks in the code, e.g. while loops), but I think we should do that if/when it comes to looking at that problem.

giovanni-rosotti commented 7 years ago

I'm not sure I completely follow your reasoning, but recomputing the density sounds potentially quite expensive to me... whereas I like the 1.2 hrange criterion (although I have no idea how to do it in practice). But let's remember that updating the existing MPI ghosts is only slightly faster than creating them anew - the communication dominates the cost, and that is the same either way. The advantage could be bigger for the periodic ghosts, but do we want to optimize that case?

The issue with a) is not the cost of the function itself, but that the wrapping makes the tree far less efficient (some cells will be as wide as the domain, so you need to open down to the leaf cells before knowing the real extent of a segment of the tree). I do agree we should do it like in the meshless, and I still think this is the best solution at the moment, because again I don't think we should optimize for this case.

So, I don't know, but given the long to-do list we have, my position for now would be to recreate periodic and MPI ghosts every step to be on the safe side, and move on. As I said, I don't care about the periodic case, and for MPI the difference would be tiny anyway; there is more to be gained from optimizing the communication. Premature optimization is the root of all evil... but I'd like to know if you strongly want to solve this problem optimally now.

rbooth200 commented 7 years ago

I agree that there is no point in optimizing the periodic ghosts at the moment; they are not used so much and we might yet get rid of them. With the MPI ghosts, if most of the cost is in the communication, then I think the correct thing to do would be to rebuild the MPI ghost tree each time-step. I'd optimize this to include only those particles that are needed, i.e. the particles found by scatter-gather walks of the active cells. We'd end up communicating an awful lot less information each time-step.

rbooth200 commented 7 years ago

I don't think we need an optimized solution right now. We should test and see, but I think that the cost could become significant if we only have a few active particles.

giovanni-rosotti commented 7 years ago

I think you are right about only walking for active cells. The issue is that we would need to use the pruned trees for that, but at the moment SearchMpiGhostParticles only uses the bounding box of the domain. I'll create an issue for that.

giovanni-rosotti commented 7 years ago

I think I have now fixed everything we discussed, including the meshless. It would be ready to be merged, but there are still problems. When not rebuilding the tree at every step, non periodic simulations are fine (with or without MPI). Periodic simulations are instead still clearly wrong (even without MPI), despite the fact that I now look for periodic ghosts at every timestep. I fail to understand how it is possible... Everything still looks fine if Ntreebuildstep=1.

giovanni-rosotti commented 7 years ago

See #102; let's remember to fix the meshless when dead particles have been removed (we need to look for ghosts again).

giovanni-rosotti commented 7 years ago

Ok, I finally got to the bottom of this. It turns out the problem is not in the periodic ghosts used to compute h, but in the ComputeNeighbourAndGhostList function used to compute forces, which creates ghosts on the fly. The periodic box-overlap part is fine, and so I incorrectly assumed that the function was working. But it's only at the end of the function that we actually create the ghosts on the fly, using the GhostFinder object. The issue is that this works using the centre of the cell, but we now have cells as wide as the domain, containing particles on both sides of the periodic boundary. Therefore the GhostFinder almost randomly decides on which side to create the ghost, depending on how exactly the centre of the cell compares with the midpoint of the domain box.

I think the best thing to do would be to create two replicas of the particle, one for each side. We already store the hbox of the cell, so we can just use that instead of rcentre to compute the nearest periodic vector. If I am not mistaken, this problem can only happen when hbox is bigger than half the size of the domain, so we can continue to return only one ghost in the standard case. Thoughts? Other solutions require looping over the particles in the active cell, which is unpleasant, inelegant and expensive...
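In 1-D the two-replica idea might be sketched like this (hypothetical helper, not the actual GhostFinder code; `dr` is the separation, `hboxWidth` the cell's hbox extent):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch of the proposed fix: when the cell's hbox spans more
// than half the periodic domain, a neighbour may need to interact with the
// cell on *both* sides of the boundary, so both periodic images of the
// separation are returned; otherwise the single nearest image suffices.
std::vector<double> PeriodicImages(double dr, double hboxWidth, double boxSize) {
  // Wrap dr into [-boxSize/2, boxSize/2).
  double nearest = dr - boxSize*std::floor(dr/boxSize + 0.5);
  if (hboxWidth <= 0.5*boxSize) return {nearest};
  // Cell wider than half the domain: also return the far-side image.
  double other = (nearest > 0.0) ? nearest - boxSize : nearest + boxSize;
  return {nearest, other};
}
```

In the standard case (hbox no wider than half the box) this degenerates to the usual single nearest-image ghost, so pure-hydro behaviour is unchanged there.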

rbooth200 commented 7 years ago

I'm not quite sure which calls to this function you are worrying about, but I think I get the gist. Is it in the ghost-tree calculation routines? I thought they didn't use the GhostNeighbour finder, but used the ghost tree instead. Why else would we have cells as large as the domain?

Ok, so this is a known issue with the periodic ghosts, i.e. that the nearest distance to the cell centre is not necessarily the nearest distance to the particles within that cell. I believe the periodic box-overlap checks are fine[1], so we should be creating 'the right number' of ghosts, but possibly placing them in the wrong place as far as the particles are concerned.

I have concerns about generating multiple copies of periodic ghosts within the tree-walk when it comes to force calculations: it would conflict with the periodic gravity calculation, which requires just a single copy of each particle. However, if you can argue why there is a good reason for creating both copies then I think it can be done fairly simply:

Can this be done in a way that is guaranteed not to upset the gravity calculation? If not, and the problem is arising in a single place (e.g. the pruned tree build), perhaps it can be solved by correcting the positions of the particles after we've generated them. This is the solution we currently take with the periodic gravity: combined with the criterion that h is not too big, we get correctness without polluting the rest of the code.

[1] Though feel free to dispute this, since it's one of the more recent and less well tested bits of code.

giovanni-rosotti commented 7 years ago

Ok, I have to see if the solution for periodic gravity can also work for hydro then. But just to clarify, this has nothing to do with the pruned tree. This is an issue in the main tree! If you just run a shock tube with periodic boundaries, the result is wrong if you don't rebuild the tree at every step (because when a particle wraps, the cells become as large as the domain). What goes wrong is the force calculation. The h calculation, which uses real ghosts (ok, it's a horrible term, but you know what I mean...), is fine.

The offending function is ComputeNeighbourAndGhostList in Tree.cpp, which does use a ghost finder. The cell we are speaking of is not one in the ghost tree (this function does not use the ghost tree at all); it's the one we are finding the neighbours for.

It depends on how you want to say it, but the number of ghosts is NOT right; specifically, we ARE finding the right particles that need to become ghosts, but NOT creating enough replicas of them. So yes, the periodic overlap is fine, as I already said. The problem is that the particle should interact twice with the cell, once on each side. It is not interacting twice with any given particle of course (unless there are particles with huge h, but we are already checking for that), so as long as you have pure hydro there are no issues with having two copies of the same particle.

The solution you propose (correcting the coordinates after having generated the particles) cannot, by construction, be done in the tree itself; it would need to be done in GradhSphTree::UpdateAllSphHydroForces (or equivalent). It can be done, but I find it unpleasant because a) it duplicates code and b) it shifts this responsibility from the tree (the object that should deal with this problem) to the caller. I'll see what I can do...

rbooth200 commented 7 years ago

Ok.

Can we wrap particle positions on tree-rebuild steps only, or would this cause other issues? Otherwise we can wrap particles when we cull the neighbour list in the force calculation. Perhaps the latter is the best solution, since it enforces correctness.

If we're worried about the cost of checking every pair, then we should avoid doing it when the simulation is not periodic. In this case it would probably pay to keep a local, const copy of the flag, which should help the compiler optimize the check for periodicity.

Also, perhaps nearest periodic vector should be a member function of the domain box? The domain box knows about its periodicity, after all. As above, we might benefit from making a local copy of the domain box, since we will be checking its size so many times (the Ghost Neighbour Finder should do the same thing).
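Something like the following, in 1-D (a sketch; `DomainBox1D` and the member name are illustrative, not the existing GANDALF classes):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical domain box that owns the nearest-periodic-vector logic,
// since it already knows its own extent and periodicity.
struct DomainBox1D {
  double min, max;
  bool periodic;

  double Size() const { return max - min; }

  // Map a separation dr to its nearest periodic image in [-L/2, L/2);
  // a no-op for non-periodic boxes.
  double NearestPeriodicVector(double dr) const {
    if (!periodic) return dr;
    const double L = Size();  // local copy, queried many times per walk
    return dr - L*std::floor(dr/L + 0.5);
  }
};
```

Callers in a hot tree-walk loop could then take a local const copy of the box once, as suggested, rather than re-reading it for every pair.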

rbooth200 commented 7 years ago

Can we wrap particle positions on tree rebuild steps only, or would this cause other issues?

With further thought, I think we should avoid wrapping particles between tree rebuilds anyway, regardless of whether we implement the other fix. My reasoning is that the cell's centre of mass is very wrong after the remapping, which is not great for problems with periodic gravity. Perhaps it's not possible to do this robustly, though.

rbooth200 commented 7 years ago

Actually, scratch that: the cells with wrapped particles will be opened anyway, so accuracy shouldn't be a problem. However, we will always be forced to open such a cell and all its parents, which is also far from optimal.