Closed nickhand closed 7 years ago
Does the domain decomposition make it slower? I'd prefer to keep the code formally parallel, since non-trivial thought went into it.
We can come up with finer domain decomposition schemes. For example, we can reassign ranks to spatial regions differently and skip empty regions.
Yes, I agree that's probably what we should do. That functionality doesn't exist yet, right? I am not that familiar with all of the features of the GridND object, etc.
And yep, the code is significantly slower in this case because the work is distributed across ranks quite badly. In the pair count algorithm for (ra, dec, z) input, we decompose the position array that's already on the rank (with smoothing=0) and then correlate against the decomposed position (smoothed by r-max). The problem is that the number density is far from constant in the Cartesian box because of the (ra, dec, z) --> (x, y, z) transformation, so you end up with a lot of ranks doing very little (or no) work while one or two ranks do most of it.
I think we just need to ensure that the number of objects on each rank is still relatively even after that first decomposition (smoothing=0).
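To make the imbalance concrete, here is a rough standalone sketch (plain NumPy, not nbodykit or pmesh code). It assumes a made-up survey footprint and uses a radial distance `r` in place of redshift, since converting z to a distance needs a cosmology. It bins the Cartesian positions on a regular 4x4x4 grid over their bounding box, as a stand-in for one grid cell per rank. The helper names `sky_to_cartesian` and `counts_per_cell` are hypothetical.

```python
import numpy as np

def sky_to_cartesian(ra_deg, dec_deg, r):
    """Convert (ra, dec) in degrees plus a radial distance to (x, y, z)."""
    ra = np.deg2rad(ra_deg)
    dec = np.deg2rad(dec_deg)
    x = r * np.cos(dec) * np.cos(ra)
    y = r * np.cos(dec) * np.sin(ra)
    z = r * np.sin(dec)
    return np.column_stack([x, y, z])

def counts_per_cell(pos, ncell):
    """Count particles in each cell of a regular grid over the bounding box."""
    lo = pos.min(axis=0)
    hi = pos.max(axis=0)
    frac = (pos - lo) / (hi - lo)  # positions rescaled to [0, 1]
    # Clip so that particles sitting on the upper edge land in the last cell.
    idx = np.minimum((frac * ncell).astype(int), np.array(ncell) - 1)
    flat = np.ravel_multi_index(tuple(idx.T), ncell)
    return np.bincount(flat, minlength=int(np.prod(ncell)))

# Hypothetical footprint: a wedge of sky between two radial distances.
rng = np.random.default_rng(42)
n = 100_000
ra = rng.uniform(0.0, 60.0, n)
dec = rng.uniform(0.0, 30.0, n)
r = rng.uniform(500.0, 1500.0, n)

pos = sky_to_cartesian(ra, dec, r)
counts = counts_per_cell(pos, (4, 4, 4))

print("empty cells:", int((counts == 0).sum()), "of", counts.size)
print("heaviest cell holds", int(counts.max()), "particles; mean is", n / counts.size)
```

The wedge only fills part of its bounding box, so some cells come out empty while others hold far more than the mean, which is the situation described above.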
Initial tests seem to show that rainwoodman/pmesh#26 fixes this, although you do need to over-decompose the domain grid in order for the load balance to do anything.
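As an illustration of why over-decomposing matters, here is a minimal sketch of one possible balancing strategy. This is not the pmesh implementation, and `assign_cells_to_ranks` is a hypothetical helper. It hands the heaviest remaining cell to the currently lightest rank and skips empty cells; the cell counts are made-up numbers.

```python
import heapq

def assign_cells_to_ranks(cell_counts, nranks):
    """Greedy assignment: heaviest cell first, to the currently lightest rank.

    Empty cells are skipped, so no rank is given a region without particles.
    Returns (owner, loads): owner maps cell index -> rank, loads[rank] is the
    total particle count assigned to that rank.
    """
    heap = [(0, rank) for rank in range(nranks)]  # (current load, rank)
    heapq.heapify(heap)
    owner = {}
    order = sorted(range(len(cell_counts)),
                   key=lambda c: cell_counts[c], reverse=True)
    for cell in order:
        if cell_counts[cell] == 0:
            continue  # skip empty regions
        load, rank = heapq.heappop(heap)
        owner[cell] = rank
        heapq.heappush(heap, (load + cell_counts[cell], rank))
    loads = [0] * nranks
    for cell, rank in owner.items():
        loads[rank] += cell_counts[cell]
    return owner, loads

# Made-up particle counts for 12 domain cells shared among 4 ranks.
cell_counts = [0, 0, 900, 50, 0, 30, 10, 10, 0, 400, 300, 300]
owner, loads = assign_cells_to_ranks(cell_counts, 4)
print("per-rank loads:", loads)
```

No rank can end up with less than the heaviest single cell, so with a coarse grid the balance is limited by that cell. Splitting the domain into more cells than ranks gives the balancer smaller pieces to move around.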
Some implementation questions for @rainwoodman. In the past, I think the design we have been using is:
Is step 1 necessary? The pos1 array is already evenly distributed before step 1, and may not be after the exchange. I guess I am wondering why any particles are exchanged at all when the smoothing = 0?
It is evenly distributed but it is not spatially tight, so it is impossible to ensure that the volume in step 2 encloses all particles residing on the current process.
So I think you'll have to do the load balancing in step 1, then reuse the domain assignment from step 1 for step 2.
When you convert (ra, dec, z) to Cartesian coordinates, the particles aren't uniformly distributed in the resulting box. The consequence is that when you do the normal grid domain decomposition in SurveyDataPairCount, a lot of ranks have zero or very few particles and the load is unbalanced and a few ranks are responsible for most of the work. @rainwoodman do you see a potential fix for this, or maybe we just want to remove the domain decomposition?