Ionosphere solver threading/performance

We have a high iteration count over many thousand ionosphere grid vertices (called nodes in the code), and each iteration involves several summations of values spanning several orders of magnitude. If we run twice the same setup, plain "random order" threaded summation yields differences of order 1e-5 straight away even with our usage of Real being double, and if we run these two instances further we quickly have "macroscopic" differences in values of interest, e.g. the cross polar cap potential (that is percent-level discrepancies). This was hampering debugging significantly, hence my effort to yield reproducible summations/solutions of the ionosphere.

I came up with the scheme that can be followed with this first summation, https://github.com/fmihpc/vlasiator/blob/dev/sysboundary/ionosphere.cpp#L2458-L2503, where each thread stores the values to be summed in a multisetthereby providing sorted summation order, before a #critical section where each thread inserts into a global set, and then a #single summation over the whole, thereby sorted multiset. In summations with both positive and negative numbers, there's two sets and we sum them separately in opposite order before summing the overall positive and negative sums.

This yielded what I was after, at a significant performance penalty. In my test with 9920 ionosphere nodes, running 20000 iterations, it took 2239 seconds over one day to solve the ionosphere 24 times (once every 4 s), i.e. on average 93 s each call. The solver exits if it converges to 1e-6 but it did only once, typical convergence rates being a few times 1e-6 to a few times 1e-5. Arguably this would also be good enough, so we could shorten by relaxing the tolerance. But the largest factor is definitely this cumbersome summation mechanism.

Any discussion/suggestions welcome. Given the cumbersomeness of this algorithm, having a different implementation to offer a cfg option of "good" vs. "bad" summation quality would make this part of the code even less readable I fear... Would summing vectors ordered by node ID be enough, and pöh about summation order/precision loss?

fmihpc / vlasiator

Ionosphere solver threading/performance #639