ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
https://youtube.com/@ProjectPhysX

Order of pairs is non uniform between Lattice sets #195

Closed. Meerkov closed this issue 4 weeks ago

Meerkov commented 4 weeks ago

I noticed something which is bugging me. The pairs of Q seem to have an inconsistent orientation within each set, and across sets.

Each Q is paired. The first in the pair is the "self" and the second is the direction it pulls from.

In D2Q9, "self" is (1,0), (0,1), (1,1) and (1,-1). It seems like there is a preference for X alignment.

In D3Q15, 3 of 4 diagonals have a Z coordinate of 1, but one has -1. Likewise, 3 of 4 diagonals have a Y coordinate of 1 and one has -1, and 3 of 4 diagonals have an X coordinate of 1 and one has -1.

It seems like there should be a preference for indexing in one of the 3 directions.

In D3Q19, among the diagonals: X has four 1s and two 0s; Y has three 1s, two 0s and one -1; Z has two 1s, two 0s and two -1s.

In D3Q27, again, (-1,1,1) does not seem to be aligned. X has a preference in every other category, but here the value (1,-1,-1) is pulled instead.

Is the order irrelevant? Has it been experimentally validated that this ordering is best?

I'm asking because if the edges were instead ordered in sets of 4, each set stepping through 90 degree rotations, geometric transformations would be easier to code. Currently, saving and loading uses the ternary operator here:

    fhn[i ] = load(fi, index_f(n , t%2ul ? i : i+1u));

This could instead be:

    fhn[i ] = load(fi, index_f(n , t%2ul ? i : rotate(i,2)));

That has some convenient properties, assuming a simple rotation operation using modulo 4.
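The rotation idea above can be sketched as follows. This is only an illustration under stated assumptions: the D2Q9 ordering below is hypothetical (it is not the ordering FluidX3D uses), and rotate is a made-up helper that steps through a group of 4 directions using modulo 4.

```cpp
#include <array>
#include <cstdint>

// HYPOTHETICAL D2Q9 ordering, not the one used by FluidX3D: index 0 is the rest
// direction, indices 1..4 are the axis directions and 5..8 the diagonals, each
// group listed in counter-clockwise 90 degree steps.
constexpr std::array<int, 9> cx = {0, 1, 0, -1,  0, 1, -1, -1,  1};
constexpr std::array<int, 9> cy = {0, 0, 1,  0, -1, 1,  1, -1, -1};

// Rotate direction i (i >= 1) by k quarter turns within its group of 4.
constexpr std::uint32_t rotate(std::uint32_t i, std::uint32_t k) {
    const std::uint32_t base = 1u + 4u * ((i - 1u) / 4u); // first index of the group
    return base + ((i - base) + k) % 4u;                  // wrap around modulo 4
}
```

With this ordering, rotate(i,2) gives the opposite direction of i, and rotate(i,1) gives the direction turned by 90 degrees, which is the property the proposal relies on.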

ProjectPhysX commented 4 weeks ago

Hi @Meerkov,

This is the nature of the Esoteric-Pull streaming scheme: it loads/stores the DDFs in asymmetric locations, which resolves the data dependencies of the streaming step and eliminates the 2nd copy of the grid in VRAM. This asymmetric storage pattern has no effect on the physics of the simulation. The location of every pair of opposing DDFs ((1,2), (3,4), ...) on the grid is irrelevant: individual pairs can be offset by any number of cells in any direction, as long as they are consistently stored in swapped order in the same locations. The variant I have, where one DDF of every pair is always in the center cell, has the best possible memory coalescence and is fastest.

I'd argue that i+1u (the DDF in the direction opposite to i; see here, Figure 1) is easier than rotate(i,2), although with loop unrolling neither adds any integer computation.
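A minimal sketch of the swapped storage order described above. center_slot reproduces the ternary quoted earlier in this thread (t%2ul ? i : i+1u). The function names are hypothetical, and partner_slot rests on an assumption: that the other DDF of the pair always uses the remaining slot of the same pair, which is one reading of "stored in swapped order" and not code taken from FluidX3D.

```cpp
#include <cstdint>

// Slot read for the center-cell DDF of the pair (i, i+1), with i odd.
// Mirrors the expression quoted in the issue: t%2ul ? i : i+1u.
constexpr std::uint32_t center_slot(std::uint32_t i, std::uint64_t t) {
    return (t % 2u) ? i : i + 1u;
}

// ASSUMPTION: the partner DDF uses the other slot of the same pair.
constexpr std::uint32_t partner_slot(std::uint32_t i, std::uint64_t t) {
    return (t % 2u) ? i + 1u : i;
}
```

Under that assumption the two slots of a pair trade roles on every time step, so no second copy of the data is needed to tell old values from new ones.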
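For comparison, a small sketch of the paired layout, where the opposite of an odd index i is simply i+1. The D2Q9 table is built from the "self" directions listed earlier in this thread, each followed by its opposite; the helper name opposite is hypothetical.

```cpp
#include <array>
#include <cstdint>

// D2Q9 in paired order: rest direction at index 0, then the "self" directions
// from the issue ((1,0), (0,1), (1,1), (1,-1)), each followed by its opposite.
constexpr std::array<int, 9> cx = {0, 1, -1, 0,  0, 1, -1,  1, -1};
constexpr std::array<int, 9> cy = {0, 0,  0, 1, -1, 1, -1, -1,  1};

// Opposite direction of i (i >= 1): odd indices map to i+1, even ones to i-1.
constexpr std::uint32_t opposite(std::uint32_t i) {
    return (i % 2u) ? i + 1u : i - 1u;
}
```

Inside a loop that only visits odd i, the lookup reduces to the i+1u seen in the quoted line.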

Kind regards, Moritz

Meerkov commented 1 week ago

There are 2 design choices mentioned here:

  1. order for convenient 90 degree turns (currently in no particular order)
  2. pull direction chosen consistently in the preference order +X, +Y, +Z (currently there is a slight preference for +X, but it is not applied consistently).
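Design choice 2 could be read as the rule sketched below: of each opposing pair, take as "self" the direction whose first nonzero component, checked in x, y, z order, is positive. This is an interpretation of the proposal, not something stated in the thread, and the names Vec3, prefers_positive and canonical_self are hypothetical.

```cpp
#include <array>

using Vec3 = std::array<int, 3>;

// True if the first nonzero component (checked in x, y, z order) is positive.
inline bool prefers_positive(const Vec3& c) {
    for (int k = 0; k < 3; ++k) {
        if (c[k] != 0) return c[k] > 0;
    }
    return true; // rest direction (0,0,0)
}

// Pick the "self" member of the opposing pair {c, -c} by that preference.
inline Vec3 canonical_self(const Vec3& c) {
    return prefers_positive(c) ? c : Vec3{-c[0], -c[1], -c[2]};
}
```

Applied to the D3Q27 example from the first comment, this rule would select (1,-1,-1) rather than (-1,1,1), and every diagonal "self" direction would have an X coordinate of 1.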

Neither impacts the speed of the algorithm, but both have other useful properties, so I see this as a feature request. I understand it would not be accepted, as the purpose of these changes is non-obvious. They would provide some useful properties for cubed sphere grids.