CFD-GO / TCLB

TCLB - Templated MPI+CUDA/CPU Lattice Boltzmann code
https://tclb.io
GNU General Public License v3.0

Large offset support #499

Open llaniewski opened 6 months ago

llaniewski commented 6 months ago

TCLB currently does not support cases in which the overall offset overflows a 32-bit integer for:

This can be fixed by appropriate casting in the offset calculations in LatticeAccess, but care has to be taken not to degrade performance by introducing unnecessary int64_t operations.
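A minimal sketch of the kind of casting meant here, assuming a field-major layout (all nodes of field 0, then all nodes of field 1, and so on). The names `LatticeDims`, `node_index` and `field_offset` are hypothetical and are not taken from LatticeAccess; the point is only that the per-node index can stay in 32 bits, with a single widening step where the field stride is applied.

```cpp
#include <cstdint>

// Hypothetical local lattice dimensions (not the actual TCLB structures).
struct LatticeDims {
    int nx, ny, nz;
};

// Index of a node within one field. This stays in 32-bit arithmetic and is
// valid as long as nx*ny*nz itself fits in int32_t.
inline std::int32_t node_index(const LatticeDims& d, int x, int y, int z) {
    return x + d.nx * (y + d.ny * z);
}

// Full offset into the field-major storage. Only the field stride is
// computed in 64 bits, so the widening happens once per access.
inline std::int64_t field_offset(const LatticeDims& d, int field,
                                 int x, int y, int z) {
    const std::int64_t nodes =
        static_cast<std::int64_t>(d.nx) * d.ny * d.nz;
    return static_cast<std::int64_t>(field) * nodes + node_index(d, x, y, z);
}
```

If the field number is known before the kernel's inner loop, the `field * nodes` product could also be folded into a per-field base pointer, so that the hot path only uses the 32-bit `node_index`. Whether that fits the generated access code is an open question.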

Originally posted by @shkodm in https://github.com/CFD-GO/TCLB/issues/496#issuecomment-1894777340 [...] Some things still don't work as expected (the same happens on the master branch). I run on 2 V100 GPUs on Bunya, each with 80GB; my case is large, so I split it between the 2. I get: Cumulative allocation of 63.GB) and then an illegal memory access was encountered in Lattice.hpp at line 279

The error is the same even if I try to split it between 3 GPUs (40GB each, so plenty of space even if there is some unaccounted memory)

llaniewski commented 6 months ago

@shkodm @TravisMitchell After some investigation, we should really think about whether we want to calculate 64-bit indexes, as a 64-bit multiplication on CUDA costs around 20x as much as a 32-bit multiplication.

@shkodm can you check the sizes in your case? nx, ny, nz, and also the number of fields?
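For reference, a quick way to check whether a given case crosses the 32-bit limit is to compute the element count in 64 bits and compare it against `INT32_MAX`. This is only a sketch: `total_elements` and `fits_in_int32` are hypothetical helpers, the count is in elements (not bytes), and it assumes the offset spans all fields of the local lattice.

```cpp
#include <cstdint>
#include <limits>

// Number of stored values for a local lattice of nx*ny*nz nodes with
// `fields` values per node, computed in 64 bits so it cannot wrap here.
inline std::int64_t total_elements(std::int64_t nx, std::int64_t ny,
                                   std::int64_t nz, std::int64_t fields) {
    return nx * ny * nz * fields;
}

// The largest offset is total - 1; it must not exceed INT32_MAX for
// 32-bit offset arithmetic to be safe.
inline bool fits_in_int32(std::int64_t total) {
    return total - 1 <= std::numeric_limits<std::int32_t>::max();
}
```

As an illustration with made-up sizes, a 512 x 512 x 512 lattice with 27 fields gives 3,623,878,656 elements, which is beyond the 32-bit range, while 256 x 256 x 256 with 27 fields gives 452,984,832, which fits.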