Open llaniewski opened 10 months ago
@shkodm @TravisMitchell After some investigation, we should think carefully about whether we want to calculate 64-bit indexes, as 64-bit multiplication on CUDA costs around 20x as much as 32-bit multiplication.
@shkodm can you check the sizes in your case? `nx`, `ny`, `nz`, but also the number of fields?
TCLB currently does not support cases in which the overall offset overflows a 32-bit integer for:
- `load_` functions (dynamic and static access of fields) wherever `nx*ny*nz*fields` is larger than `2^31`
- `pop_` functions (loading fields through densities) wherever `nx*ny*nz` is larger than `2^31`
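To make the two limits above concrete, here is a minimal C++ sketch of the arithmetic. Everything in it is an assumption for illustration: the helper names (`total_elements`, `fits_in_int32`, `offset64`, `offset_mixed`) are hypothetical and not part of TCLB, and the x-fastest, field-slowest layout is assumed rather than taken from `LatticeAccess`. It only shows how widening one operand keeps the product from wrapping, and how the node index can stay 32-bit when `nx*ny*nz` itself still fits.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not TCLB API): total number of addressed elements,
// computed in 64-bit. Widening the first operand makes every following
// multiplication in the expression 64-bit as well.
constexpr std::int64_t total_elements(int nx, int ny, int nz, int fields) {
    return static_cast<std::int64_t>(nx) * ny * nz * fields;
}

// True when a count still fits in a signed 32-bit integer.
constexpr bool fits_in_int32(std::int64_t n) {
    return n <= std::numeric_limits<std::int32_t>::max();
}

// Fully 64-bit offset for an ASSUMED layout: x fastest, field slowest.
constexpr std::int64_t offset64(int x, int y, int z, int f,
                                int nx, int ny, int nz) {
    return x + static_cast<std::int64_t>(nx) *
                   (y + static_cast<std::int64_t>(ny) *
                            (z + static_cast<std::int64_t>(nz) * f));
}

// Mixed variant: the node index is computed in 32-bit (valid only while
// nx*ny*nz < 2^31), and a single 64-bit multiply applies the field stride.
// `nodes` is nx*ny*nz, precomputed once in 64-bit by the caller.
constexpr std::int64_t offset_mixed(int x, int y, int z, int f,
                                    int nx, int ny, std::int64_t nodes) {
    return (x + nx * (y + ny * z)) + nodes * f;
}

// Example sizes (made up): 1024^3 nodes fit in 32 bits, 19 fields do not.
static_assert(fits_in_int32(total_elements(1024, 1024, 1024, 1)),
              "node count fits in int32");
static_assert(!fits_in_int32(total_elements(1024, 1024, 1024, 19)),
              "node count times fields overflows int32");
```

The mixed variant is only one possible way to limit the number of 64-bit multiplications; whether it pays off on a given GPU would need to be measured.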
The overflow can be fixed by appropriate casting in the offset calculations in `LatticeAccess`, but care has to be taken not to slow down performance with unnecessary `int64_t` operations.

Originally posted by @shkodm in https://github.com/CFD-GO/TCLB/issues/496#issuecomment-1894777340

[...] Some things still don't work as expected (also the same on `master` branch). I run on 2 V100 GPUs on Bunya, each with 80GB; my case is large, so I split it between 2. I get: `Cumulative allocation of 63.GB)` and then `an illegal memory access was encountered in Lattice.hpp at line 279`.
The error is the same even if I try to split between 3 GPUs (40GB each, so plenty of space even if there is some unaccounted memory).