amirgholami / accfft

A Massively Parallel FFT Library for CPU/GPU
http://accfft.org
GNU General Public License v2.0

Incorrect results for GPU transforms #15

Closed: pgrete closed this issue 5 years ago

pgrete commented 6 years ago

I'm currently performing scaling tests of the library, and I am getting incorrect results from the provided test programs on both Pleiades and Titan when using GPUs, e.g. (on Titan):

$ aprun -n 64 ./step1 512 512 512
Input c_dim[0] * c_dims[1] != nprocs. Automatically switching to c_dims[0] = 8 , c_dims_1 = 8

Error is 5.2362e-11
Relative Error is 1.39031e-16

Results are CORRECT!

Timing for FFT of size 512*512*512
Setup   4.36814
FFT     0.252446
IFFT    0.279566

$ aprun -n 4 -N 1 ./step1_gpu 512 512 512
Input c_dim[0] * c_dims[1] != nprocs. Automatically switching to c_dims[0] = 2 , c_dims_1 = 2

L1 Error is 4.57168e+06
Relative L1 Error is 12.1387

L1 Error of iFF(a)-a: 4.57168e+06
Relative L1 Error of iFF(a)-a: 12.1387
GPU Timing for FFT of size 512*512*512
Setup   1.89571
FFT     0.382671
IFFT    1.12519
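For context, the correctness check in the step programs compares the round trip IFFT(FFT(a)) against the original field via an L1 and a relative L1 error. A minimal sketch of that check, using numpy's FFT as a stand-in for accfft (hypothetical, not the library's actual test code):

```python
import numpy as np

# Round-trip check: forward FFT, inverse FFT, then compare
# against the input field.
n = 32  # small grid instead of 512^3 for illustration
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n, n))

a_hat = np.fft.fftn(a)            # forward transform
a_rt = np.fft.ifftn(a_hat).real   # inverse transform (round trip)

l1_error = np.abs(a_rt - a).sum()
rel_l1_error = l1_error / np.abs(a).sum()

# A correct transform pair gives a relative error near machine
# epsilon, as in the CPU run above; a relative error of ~12, as in
# the GPU run, means the output bears no relation to the input.
print(f"L1 Error is {l1_error:.5g}")
print(f"Relative L1 Error is {rel_l1_error:.5g}")
assert rel_l1_error < 1e-12
```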

This is how I installed accfft (on Titan):

module swap PrgEnv-pgi PrgEnv-gnu
module load cudatoolkit
module load cray-fftw
module load cmake
cd src
git clone https://github.com/amirgholami/accfft.git 
cd accfft
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/src/accfft/build \
-DFFTW_ROOT=/opt/cray/fftw/3.3.6.2/interlagos \
-DFFTW_USE_STATIC_LIBS=true \
-DBUILD_GPU=true \
-DBUILD_STEPS=true \
-DCXX_FLAGS="-O3" \
-DBUILD_SHARED=false \
..
make
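An aside on the "Automatically switching to c_dims" message in the output above: when the requested 2D process grid does not satisfy c_dims[0] * c_dims[1] == nprocs, accfft falls back to a near-square factorization of the rank count. A hypothetical sketch of such a selection (not accfft's actual source):

```python
import math

def near_square_dims(nprocs):
    """Return the factor pair (p, q) with p * q == nprocs and
    p <= q that is as close to square as possible."""
    p = math.isqrt(nprocs)
    while nprocs % p != 0:
        p -= 1
    return p, nprocs // p

# Matches the messages in the runs above:
print(near_square_dims(64))  # 64 ranks -> (8, 8)
print(near_square_dims(4))   # 4 ranks  -> (2, 2)
```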

Am I doing something wrong or is there a bug/misconfiguration? Please let me know if there's any other information required.

Thanks,

Philipp

pkestene commented 6 years ago

Hi Philipp,

It should work. I can run step1_gpu and get the following output (using 8 P100 GPUs):

L1 Error is 1.24047e-09
Relative L1 Error is 3.29369e-15

Results are CORRECT!

L1 Error of iFF(a)-a: 1.34574e-09
Relative L1 Error of iFF(a)-a: 3.5732e-15

Results are CORRECT!

GPU Timing for FFT of size 512*512*512
Setup   1.48673
FFT     0.0720751
IFFT    0.0738944

Have you tried other configurations?

Pierre.


amirgholami commented 6 years ago

I double-checked on another cluster and the results are correct. However, Pierre recently submitted a pull request, which I just merged, that might fix the issue you were experiencing. Could you please pull the latest version and try again?

pgrete commented 5 years ago

I ended up using a different library. Given that Titan is offline now and I no longer have access to another system with K20x GPUs, I'm closing the issue.