NVIDIA / CoMD-CUDA

GPU implementation of a classical molecular dynamics proxy application.

Execution errors with more than 4 nodes #9

Open e-ago opened 7 years ago

e-ago commented 7 years ago

I tested the CoMD-CUDA implementation on the Wilkes cluster (Tesla K20 GPUs) using up to 16 nodes and got the following errors:

8 processes, all along the i direction (-e -i 8 -j 1 -k 1 -x 80 -y 80 -z 80): the run crashes. Log: err_8proc_8x.txt

16 processes (-e -i 4 -j 2 -k 2 -x 40 -y 40 -z 40): the output is all zeroes. Log: err_16proc_size40.txt

16 processes (-e -i 4 -j 2 -k 2 -x 80 -y 80 -z 80): two runs with these parameters report different "Final energy" values, and both are wrong (they differ from the Final energy of the 4-process run). Logs: out1_16proc_4i4j_size80.txt, out2_16proc_4i4j_size80.txt
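The comparison above (two runs with the same inputs, each checked against the 4-process run) could be automated with a relative-tolerance check. This is a minimal sketch, assuming the "Final energy" values have already been copied out of each log; REL_TOL, energies_agree and check_runs are hypothetical names, and the tolerance is an illustrative assumption rather than a CoMD-CUDA setting.

```python
import math

# Hypothetical tolerance (assumption): the report does not say how closely
# runs should agree. GPU reductions may reorder floating-point sums, so
# bit-for-bit equality is not required here.
REL_TOL = 1e-6


def energies_agree(reference: float, candidate: float, rel_tol: float = REL_TOL) -> bool:
    """Return True if candidate matches reference within a relative tolerance."""
    return math.isclose(reference, candidate, rel_tol=rel_tol, abs_tol=0.0)


def check_runs(reference: float, runs: dict[str, float]) -> dict[str, bool]:
    """Compare each run's final energy against a reference run (e.g. the 4-process one)."""
    return {name: energies_agree(reference, energy) for name, energy in runs.items()}
```

With a non-zero reference, an all-zero output (as in the size-40 case) is always flagged as a mismatch, because abs_tol is 0.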

In general, several runs with a size below 80 return all zeroes.
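To relate the failing cases to how the work is split, the per-rank share of the lattice can be tabulated from the command-line flags. This is a minimal sketch, assuming -i/-j/-k give the number of ranks per axis, -x/-y/-z give the global lattice size in unit cells, and the lattice is FCC with 4 atoms per unit cell; RunConfig, per_rank_cells and summarize are hypothetical helpers, not part of CoMD-CUDA.

```python
from dataclasses import dataclass

# Assumption: FCC lattice with 4 atoms per unit cell, and -x/-y/-z counted
# in unit cells.
ATOMS_PER_CELL = 4


@dataclass
class RunConfig:
    ranks: tuple[int, int, int]    # -i, -j, -k
    lattice: tuple[int, int, int]  # -x, -y, -z


def per_rank_cells(cfg: RunConfig) -> tuple[float, ...]:
    """Unit cells owned by one rank along each axis."""
    return tuple(n / p for n, p in zip(cfg.lattice, cfg.ranks))


def summarize(cfg: RunConfig) -> dict:
    """Rank count, per-rank extent and atoms per rank for one configuration."""
    i, j, k = cfg.ranks
    x, y, z = cfg.lattice
    cells = per_rank_cells(cfg)
    return {
        "total_ranks": i * j * k,
        "cells_per_rank": cells,
        "atoms_per_rank": ATOMS_PER_CELL * x * y * z / (i * j * k),
        "even_split": all(n % p == 0 for n, p in zip(cfg.lattice, cfg.ranks)),
        "thinnest_axis": min(cells),
    }


if __name__ == "__main__":
    reported = {
        "8 ranks, size 80": RunConfig((8, 1, 1), (80, 80, 80)),
        "16 ranks, size 40": RunConfig((4, 2, 2), (40, 40, 40)),
        "16 ranks, size 80": RunConfig((4, 2, 2), (80, 80, 80)),
    }
    for name, cfg in reported.items():
        print(name, summarize(cfg))
```

For the three reported configurations this gives 10, 10 and 20 unit cells per rank along i, respectively.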