StreamHPC / gromacs

OpenCL porting of the GROMACS molecular simulation toolkit
http://www.gromacs.org
Other

Generate files with input/output data for running the CUDA kernel #43

Closed: ancahamuraru closed this issue 9 years ago

ancahamuraru commented 9 years ago

These results are to be used as a reference when testing the OpenCL kernel. They should include the input and output data at several stages (see #41, #42) for nbnxn_kernel_EleCut_VdwLJ_VF_prune_opencl, the first kernel launched when running the d.poly-ch2 benchmark from the gmxbench-3.0.tar.gz archive, which can be downloaded from http://www.mmnt.net/db/0/0/ftp.gromacs.org/pub/benchmarks

ancahamuraru commented 9 years ago

Results after the first run of nbnxn_kernel_EleCut_VdwLJ_VF_prune_cuda:

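e_lj is not always the same across runs, probably because of the atomic operations: floating-point addition is not associative, so the order in which the atomic adds are applied changes the rounding of the result.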
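As a minimal illustration of the suspected mechanism (a sequential C sketch, not the CUDA kernel itself), summing the same single-precision values in two different orders can give two different totals. Atomic accumulation into e_lj behaves like a sum whose order is chosen by the hardware scheduler, so run-to-run variation of this kind is expected. The helper names `sum_forward` and `sum_backward` are hypothetical.

```c
/* Sum v[0..n-1] front to back. */
static float sum_forward(const float *v, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Sum the same values back to front. In single precision the partial
 * sums round differently, just as atomic adds landing in a different
 * order would. */
static float sum_backward(const float *v, int n)
{
    float s = 0.0f;
    for (int i = n - 1; i >= 0; i--)
        s += v[i];
    return s;
}
```

With v = {1.0f, 1.0e8f, -1.0e8f}, the forward sum loses the 1.0f when it is added to 1.0e8f (the float spacing near 1e8 is 8), giving 0, while the backward sum cancels the large terms first and keeps it, giving 1. This is why comparisons of e_lj need a tolerance rather than exact equality.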

The same results are obtained after the first run of nbnxn_kernel_EleCut_VdwLJ_VF_prune_opencl on NVIDIA GTX 660M:

ancahamuraru commented 9 years ago

Results for f, fshift after the first run of nbnxn_kernel_EleCut_VdwLJ_VF_prune_cuda can be downloaded from: https://github.com/AncaSC/GromacsTesting/archive/master.zip

Uncomment the following lines to enable f, fshift logging to file for the OpenCL implementation:

#define DEBUG_DUMP_F_OCL
#define DEBUG_DUMP_FSHIFT_OCL

If the CUDA result files are placed in the current working directory, the OpenCL results are also compared against the CUDA results.

Have a look at the comparison tolerance float cmp_eps and adjust its value as needed during testing.
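The comparison step can be sketched as follows, assuming f and fshift are dumped as flat float arrays (x, y, z per entry). The helper `count_differences` is hypothetical, and the mixed absolute/relative test against cmp_eps is an assumption made for illustration; the actual DEBUG_DUMP_*_OCL code may use a plain absolute difference.

```c
#include <stddef.h>

static float abs_f(float x) { return x < 0.0f ? -x : x; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Hypothetical helper: count elements of two force arrays that differ
 * by more than cmp_eps. The tolerance is scaled by the magnitude of the
 * values (but never below cmp_eps itself), so large forces are judged
 * relatively and near-zero forces absolutely. Written as !(diff <= tol)
 * so that a NaN in either array is also counted as a difference. */
static size_t count_differences(const float *ocl, const float *cuda,
                                size_t n, float cmp_eps)
{
    size_t ndiff = 0;
    for (size_t i = 0; i < n; i++) {
        float diff  = abs_f(ocl[i] - cuda[i]);
        float scale = max_f(1.0f, max_f(abs_f(ocl[i]), abs_f(cuda[i])));
        if (!(diff <= cmp_eps * scale))
            ndiff++;
    }
    return ndiff;
}
```

Scaling the tolerance keeps a single cmp_eps usable across atoms whose force magnitudes differ by orders of magnitude; if a purely absolute check is preferred, replace `scale` with 1.0f.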

ancahamuraru commented 9 years ago

OpenCL - CUDA comparison results for f, fshift on NVIDIA GTX 660M

ancahamuraru commented 9 years ago

Results for cj4 after the first run of nbnxn_kernel_EleCut_VdwLJ_VF_prune_cuda can be downloaded from: https://github.com/AncaSC/GromacsTesting/blob/master/cuda_cj4.txt

Uncomment #define DEBUG_DUMP_CJ4_OCL to enable cj4 logging to file for the OpenCL implementation.

If the CUDA result files are placed in the current working directory, the OpenCL results are also compared against the CUDA results.

ancahamuraru commented 9 years ago

OpenCL - CUDA comparison results for cj4 on NVIDIA GTX 660M: 8400 differences for imask fields
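To tell whether differences in the imask fields come from a few interaction pairs or from whole masks, a cj4 comparison can count differing bits as well as differing words. The sketch below assumes a local struct layout meant to mirror the GROMACS nbnxn pair-list types (four j-cluster indices plus imask/excl_ind data for two warps); the type and function names are hypothetical, and the layout should be checked against this fork's headers before use.

```c
/* Assumed to mirror the i-cluster mask/exclusion data for one warp. */
typedef struct {
    unsigned int imask;    /* i-cluster interaction mask */
    int          excl_ind; /* index into the exclusion array */
} im_ei_rec_t;

/* Assumed to mirror one cj4 entry: four j-clusters, two warps. */
typedef struct {
    int         cj[4];
    im_ei_rec_t imei[2];
} cj4_rec_t;

typedef struct {
    long cj_diffs;        /* differing j-cluster indices */
    long imask_diffs;     /* imask words that differ */
    long imask_bit_diffs; /* individual interaction bits that differ */
    long excl_diffs;      /* differing exclusion indices */
} cj4_diff_t;

/* Portable population count (clears the lowest set bit each step). */
static int popcount_u32(unsigned int x)
{
    int c = 0;
    while (x) {
        x &= x - 1;
        c++;
    }
    return c;
}

static cj4_diff_t compare_cj4(const cj4_rec_t *ocl, const cj4_rec_t *cuda, int n)
{
    cj4_diff_t d = {0, 0, 0, 0};
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < 4; k++)
            d.cj_diffs += (ocl[i].cj[k] != cuda[i].cj[k]);
        for (int w = 0; w < 2; w++) {
            /* XOR leaves a 1 exactly where the two masks disagree. */
            unsigned int x = ocl[i].imei[w].imask ^ cuda[i].imei[w].imask;
            d.imask_diffs     += (x != 0);
            d.imask_bit_diffs += popcount_u32(x);
            d.excl_diffs      += (ocl[i].imei[w].excl_ind != cuda[i].imei[w].excl_ind);
        }
    }
    return d;
}
```

If imask_bit_diffs is much larger than imask_diffs, whole masks disagree (e.g. pruning never ran); a ratio near one points to isolated pairs near the cut-off.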

sharpneli commented 9 years ago

Total number of differences from the CUDA cj4 results on an AMD R7 260M: 16811

ocl_f and ocl_fshift are all zeros, so every value differs from the CUDA results.

dkarkoulis commented 9 years ago

In fact, on AMD the main kernel seems to crash silently :( so these results are not meaningful.
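A silently failing kernel shows up downstream as output buffers that were never written, as with ocl_f and ocl_fshift above. Besides checking the cl_int status returned by calls such as clEnqueueNDRangeKernel and clFinish, a cheap guard is to test a dumped buffer for being entirely zero before diffing it, so that thousands of "differences" are not mistaken for a numerical problem. The helper `all_zero` is a hypothetical sketch.

```c
#include <stddef.h>

/* Returns 1 if every element of v is exactly zero, 0 otherwise.
 * A NaN compares unequal to 0.0f, so a buffer containing NaNs is
 * reported as non-zero (and will then show up in the diff). */
static int all_zero(const float *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (v[i] != 0.0f)
            return 0;
    return 1;
}
```

For a real force buffer an all-zero result is practically impossible, so treating it as "kernel did not run" is a reasonable heuristic.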

VincentSC commented 9 years ago

Please add as many AMD-related OpenCL bugs as possible to https://github.com/StreamComputing/AMD_OpenCL_bugs (email reply to the comment by Dimitris Karkoulis of 11 Oct. 2014, 14:53).

ancahamuraru commented 9 years ago

Check the following #defines for how to generate files with output from the CUDA kernels: DEBUG_RUN_STEP, DEBUG_DUMP_FSHIFT_CUDA, DEBUG_DUMP_F_CUDA, DEBUG_DUMP_CJ4_CUDA and DEBUG_CUDA.
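The general pattern behind such switches is to compile the dump code only when the macro is defined, so normal builds pay nothing for it. The sketch below is hypothetical: the function `dump_f` and the one-triple-per-line text format are assumptions, not the fork's actual file layout, and the meaning of DEBUG_RUN_STEP is not modelled here.

```c
#include <stdio.h>

/* Uncomment (or pass -DDEBUG_DUMP_F_CUDA) to compile the dump in. */
#define DEBUG_DUMP_F_CUDA

/* Write natoms force vectors as "fx fy fz" lines. %.9g prints enough
 * significant digits for a float to be read back bit-exactly, which
 * matters when the file serves as a reference for comparisons.
 * Returns 0 on success, -1 on I/O failure. */
static int dump_f(const char *path, const float *f, int natoms)
{
#ifdef DEBUG_DUMP_F_CUDA
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    for (int i = 0; i < natoms; i++)
        fprintf(fp, "%.9g %.9g %.9g\n", f[3 * i], f[3 * i + 1], f[3 * i + 2]);
    return fclose(fp) == 0 ? 0 : -1;
#else
    (void)path;
    (void)f;
    (void)natoms;
    return 0;
#endif
}
```

Using a text format with round-trip precision keeps the reference files diffable by hand while still allowing exact comparisons when cmp_eps is set to zero.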