NCAR / spack-gust

Spack production user software stack on the Gust test system

nvc++ -stdpar with a cuda module produces erroneous output #31

Closed benkirk closed 1 year ago

benkirk commented 1 year ago

Capturing here so future me doesn't forget...

When using nvc++ -stdpar for GPU offload, loading an external cuda module alongside nvhpc makes the parallel execution policies produce incorrect results:

nvhpc only, correct:

$ wget https://raw.githubusercontent.com/benkirk/paradigms_playground/master/parallel_stl_sort.C
$ module purge && module load nvhpc && module list
Currently Loaded Modules:
  1) ncarenv/22.10 (S)   2) craype/2.7.17 (S)   3) nvhpc/22.7

$ nvc++ -stdpar -o parallel_stl_sort parallel_stl_sort.C && ./parallel_stl_sort 
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::seq: 43.127 sec. 
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...

input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::par: 0.632754 sec. 
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...

input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::par_unseq: 0.073256 sec. 
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...

nvhpc+cuda, incorrect (seq is still right; par and par_unseq are wrong):

$ wget https://raw.githubusercontent.com/benkirk/paradigms_playground/master/parallel_stl_sort.C
$ module purge && module load nvhpc cuda && module list
Currently Loaded Modules:
  1) ncarenv/22.10 (S)   2) craype/2.7.17 (S)   3) nvhpc/22.7   4) cuda/11.4.4

$ nvc++ -stdpar -o parallel_stl_sort parallel_stl_sort.C && ./parallel_stl_sort 
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::seq: 42.8227 sec. 
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...

input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 0 0 0 0 0 0 0 0 0 0 0 ...
after unique: v.size()=59328; 765600696 352808383 3641236997 4016398694 1279020192 465826551 864301009 822663315 3257882672 1989160727 4086747794 ...
==> ERROR: size mismatch from serial algorithm!
std::copy() / std::sort() / std::unique() / std::execution::par: 0.61413 sec. 
final: v.size()=59328; 765600696 352808383 3641236997 4016398694 1279020192 465826551 864301009 822663315 3257882672 1989160727 4086747794 ...

input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 0 0 0 0 0 0 0 0 0 0 0 ...
after unique: v.size()=643648; 2435803498 2970823809 3485536073 2755831796 3868881694 2623710790 2458607871 3552076208 607421919 2528345273 2013025721 ...
==> ERROR: size mismatch from serial algorithm!
std::copy() / std::sort() / std::unique() / std::execution::par_unseq: 0.606054 sec. 
final: v.size()=643648; 2435803498 2970823809 3485536073 2755831796 3868881694 2623710790 2458607871 3552076208 607421919 2528345273 2013025721 ...
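
For reference, the kind of check behind the "size mismatch from serial algorithm" line can be sketched roughly as below. This is not the code in parallel_stl_sort.C; the helper names (`sort_unique_serial`, `is_strictly_increasing`, `compare_to_serial`) are hypothetical and only illustrate the idea of treating the serial sort + unique result as the trusted reference and validating any other run against it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: serial sort + unique, used as the trusted reference.
std::vector<unsigned int> sort_unique_serial(std::vector<unsigned int> v) {
  std::sort(v.begin(), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
  return v;
}

// Hypothetical helper: a correct sort + unique result is strictly increasing.
bool is_strictly_increasing(const std::vector<unsigned int>& v) {
  // adjacent_find returns end() when no neighbouring pair satisfies a >= b.
  return std::adjacent_find(v.begin(), v.end(),
                            [](unsigned int a, unsigned int b) { return a >= b; }) ==
         v.end();
}

// Hypothetical helper: compare a candidate result (e.g. from a par or
// par_unseq run) against the serial reference.
// Returns an empty string on success, otherwise a short description.
std::string compare_to_serial(const std::vector<unsigned int>& reference,
                              const std::vector<unsigned int>& candidate) {
  if (candidate.size() != reference.size())
    return "size mismatch from serial algorithm";
  if (!is_strictly_increasing(candidate))
    return "result is not sorted and unique";
  if (candidate != reference)
    return "contents differ from serial algorithm";
  return "";
}
```

In the failing runs above, the candidate would trip the first test (59328 or 643648 elements against 471992679), and the all-zero "after sort" output would also fail the strictly-increasing test on its own.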

benkirk commented 1 year ago

Created an Nvidia forum issue: https://forums.developer.nvidia.com/t/nvc-external-cuda-thrust-conflicts-for-stdpar-offload/235624

benkirk commented 1 year ago

Unsatisfying workaround in place; in any case, this is now reported upstream. Closing.