UoB-HPC / BabelStream

STREAM, for lots of devices written in many programming models

Validation failures in OpenACC variant with GCC and NVHPC #153

Open jhdavis8 opened 1 year ago

jhdavis8 commented 1 year ago

I'm encountering validation failures in BabelStream's OpenACC version on the main branch related to the number of iterations. Specifically, when the number of iterations is less than 723, validation failures appear:

$ acc-stream -n 722
BabelStream
Version: 4.0
Implementation: OpenACC
Running kernels 722 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Validation failed on c[]. Average error 2.3104e-14
Function    MBytes/sec  Min (sec)   Max         Average     
Copy        797592.848  0.00067     0.00069     0.00068     
Mul         792595.514  0.00068     0.00068     0.00068     
Add         831047.225  0.00097     0.00098     0.00097     
Triad       831176.744  0.00097     0.00098     0.00097     
Dot         719506.962  0.00075     0.00077     0.00075

compared to

$ acc-stream -n 723
BabelStream
Version: 4.0
Implementation: OpenACC
Running kernels 723 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Function    MBytes/sec  Min (sec)   Max         Average     
Copy        796974.794  0.00067     0.00069     0.00067     
Mul         791823.981  0.00068     0.00068     0.00068     
Add         830542.399  0.00097     0.00098     0.00097     
Triad       830553.534  0.00097     0.00098     0.00097     
Dot         719081.005  0.00075     0.00077     0.00075

The average error grows as the number of iterations decreases. This exact behavior appears in all the following test environments:

Tom suggested some possible causes: synchronisations being skipped somewhere (probably around the memory transfers), some bad type punning, or something going wrong with the pointer captures (they're pulled out to local variables because all OpenACC compilers failed to work otherwise).
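To explore why the failure depends on the iteration count, here is a minimal host-only sketch of how the expected values might evolve. It assumes starting values of 0.1, 0.2 and 0.0 for a, b and c, a scalar of 0.4, the kernel order Copy, Mul, Add, Triad, and a pass/fail tolerance of 100 times machine epsilon. These constants and the names `Gold`, `gold_after` and `tolerance` are assumptions for illustration, not taken from the output above.

```cpp
#include <limits>

// Host-only sketch. The starting values, the update order and the
// tolerance are assumptions, not copied from the BabelStream source.
struct Gold { double a, b, c; };

// Scalar reference values after ntimes rounds of Copy, Mul, Add, Triad.
Gold gold_after(unsigned int ntimes)
{
  const double scalar = 0.4;   // assumed scalar
  Gold g{0.1, 0.2, 0.0};       // assumed initial a, b, c
  for (unsigned int i = 0; i < ntimes; i++) {
    g.c = g.a;                 // Copy
    g.b = scalar * g.c;        // Mul
    g.c = g.a + g.b;           // Add
    g.a = g.b + scalar * g.c;  // Triad
  }
  return g;
}

// Assumed pass/fail threshold: 100 x machine epsilon for double.
double tolerance()
{
  return std::numeric_limits<double>::epsilon() * 100.0;
}
```

Under these assumptions each round scales a by 0.96, so the expected values shrink towards zero as the iteration count grows. The expected c after 722 rounds is roughly 2.3e-14, just above the assumed tolerance of about 2.2e-14, and after 723 rounds it drops just below it. An array that was read back as zeros would therefore show an average error equal to the expected value itself and would only pass once that value falls under the tolerance. This is an observation from the sketch, not a confirmed diagnosis.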

tomdeakin commented 1 year ago

One more thought: the wording of the wait clause is pretty weird in OpenACC 2.6, so I wonder if this line is missing the wait clause as we copy back to the host. Does adding the clause fix anything?

Note: if it does, that would be strange, as all the other kernels have the wait clause, so I would have expected all kernels to have finished before the copy back starts...
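For reference, a minimal sketch of what a `wait` clause on a copy-back directive might look like, assuming the copy back is done with an `update host` directive. The function `scale_on_device` and its parameters are hypothetical and are not the BabelStream code; the pointer is pulled out to a local variable as described above. Compiled without OpenACC support, the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <vector>

// Hypothetical helper: repeatedly scale an array on the device,
// then copy the result back to the host.
std::vector<double> scale_on_device(int n, double start, double scalar, int ntimes)
{
  std::vector<double> host(n, start);
  double *a = host.data();  // local pointer used in the data clauses

  // Allocate a on the device and copy the initial values across.
  #pragma acc enter data copyin(a[0:n])

  for (int k = 0; k < ntimes; k++) {
    // Each kernel waits for outstanding asynchronous work before starting.
    #pragma acc parallel loop present(a[0:n]) wait
    for (int i = 0; i < n; i++) {
      a[i] = scalar * a[i];
    }
  }

  // Copy back to the host, also with an explicit wait clause.
  #pragma acc update host(a[0:n]) wait

  // Release the device copy without a further transfer.
  #pragma acc exit data delete(a[0:n])

  return host;
}
```

A call such as `scale_on_device(4, 8.0, 0.5, 3)` should return four elements equal to 1.0 whether or not the pragmas are honoured, which makes it a quick way to check that the copy back actually delivers device results to the host.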

jhdavis8 commented 1 year ago

I just tried adding the wait clause to that copy back directive. Still seeing the same failures in all the test environments.

tomdeakin commented 1 year ago

Is this related to #17?

tom91136 commented 10 months ago

I can reproduce this on AArch64 CPUs with both GCC and NVHPC; it is likely the same on x86 as well.