Open psychocoderHPC opened 2 weeks ago
I agree with @psychocoderHPC. In theory it is possible that the changed kernels perform the same calculations as before, but proving it would be a huge amount of work. It is therefore much easier to change the kernels back to how they were before, so that we can be sure our benchmarks are comparable with the reference implementation.
The init values could also be problematic. Who guarantees us that the out-of-order execution of a CPU does not skip the addition of 0 or the multiplication by 1 because it is unnecessary? It is possible that our implementation of the multiplication kernel is actually the same as the copy kernel, because each value is multiplied by 1.
@psychocoderHPC 1. Current alpaka sums the errors for verification. Your definition of the error above, describing the current situation, is not correct.
`optional` and not one of the 5 BabelStream standard benchmark kernels. Nobody knows why it is there; it is neither run nor displayed in the results.
Thanks, I mostly fixed it.
@SimeonEhrig Since the scalar is 2, multiplication is not equal to copy, but I understand your point. Thanks.
Our current BabelStream implementation in the latest release, 1.2.0, is broken. With #2299 the behaviour of the benchmark was changed.
Upstream BabelStream computes the following code:
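As a rough orientation, the upstream BabelStream kernel sequence can be sketched with plain serial loops. This is neither alpaka code nor the original listing from this issue; the `Arrays` struct and the `*Kernel` function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative container; the name `Arrays` is an assumption, not upstream code.
struct Arrays
{
    std::vector<double> a, b, c;
};

// copy: c = a
inline void copyKernel(Arrays& s)
{
    for(std::size_t i = 0; i < s.a.size(); ++i)
        s.c[i] = s.a[i];
}

// mul: b = scalar * c (reads the output of copy)
inline void mulKernel(Arrays& s, double scalar)
{
    for(std::size_t i = 0; i < s.c.size(); ++i)
        s.b[i] = scalar * s.c[i];
}

// add: c = a + b (reads the output of mul)
inline void addKernel(Arrays& s)
{
    for(std::size_t i = 0; i < s.a.size(); ++i)
        s.c[i] = s.a[i] + s.b[i];
}

// triad: a = b + scalar * c (reads the outputs of mul and add)
inline void triadKernel(Arrays& s, double scalar)
{
    for(std::size_t i = 0; i < s.a.size(); ++i)
        s.a[i] = s.b[i] + scalar * s.c[i];
}

// nstream (optional upstream): a += b + scalar * c
inline void nstreamKernel(Arrays& s, double scalar)
{
    for(std::size_t i = 0; i < s.a.size(); ++i)
        s.a[i] += s.b[i] + scalar * s.c[i];
}

// dot: sum of a[i] * b[i]
inline double dotKernel(Arrays const& s)
{
    assert(s.a.size() == s.b.size());
    double sum = 0.0;
    for(std::size_t i = 0; i < s.a.size(); ++i)
        sum += s.a[i] * s.b[i];
    return sum;
}
```

Each kernel reads an array that the previous kernel wrote, so the final values depend on every step having run correctly and in this order.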
Our current implementation:
Note that Nstream is optional in the upstream code, but we implemented it in the version before #2299.
The problem is that the current implementation cannot be used to compare performance against other implementations: we implement the corresponding stream functions, but with a different access order to the arrays, which can affect performance because of caching effects. This should not be a big issue for large arrays.

The copy kernel can be removed or wrongly implemented and the verification still reports that all results are correct. The reason for this behaviour is that copy sets `b` and the next mul operation overwrites the result. The same issue occurs with the add operation. In our implementation, array `a` is never changed except in the init. The upstream code is designed so that the operations depend on each other; changing the order or making a mistake in the implementation results in an error in the validation.
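The validation argument can be illustrated with a minimal sketch that tracks a single element per array. `runDependent` follows the upstream chain. `runOverwriting` is a hypothetical model built only from the description above (copy writes `b`, mul overwrites `b`, `a` is never modified); it is not the actual alpaka 1.2.0 code. All names and the initial values are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// One representative element of each array (all elements are initialised equally).
struct State
{
    double a, b, c;
};

// Upstream-style chain: every kernel consumes the previous kernel's output.
inline State runDependent(State s, double scalar, bool skipCopy)
{
    if(!skipCopy)
        s.c = s.a;            // copy
    s.b = scalar * s.c;       // mul reads what copy wrote
    s.c = s.a + s.b;          // add
    s.a = s.b + scalar * s.c; // triad
    return s;
}

// Hypothetical model of the described problem: mul overwrites copy's output.
inline State runOverwriting(State s, double scalar, bool skipCopy)
{
    if(!skipCopy)
        s.b = s.a;      // copy writes b ...
    s.b = scalar * s.a; // ... and mul overwrites it without reading it
    s.c = s.a + s.b;    // add; a is never modified
    return s;
}

// Element-wise comparison with an absolute tolerance.
inline bool closeEnough(State const& x, State const& y, double tol = 1e-12)
{
    assert(tol > 0.0);
    return std::fabs(x.a - y.a) < tol && std::fabs(x.b - y.b) < tol && std::fabs(x.c - y.c) < tol;
}

// Returns true if comparing against the reference run exposes a missing copy kernel.
inline bool validationCatchesMissingCopy(bool dependentChain)
{
    State const init{0.1, 0.2, 0.0}; // example initial values (assumption)
    double const scalar = 0.4;       // example scalar (assumption)
    State const reference
        = dependentChain ? runDependent(init, scalar, false) : runOverwriting(init, scalar, false);
    State const broken
        = dependentChain ? runDependent(init, scalar, true) : runOverwriting(init, scalar, true);
    return !closeEnough(reference, broken);
}
```

Under these assumptions, `validationCatchesMissingCopy(true)` reports the missing copy kernel, while `validationCatchesMissingCopy(false)` does not, because the result of copy never reaches any value that is checked.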
WARNING: please do not use the BabelStream implementation of alpaka release 1.2.0 for comparison against other BabelStream implementations!