UoB-HPC / BabelStream

STREAM, for lots of devices written in many programming models

Dot verification fails with single precision #20

Open jrprice opened 7 years ago

jrprice commented 7 years ago

We probably just need to increase the tolerance. The error will also be proportional to the size of the arrays (unlike with the other kernels), so we need to make sure whatever error checking tolerance we use is robust enough to avoid these sorts of false positives for any sort of input.

```
Validation failed on sum. Error 0.000209808
Sum was 39.7910385131836 but should be 39.7912483215332
```
tomdeakin commented 7 years ago

We currently check that the sum array is within 1.0E-8 of the expected value for doubles and floats. We could either:

  1. Use 1.0E-5 for floats and 1.0E-8 for doubles
  2. Factor in the array size somehow

Option 1 is simple but might hide errors. If the arrays contain correct values, then as long as the reduction is close this is probably suitable for this benchmark. For option 2, it's hard to quantify how the tolerance should scale with the array size.
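One way option 2 could look: scale the relative tolerance by the array length, since each addition in the reduction can contribute roughly half an ulp of error. A minimal sketch (the function name and linear scaling are illustrative assumptions, not BabelStream's actual verification code):

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical check: allow the dot-product error to grow with the
// number of additions in the reduction. Linear scaling in array_size
// is a worst-case assumption; a tree reduction would permit a much
// tighter log2(array_size) factor.
template <typename T>
bool dot_close_enough(T result, T gold, std::size_t array_size)
{
  const T eps = std::numeric_limits<T>::epsilon();
  const T tolerance = eps * static_cast<T>(array_size);
  return std::fabs((result - gold) / gold) < tolerance;
}
```

With a 2^25-element array this admits the ~5e-6 relative error seen in the log above for floats, while still rejecting grossly wrong results.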

Srinivasuluch commented 7 years ago

Would it be possible to make "sum" use the double data type irrespective of the input type, "double or float", so that it gives accurate results? The "sum" is a host-side value used only for comparison with the "goldSum" value, so it should not matter. I mean, change the templated sum to a double sum.
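The suggestion above amounts to keeping the array data in the templated type but widening only the accumulator. A sketch of the idea (illustrative only; it assumes the device can do double-precision arithmetic at all, which the next comment questions):

```cpp
#include <cstddef>
#include <vector>

// Accumulate a dot product in double regardless of the element type T.
// The arrays stay in T (float or double); only the reduction variable
// is widened, so the per-element products lose no extra precision.
template <typename T>
double dot_double_acc(const std::vector<T>& a, const std::vector<T>& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); i++)
    sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
  return sum;
}
```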

tomdeakin commented 7 years ago

For devices which do not support double precision would this not pose a problem?

zyzzyxdonta commented 4 years ago

Hi, is this issue still being worked on?

tomdeakin commented 4 years ago

Yes, but we've not come up with a satisfactory solution yet.

zyzzyxdonta commented 4 years ago

Thanks for your reply. Am I right in assuming that despite the verification failing, my measurements are still valid?

tomdeakin commented 4 years ago

If it's just the reduction (dot) that fails and the other kernels are OK, then the contents of the arrays should be correct. If the result is close on inspection but fails because of the tolerance, then it's probably fine. If the result is 0.0 or some other nonsense number, then something might have gone really wrong...

zyzzyxdonta commented 4 years ago

Alright, thanks a lot!

tomdeakin commented 3 years ago

@zjin-lcf suggested using different tolerances for the reduction result based on the data type (option 1 above).
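Option 1 could be implemented by deriving the tolerance from each type's machine epsilon rather than hard-coding 1.0E-5 and 1.0E-8. A sketch (the function name and the factor of 100 are assumptions, not the actual patch):

```cpp
#include <limits>

// Hypothetical per-type tolerance for the dot verification, derived
// from machine epsilon so float is automatically looser than double:
// ~100 epsilons is roughly 1.2e-5 for float and 2.2e-14 for double.
template <typename T>
constexpr T dot_tolerance()
{
  return std::numeric_limits<T>::epsilon() * static_cast<T>(100);
}
```

This keeps a single code path for both precisions and extends naturally to half precision if that is ever added.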

tomdeakin commented 6 months ago

Whilst reviewing #186 we discussed the fact that the goldSum value is computed exactly, using multiplication rather than repeated addition. I wonder if there is a way to derive error bounds on the difference between the two algorithms from the floating-point rules.
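The standard forward error bound for recursive summation does give such a bound: for n terms, |computed - exact| <= (n - 1) * u * sum(|terms|), where u is the unit roundoff. A sketch of computing that bound (names are illustrative; it assumes a plain serial reduction, and a tree reduction would tighten n - 1 to roughly log2(n)):

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// First-order forward error bound for summing terms left to right:
// |computed - exact| <= (n - 1) * u * sum(|terms|), with u the unit
// roundoff (epsilon / 2 for round-to-nearest).
template <typename T>
T summation_error_bound(const std::vector<T>& terms)
{
  if (terms.size() < 2)
    return T(0);  // a single term (or none) is summed exactly
  const T u = std::numeric_limits<T>::epsilon() / 2;
  T abs_sum = 0;
  for (T t : terms)
    abs_sum += std::fabs(t);
  return static_cast<T>(terms.size() - 1) * u * abs_sum;
}
```

Comparing |dot - goldSum| against this bound (plus the rounding already present in goldSum itself) would make the verification tolerance principled rather than ad hoc.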