Open yodada opened 4 years ago
My view is that we should increase the tolerance and have a centralized value for that tolerance. Having a centralized value enables us to gauge where we stand in terms of accuracy. Later, when we swap the FPU with hardfloat, we could reduce the tolerance.
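A minimal sketch of what a centralized tolerance could look like, assuming a shared comparison helper that every test goes through (the names `REL_TOL`, `ABS_TOL`, and `assert_close` are hypothetical, not from the codebase):

```python
import numpy as np

# Single source of truth for accuracy expectations.
# Tighten these once the FPU is swapped for hardfloat.
REL_TOL = 1e-3
ABS_TOL = 1e-5

def assert_close(actual, expected):
    """Compare simulated results against the reference with the shared tolerance."""
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    assert np.allclose(actual, expected, rtol=REL_TOL, atol=ABS_TOL), (
        f"max abs diff = {np.max(np.abs(actual - expected))}"
    )
```

Because every test funnels through one pair of constants, a single grep tells us exactly how loose we currently are.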
That's in cosimulation. But I couldn't figure out why we are seeing mismatches in emulation!
Emm ... it was closed automatically by the PR, but I think this is still a valid issue.
IMO, the wiliness of floating point means that there is no silver bullet here. The variation between FPUs is actually the least of our worries: different implementations (i.e., algorithms or variations on algorithms) mean that the results can differ arbitrarily for a given input. Without painstakingly analyzing individual algorithms, there's no way to put a bound on how big the difference can be (neither a relative nor an absolute bound).
Just for fun, here's someone else discovering the difficulty of combining PBT and FP.
I think the practical solution, then, is probably just to widen the tolerance until it works. A slightly fancier solution would be to average over multiple runs. Hypothesis already chooses inputs randomly, so just choose inputs many times and take the average. A little statistical analysis could then give you a confidence bound: for example, 99.99% confidence that the average difference lies within a given bound.
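A sketch of the averaging idea, using a normal-approximation confidence interval on the mean error. `run_reference` and `run_kernel` are hypothetical stand-ins for the real harness (the "kernel" here just injects a tiny pseudo-rounding error so the example is self-contained):

```python
import math
import random

def run_reference(xs):
    # Stand-in for the x86 reference result.
    return sum(xs)

def run_kernel(xs):
    # Stand-in for the simulated FPU: inject a small relative error.
    return sum(xs) * (1.0 + random.uniform(-1e-6, 1e-6))

def mean_error_bound(n_runs=1000, z=3.89):  # z ~= 3.89 -> ~99.99% two-sided
    random.seed(0)
    errs = []
    for _ in range(n_runs):
        xs = [random.uniform(-100.0, 100.0) for _ in range(64)]
        errs.append(abs(run_kernel(xs) - run_reference(xs)))
    mean = sum(errs) / n_runs
    var = sum((e - mean) ** 2 for e in errs) / (n_runs - 1)
    half_width = z * math.sqrt(var / n_runs)
    # Upper bound on the *average* error, not on any single run.
    return mean, mean + half_width
```

Note the bound is on the average difference; individual runs can still exceed it, which is exactly the trade-off of this approach.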
Can we create integer tensors instead of FP tensors? If so, maybe we can template all kernels so they work for both integer and FP and test both. The integer results should match x86 exactly (assuming no overflow), and we can use a loose bound for FP. The integer tests will make sure the overall logic of our kernels is sound.
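A sketch of what dual-dtype testing could look like on the host side, assuming a NumPy-backed reference; `sum_kernel` is a hypothetical placeholder for the device kernel under test:

```python
import numpy as np

def sum_kernel(a):
    # Placeholder for the templated device kernel under test.
    return a.sum()

def check_kernel(a):
    ref = np.asarray(a).sum()
    out = sum_kernel(a)
    if np.issubdtype(a.dtype, np.integer):
        # Integer path: exact match expected, barring overflow.
        assert out == ref
    else:
        # FP path: loose bound only.
        assert np.isclose(out, ref, rtol=1e-3)

# Same test body, both dtypes.
for dtype in (np.int32, np.float32):
    check_kernel(np.arange(16, dtype=dtype))
```

The integer path catches indexing/control-flow bugs with a bit-exact check, so any remaining FP mismatch can be attributed to arithmetic rather than logic.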
❓ This issue is meant for a discussion on how we should handle floating point comparison.

A few Hypothesis / random tests, like `sum_22` and `mean_hypothesis_3d`, may fail. The reason is this: in the `sum` operation, the partial result can become as large as 1000, which creates inaccuracy in the decimals. However, the final result will be small (around 10 or 20). Not sure what is the best way to handle this problem ... We could
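To make the failure mode concrete, here is a minimal float32 reproduction of the effect described above (the numbers are illustrative, not taken from the failing tests):

```python
import numpy as np

# A small addend is absorbed while the partial sum sits near 1000, so the
# float32 total misses it even though the true total is only ~10.
xs = [1000.0, 1e-5, -1000.0, 10.0]

f32 = np.float32(0.0)
for x in xs:
    f32 += np.float32(x)   # accumulate in float32, like the kernel

f64 = sum(xs)              # double-precision reference, ~10.00001

# f32 == 10.0 exactly: the 1e-5 vanished at the peak magnitude (~1000).
# Any tolerance therefore has to budget for error at the peak partial-sum
# magnitude, not the magnitude of the final result.
```

This is also why a relative tolerance on the final value can be too tight: the rounding error scales with the largest intermediate, not with the result.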