edf-hpc / verrou

floating-point errors checker
https://edf-hpc.github.io/verrou/vr-manual.html
GNU General Public License v2.0

Improving reproducibility in verrou_dd #16

Open HadrienG2 opened 6 years ago

HadrienG2 commented 6 years ago

Debugging rare failures with verrou_dd can be very difficult. Either you are lucky and the failure can be reproduced in upward/downward rounding mode, or you are in for a long time tuning VERROU_DD_NRUNS, potentially to absurdly high values, without managing to reproduce the failure on every run.

Farthest does not always help because its result is not stable under delta-debugging (it depends on previous rounding decisions), and I suspect that the same holds for toward_zero. Similarly, vr-seed does not help reproducibility in delta-debugging mode, because it only reproduces the sequence of roundings that will be applied, not the places where they will be applied.

I think this could be improved, admittedly at the cost of a large overhead (which may prove intractable in practice), by recording the sequence of rounding modes that was applied at each source file location, and replaying that per-location record instead of just the global random number sequence.
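
To make the idea concrete, here is a toy Python model of it, not Verrou code. The RoundingLog class, the decide method and the "file:line" location strings are all hypothetical names invented for this sketch. It only shows why a per-location record survives delta-debugging when a global random sequence does not: removing one location from the instrumented set leaves the recorded decisions at the other locations untouched.

```python
import random
from collections import defaultdict


class RoundingLog:
    """Toy per-location record of rounding decisions (hypothetical, not Verrou's API)."""

    def __init__(self, seed=None, replay=None):
        self.rng = random.Random(seed)
        self.replay = replay              # {location: [decisions]} or None when recording
        self.record = defaultdict(list)   # decisions actually taken in this run
        self.cursor = defaultdict(int)    # how many decisions were taken per location

    def decide(self, location):
        """Return 'up' or 'down' for the next operation executed at `location`."""
        index = self.cursor[location]
        recorded = self.replay.get(location, []) if self.replay is not None else []
        if index < len(recorded):
            choice = recorded[index]      # replay the recorded decision
        else:
            choice = self.rng.choice(("up", "down"))  # nothing recorded: draw a new one
        self.cursor[location] += 1
        self.record[location].append(choice)
        return choice


def run(log, locations):
    """Pretend to execute a program that performs one operation per listed location."""
    return [(loc, log.decide(loc)) for loc in locations]


# Reference run: every location is perturbed and the decisions are recorded.
trace = ["a.c:10", "b.c:20", "a.c:10", "c.c:30", "b.c:20"]
first = RoundingLog(seed=1)
full = run(first, trace)

# Delta-debugging step: b.c is no longer instrumented, so it takes no decisions.
subset = [loc for loc in trace if not loc.startswith("b.c")]
replayed = run(RoundingLog(seed=999, replay=dict(first.record)), subset)

# The remaining locations see exactly the decisions they saw in the reference run,
# even though the replaying run uses a different seed.
expected = [pair for pair in full if not pair[0].startswith("b.c")]
print(replayed == expected)
```

The overhead concern is visible even in this toy: the record grows by one entry per perturbed operation, which is what may make the approach intractable on real runs.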

wkirschenmann commented 4 years ago

Maybe another approach would be to store the RNG state on entry to each block. Still big, but probably more manageable. Another solution would be to rely on Helgrind to ensure a deterministic order of block executions.
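
A rough Python sketch of the first suggestion, under the assumption that "block" means some unit of code that is entered repeatedly and that a snapshot is keyed by (block, visit number). BlockRng, enter_block and next_rounding are made-up names for illustration, not part of Verrou. The snapshot uses getstate/setstate from Python's random module.

```python
import random


class BlockRng:
    """Toy RNG that snapshots its state at each block entry (hypothetical, not Verrou's API)."""

    def __init__(self, seed, saved=None):
        self.rng = random.Random(seed)
        self.saved = saved    # {(block, visit): state} from an earlier run, or None
        self.states = {}      # snapshots taken during this run
        self.visits = {}      # number of times each block has been entered

    def enter_block(self, block):
        visit = self.visits.get(block, 0)
        self.visits[block] = visit + 1
        key = (block, visit)
        if self.saved is not None and key in self.saved:
            self.rng.setstate(self.saved[key])   # restore the recorded state
        self.states[key] = self.rng.getstate()   # snapshot the state used for this visit

    def next_rounding(self):
        return self.rng.choice(("up", "down"))


def run(rng, blocks, skip=()):
    """Execute (block, n_ops) pairs; blocks in `skip` are not perturbed at all."""
    out = []
    for block, n_ops in blocks:
        rng.enter_block(block)
        if block in skip:
            continue          # uninstrumented block: consumes no random numbers
        out.append((block, [rng.next_rounding() for _ in range(n_ops)]))
    return out


blocks = [("A", 3), ("B", 2), ("A", 3), ("C", 4)]

# Reference run: all blocks perturbed, one snapshot stored per block entry.
reference = BlockRng(seed=42)
full = run(reference, blocks)

# Delta-debugging step: block B is excluded, and the run starts from another seed.
partial = run(BlockRng(seed=7, saved=reference.states), blocks, skip={"B"})

# Blocks A and C still receive the roundings they received in the reference run.
expected = [entry for entry in full if entry[0] != "B"]
print(partial == expected)
```

Storage here is one snapshot per block entry rather than one decision per operation. Each snapshot of Python's default generator is itself large, so the sketch says nothing about the real cost in Verrou.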