edbennett closed 2 years ago
This is partially due to one data point changing when we regenerated the data (see #5), but also seemingly due to small shifts in the central values as different bootstrap samples are selected. Fixing the seed will prevent this, but doesn't solve the underlying issue that the answers we get shouldn't be dependent on a choice of random seed.
@dvadacchino Did we decide what to do about this? The analysis now runs end-to-end and in parallel, so in principle we could just multiply the number of bootstrap samples by ten and run it on a node of SUNBIRD. I think it should finish within an hour even then (unless there's some code somewhere that's accidentally quadratic in the number of bootstrap samples that I haven't spotted). Or we can go with the "use the mean of the original data rather than of the samples to get a more reliable central value" option, but I'm not sure how much modification to the code that would need (perhaps you have a better idea).
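For reference, the second option could look roughly like the sketch below. It is not taken from the analysis code: `bootstrap_estimate` and its parameters are hypothetical names, the data are made up, and it uses only the standard library for illustration. The central value comes from the full original dataset, so it does not move with the seed; only the uncertainty is estimated from the resampled means.

```python
import random
import statistics


def bootstrap_estimate(data, n_samples=200, seed=0):
    """Return (central value, bootstrap error) for the mean of `data`.

    Hypothetical helper: the central value is the mean of the original
    data; only the error depends on the bootstrap resampling.
    """
    rng = random.Random(seed)  # local generator, no shared global state
    central = statistics.fmean(data)
    # Resample with replacement and record the mean of each bootstrap sample.
    sample_means = [
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_samples)
    ]
    error = statistics.stdev(sample_means)
    return central, error


data = [1.02, 0.97, 1.10, 0.95, 1.01, 1.05, 0.99, 1.03]  # made-up numbers
value_a, error_a = bootstrap_estimate(data, seed=1)
value_b, error_b = bootstrap_estimate(data, seed=2)
# The central values are identical across seeds; the errors may differ slightly.
```

Whether this carries over to the fitted quantities depends on how the fits consume the samples, which this sketch does not attempt to cover.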
The largest change here was due to an error in the uncertainties for Nc=4, beta=7.80, which was fixed by regenerating the data.
There is still some residual fluctuation in the final data (chi-square on fits changes by ~0.2), despite fixing the random seed so in principle having the same random numbers throughout.
Fixed in 5aef00870fca6fbbb457d7545d50c54ee8b2f73e and 89cef7fa1d322da79aa3b16743865ecb4031c786
Two issues:
- hash() is not guaranteed to give the same hash of a string between different runs of Python, so in fact the seed was changing each run
- produce_bs_samples.py was not the only place where bootstrap samples were being created; there were other sources of randomness

Both of these were replaced with an MD5 from hashlib, which does give bitwise identical reproducibility (when run on the same machine at least), with the exception of timestamps on PDF files.
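As an illustration of the hashing change (a minimal sketch, not the code from the commits above): Python randomises the built-in hash() of strings per process unless PYTHONHASHSEED is set, whereas an MD5 digest of the same bytes is always the same. The function name `stable_seed` and the example label are hypothetical.

```python
import hashlib
import random


def stable_seed(label):
    """Map a string label to an integer seed that is the same on every run."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # first 32 bits of the MD5 digest


# Hypothetical label; the real analysis may key its seeds differently.
rng = random.Random(stable_seed("Nc=4_beta=7.80"))
```

Giving each place that draws bootstrap samples its own generator seeded this way, rather than relying on shared global state, keeps the random streams independent of the order in which the scripts run.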
The current output for figure 11 has some points differing, including the continuum limit, and some quite different values for the chi-square. Worth understanding if this is just a small instability in the bootstrap sampling, or if it is something more serious.