kosmitive closed this issue 9 months ago.
`UpdateHistory`, not "Backlog".

If `ValuationResult.__add__()` turned out not to be commutative (a bug), then I don't think it would be enough to save the order in which the futures are completed, because your log only operates on one iteration of the outer loop. Note, in addition, that the list of completed futures in one iteration will typically have length 1, making the backlog superfluous most of the time. Also, if `completed=[a, b]`, then `done(result.update(a)) == True` but `done(result.update(b)) == False`.
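To make the "one iteration of the outer loop" point concrete, here is a minimal sketch of such a loop. It assumes tasks are submitted to a `concurrent.futures` executor and consumed with `wait(..., return_when=FIRST_COMPLETED)`. The names `task` and `outer_loop` and the integer accumulator are hypothetical stand-ins, not the actual valuation code.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def task(x: int) -> int:
    """Hypothetical stand-in for one unit of valuation work."""
    time.sleep(random.uniform(0, 0.01))  # random delay shuffles completion order
    return x


def outer_loop(inputs: list[int], n_workers: int = 4) -> tuple[int, list[list[int]]]:
    """Fold results as they arrive and return the batch seen in each iteration."""
    total = 0
    batches: list[list[int]] = []  # one entry per iteration of the outer loop
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        pending = {executor.submit(task, x) for x in inputs}
        while pending:
            # `completed` is an unordered set, typically holding a single
            # future, so a log kept only for this batch says nothing about
            # the order across iterations.
            completed, pending = wait(pending, return_when=FIRST_COMPLETED)
            batch = [fut.result() for fut in completed]
            batches.append(batch)
            total += sum(batch)
    return total, batches
```

Only the concatenation of all `batches` reflects the arrival order over the whole run. A per-iteration record covers just one short, usually single-element, slice of it.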
About the implementation:

`Backlog` is just a queue; there is no need for a new class. Also, `add` has horrible time complexity (although, admittedly, this would never be a problem with the current design, since the log will never hold more than a couple of entries).

However, I don't think that this approach is going to be very fruitful. As for `ValuationResult.__add__` not being commutative: what you observed is not necessarily what you diagnosed. I insist that there are other ways in which execution order affects the result. For those, if one wants full reproducibility, a history that is local to one iteration of the loop is not enough, as explained above. You would need to ensure that results are processed in exactly the same order across the whole run. A tree is not going to make the situation any better, and we most certainly don't want any added complexity! The only way to achieve this would be to pass a counter with the input to each worker and return this counter with the results. The main process would then pool all results and only consider them in the order in which they were sent. I am not sure we would want this, even as an option.

After thinking a bit about this, I am not so sure that this would be that useful. The benefit of perfect reproducibility is not really that great. After all, it's repeatability that matters (i.e. the ability of someone else to obtain the same results with a fresh experiment, where "same" doesn't mean up to the last decimal digit). Also keep in mind that different RNGs, architectures, optimisers, and so on will all produce slightly different results, so a full trace will never get us to 100% reproducibility (which, again, is not that useful).
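The counter-based scheme mentioned above could look roughly like the following sketch. Each input is sent with a sequence number, the worker returns that number with its result, and the main process only folds a result in once all its predecessors have been folded. It assumes a thread pool for simplicity. The names `worker` and `fold_in_submission_order` are hypothetical, and floats are used so that summation order could in principle change the last digits.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def worker(seq: int, x: float) -> tuple[int, float]:
    """Hypothetical worker: hands back the counter it received with its result."""
    time.sleep(random.uniform(0, 0.01))  # variable run time -> arbitrary arrival order
    return seq, x * 0.1


def fold_in_submission_order(inputs: list[float], n_workers: int = 4) -> float:
    """Accumulate results strictly in submission order, whatever the arrival order."""
    buffered: dict[int, float] = {}  # results that arrived before their turn
    next_seq = 0  # counter of the next result allowed into the accumulator
    total = 0.0
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(worker, i, x) for i, x in enumerate(inputs)]
        for fut in as_completed(futures):
            seq, value = fut.result()
            buffered[seq] = value
            # Drain every buffered result whose turn has come.
            while next_seq in buffered:
                total += buffered.pop(next_seq)
                next_seq += 1
    return total
```

With a process pool, `worker` would have to be a picklable top-level function, but the buffering in the main process stays the same. The cost is that the buffer grows whenever an early task is slow, which is part of the added complexity the comment warns about.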
Also, if the variability that you observe is able to shift the boundaries of a confidence interval enough to matter, something is probably amiss. Then again, if this is not the case and for some reason the order of arrival does matter in a meaningful way, I guess we can add this.
@kosmitive Any thoughts? I think we should close this and the associated PR.
Okay, let's close this one. I agree that it is not really required for reproducibility. If we need it later, we can reopen it.
Parallel processing can result in race conditions and different orders of execution. This hinders reproducibility. In order to make our code fully reproducible, we need some kind of log. Here is a class we might be able to use:
It can be applied to an executor as follows:
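As a hypothetical sketch of such a log (the names `run_with_log` and `replay` are invented for illustration and are not the class proposed in this issue), one could record the index of every future as it completes, across the whole run. The record goes into a plain `collections.deque`, and the results can later be folded again in exactly that order.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_with_log(
    fn: Callable[[float], float], inputs: list[float], n_workers: int = 4
) -> tuple[dict[int, float], deque[int]]:
    """Run fn on every input and log the index of each future as it completes."""
    results: dict[int, float] = {}
    log: deque[int] = deque()  # FIFO record of completion order for the whole run
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        index_of = {executor.submit(fn, x): i for i, x in enumerate(inputs)}
        for fut in as_completed(index_of):
            i = index_of[fut]
            results[i] = fut.result()
            log.append(i)
    return results, log


def replay(results: dict[int, float], log: deque[int]) -> float:
    """Fold the results again in exactly the logged order."""
    total = 0.0
    for i in log:
        total += results[i]
    return total
```

Replaying only reproduces the fold of a run whose log was kept. A fresh run would still have to be forced into the logged order, which leads back to the counter-based scheme discussed in the comments.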