Open AnesBenmerzoug opened 5 months ago
@kosmitive could you have a look at this if you have time?
Yes, it might be related to the parallelization: with parallel processing, the order in which results arrive is subject to a race condition. I recall this occurs only for semivalues, because we break the calculation down into single marginals. Or do you think it might be a different problem?
For the described problem, we could introduce an order resolver on the main thread, but at the cost of increasing RAM usage by about N/2*C on average, where N is the number of processes and C is the cost per process.
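A minimal sketch of what such an order resolver could look like, assuming each batch is tagged with its submission index. The names `OrderResolver`, `push` and `n_buffered` are hypothetical and not part of pyDVL; results that arrive early are buffered until all earlier ones have been released, which is where the extra memory goes.

```python
from typing import Any, Dict, List


class OrderResolver:
    """Hypothetical helper: releases results in submission order."""

    def __init__(self) -> None:
        self._next_index = 0  # index of the next result to release
        self._pending: Dict[int, Any] = {}  # out-of-order results held in memory

    def push(self, index: int, result: Any) -> List[Any]:
        """Store a result and return every result that is now ready, in order."""
        self._pending[index] = result
        ready = []
        while self._next_index in self._pending:
            ready.append(self._pending.pop(self._next_index))
            self._next_index += 1
        return ready

    @property
    def n_buffered(self) -> int:
        """Number of results currently waiting for an earlier one."""
        return len(self._pending)


# Simulated arrival order from parallel workers: (submission index, result).
arrivals = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]

resolver = OrderResolver()
ordered = []
max_buffered = 0
for index, result in arrivals:
    ordered.extend(resolver.push(index, result))
    max_buffered = max(max_buffered, resolver.n_buffered)

print(ordered)
print(max_buffered)
```

The main thread would then accumulate `ordered` instead of the raw arrival stream, so the update sequence no longer depends on which worker finishes first.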
@AnesBenmerzoug Here is the reference which was made in the tests https://github.com/aai-institute/pyDVL/blob/7c003beeed00416f6f03dd9b3cd4be7a20339d25/tests/value/test_semivalues.py#L228-L229. Do we want to go for an order resolution object for the batches?
@kosmitive I changed that test to use a deterministic scoring method coming from a toy game, so the order of batches shouldn't have an effect on the final result.
Potentially resolved by #558
While working on PR #341, I realized that there is a bug in the batching feature of semivalues when using `n_jobs` > 1. The results are almost the same but not exactly the same.
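As a generic illustration of how arrival order alone can give "almost but not exactly the same" numbers: floating-point addition is not associative, so accumulating the same values in a different order can change the last digits. Whether this is the actual cause here is not established in this thread; the `accumulate` function below is a hypothetical stand-in, not pyDVL code.

```python
import math


def accumulate(marginals):
    """Sum values one by one, in the order they arrive."""
    total = 0.0
    for m in marginals:
        total += m
    return total


in_order = accumulate([0.1, 0.2, 0.3])
reordered = accumulate([0.3, 0.2, 0.1])

print(in_order == reordered)  # False: the two sums differ in the last bit
print(math.isclose(in_order, reordered))  # True: equal up to rounding error
```

If the difference is of this kind, comparing results with a tolerance (for example `math.isclose` or `numpy.allclose`) rather than exact equality would be one way to make a test robust to batch order.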