Reservoir update potential issue

RaptorMai / online-continual-learning

A collection of online continual learning paper implementations and tricks for computer vision in PyTorch, including our ASER(AAAI-21), SCR(CVPR21-W) and an online continual learning survey (Neurocomputing).

382 stars 57 forks source link

Reservoir update potential issue #13

Open hsilva664 opened 2 years ago

hsilva664 commented 2 years ago

As I was reading the code, I noticed that in the file utils/buffer/reservoir_update.py, in case the IF statement on line 23 fails (but not the one in line 13), some examples will be added to the last places in the buffer, whereas the rest of the input batch examples may be or may not be added depending on the sampling. However, I believe the RETURN statements in lines 44 and 61 are supposed to return indexes of all examples that were just added, but instead just return the examples added in the "sampling" part of the code. Should the examples added to the last positions of the buffer (i.e. examples that can be calculated the same way that the variable in line 24 is calculated) also be returned?

czjghost commented 2 years ago

As far as I can see, there is no problem in reservoir_update.py, the method author has implemented is the batch-level operation for reservoir sampling, not the sample-level operation for reservoir sampling... According to the algorithm, you could find that if buffer is not full, all samples of the current batch will be saved, otherwise, each sample of current batch will have similar probability to be selected into buffer. So it is impossible to select all samples into buffer all the time! More details to see "On Tiny Episodic Memories in Continual Learning".

hsilva664 commented 2 years ago

I think maybe you misunderstood what I meant, the error (if it is indeed an error) is more "code-level" than "algorithm-level" as in your reply. The adding/updating of examples is correct, but the returned values are inconsistent depending on what happens.

Namely, the output of this function should be the indexes that were just added. This is what gets returned if the buffer is not full and is also what gets returned after the buffer is full. However, in the specific iteration that fills up the buffer, only some of the indexes that were just added are returned. Check the specific lines of my previous message to see this.