drimpossible / GDumb

Simplified code for our paper "GDumb: A Simple Approach that Questions Our Progress in Continual Learning". Easily extensible to various settings, datasets and architectures.

GDumb without augmentations #5

Closed pclucas14 closed 3 years ago

pclucas14 commented 3 years ago

Hi,

Congratulations on getting the paper accepted. I was hoping to get more information on the baseline numbers. It seems GDumb leverages data augmentation, which I don't think was used for the other baselines. Do you have GDumb numbers without data augmentation? I'm currently in the process of adding GDumb as a baseline in our ICML submission.

pclucas14 commented 3 years ago

bump @drimpossible

drimpossible commented 3 years ago

Hi,

Sorry for the late reply! Unfortunately, we don't have numbers without data augmentation in our paper, but I vaguely remember augmentation being critical to our method performing well due to the sparseness of the data (since we're training from scratch).

The primary aspects in which we differ from online CL approaches are: (i) we use LR schedules instead of a fixed LR, and (ii) we use (standard) data augmentation for memory samples (as you pointed out). We wanted to show the potential of what can be done using just the memory, and hopefully have forthcoming approaches build upon it.
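Concretely, those two ingredients together look roughly like the sketch below (a minimal PyTorch illustration, not the actual repository code; `train_on_memory`, `memory_images` and `memory_labels` are hypothetical placeholders):

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from torch.utils.data import DataLoader, TensorDataset

# (ii) standard augmentation applied to the memory samples (CIFAR-style transforms)
augment = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
])

def train_on_memory(model, memory_images, memory_labels, epochs=30, device="cpu"):
    """Train a from-scratch model using only the memory buffer."""
    loader = DataLoader(TensorDataset(memory_images, memory_labels),
                        batch_size=16, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    # (i) an LR schedule rather than a fixed LR (cosine annealing as one example)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    criterion = nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            # the same random crop/flip is applied across the batch here for brevity
            x = augment(x.to(device))
            loss = criterion(model(x), y.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
    return model
```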

When we concluded that "past approaches don't use memory efficiently", we meant precisely these two details in the case of online CL approaches. Past approaches treat a batch as a mix of memory and new data and focus on gradient updates on that batch, which leads to these two issues: (i) LR schedules were not used because of the limited (mostly single) number of updates, and (ii) it didn't make sense to augment the incoming new data, so augmentation wasn't applied to batches at all, which left the memory samples un-augmented as well.
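For contrast, the typical online update described above would look roughly like this (again only an illustrative sketch of that description, not any specific method's code; `memory.sample` and `memory.add` stand in for a hypothetical replay-buffer API):

```python
import torch
import torch.nn as nn

def online_er_step(model, opt, new_x, new_y, memory,
                   criterion=nn.CrossEntropyLoss()):
    """Single gradient step per incoming batch: no schedule, no augmentation."""
    mem_x, mem_y = memory.sample(len(new_x))   # hypothetical replay-buffer API
    x = torch.cat([new_x, mem_x])              # batch = new data mixed with memory
    y = torch.cat([new_y, mem_y])
    loss = criterion(model(x), y)              # one update per batch at a fixed LR,
    opt.zero_grad()                            # so (i) there is no schedule to anneal
    loss.backward()                            # and (ii) the batch is not augmented
    opt.step()
    memory.add(new_x, new_y)                   # hypothetical: update the buffer
    return loss.item()
```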

In the case of offline CL (CIFAR100/ImageNet100), most approaches do use standard data augmentations and LR schedules as well; the differences there are mostly in other aspects.

pclucas14 commented 3 years ago

Hi,

Thanks for taking the time to answer. We ended up running basic ER with augmentations so we could compare with your numbers. It turns out that, yes, augmentation plays a critical role here: ER + aug can beat GDumb in the online setting.

I want to emphasize that, while you may have clarified this point in the text, it is more confusing than anything else to compare methods across such different settings (e.g. Table 3, where the reported results don't use augmentation). I think it is easy for the reader to conclude that the gains seen by GDumb come from the offline i.i.d. training (and not from other components, like augmentation, which are applicable to all other methods).

Best, Lucas