facebookresearch / AlphaNet

AlphaNet: Improved Training of Supernets with Alpha-Divergence

Increasing memory usage and a question about the learning rate #11

Closed liujiawei2333 closed 2 years ago

liujiawei2333 commented 2 years ago

Hello! Thank you for your excellent work! I ran into a problem with steadily increasing memory usage (host memory, not GPU memory) while training the supernet. I traced it to lines 62 through 68 of https://github.com/facebookresearch/AttentiveNAS/blob/main/evaluate/attentive_nas_eval.py. If I delete this code, the memory usage stays flat. Have you encountered this problem?

dilinwang820 commented 2 years ago

Sorry, I am not aware of this issue; possibly the train_loader is being re-initialized somehow here? Since the number of batches needed for BN calibration is small, you might manually save a few batches during the training epoch to avoid calling train_loader here, along the lines of the sketch below.
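
A minimal sketch of that idea (not code from this repository): cache a handful of training batches once, then reuse them to re-estimate BatchNorm statistics. The function names, the batch count, and the device handling are illustrative assumptions.

```python
import torch

def cache_calibration_batches(train_loader, num_batches=8):
    """Grab a few batches up front; num_batches is an illustrative choice."""
    cached = []
    for i, (images, _) in enumerate(train_loader):
        if i >= num_batches:
            break
        cached.append(images.clone())  # copy so we don't hold loader buffers
    return cached

@torch.no_grad()
def calibrate_bn(model, cached_batches, device="cuda"):
    """Reset BN running stats and re-estimate them from the cached batches."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # cumulative moving average over the cached batches
    model.train()
    for images in cached_batches:
        model(images.to(device, non_blocking=True))
    model.eval()
```

With this, the calibration step only touches the cached tensors, so the train_loader is never iterated again during evaluation.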

liujiawei2333 commented 2 years ago

Thank you for your answer! I also have a question about the learning rate. I noticed in lines 30 through 32 of https://github.com/facebookresearch/AttentiveNAS/blob/main/solver/lr_scheduler.py that you did not use BigNAS's learning-rate strategy, i.e., cosine decay followed by a constant ending phase at 5% of the initial learning rate. Why is that? Also, the condition `if self.last_epoch > self.warmup_iters` seems never to be satisfied; what is its significance? (See the sketch below for the BigNAS-style schedule I mean.)
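
For reference, here is a minimal, illustrative sketch of the BigNAS-style schedule being asked about (linear warmup, then cosine decay, then a constant tail at 5% of the initial learning rate). The function name, signature, and phase boundaries are assumptions, not code from either repository.

```python
import math

def bignas_style_lr(step, base_lr, warmup_steps, decay_steps, min_ratio=0.05):
    """Return the learning rate for a given global step."""
    if step < warmup_steps:
        # linear warmup from ~0 up to base_lr
        return base_lr * (step + 1) / warmup_steps
    if step < warmup_steps + decay_steps:
        # cosine decay from base_lr down to min_ratio * base_lr
        progress = (step - warmup_steps) / decay_steps
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)
    # constant ending phase at 5% of the initial learning rate
    return base_lr * min_ratio
```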

dilinwang820 commented 2 years ago

In our observations, we found that the typical SGD + cosine decay setting actually works quite well.
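
For context, a plain SGD + cosine decay setup of the kind mentioned above can be expressed with standard PyTorch utilities; the model, hyperparameters, and epoch count below are placeholders, not the repository's actual configuration.

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-5)
# Cosine decay of the learning rate over the whole training run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=360)

for epoch in range(360):
    # ... run one training epoch here ...
    scheduler.step()
```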