facebookresearch / open_lth

A repository in preparation for open-sourcing lottery ticket hypothesis code.
MIT License

Change dataset- error while loading the pretrained model #3

Closed rahimentezari closed 3 years ago

rahimentezari commented 3 years ago

Hi, I reordered the CIFAR10 examples based on some criteria. To preserve that order, I changed the way the dataset is created so that it forms a NumPy array of shape (600, 100, 32, 32, 3), and I changed the batch size to 1. With this, each batch loads 100 samples in my ordering. When I run this code, it loads the data and starts pretraining successfully, but it cannot load the saved model. When I tried different values such as --rewinding_steps=500 or 2000it, no such model was saved in the folder, and the code stops with something like this: [Errno 2] No such file or directory: '../open_lth_data/lottery_8762e726b9148fb4c7124da307b352d7/replicate_1/level_pretrain/main/model_ep0_it2000.pth'

Any idea why changing the dataset affects the saving procedure? I think iteration number 2000 should not exist in epoch number 0.

rahimentezari commented 3 years ago

I have an idea regarding this. Before, we had 50K samples for training and a batch size of 128 by default, so 50K/128 ≈ 390.6, or about 391 iterations per epoch. Now we have 500 samples (each one a group of 100 images) and a batch size of 1, so there are 500 iterations per epoch. It is therefore normal that there is no iteration 2000 in epoch 0. I traced the error back to datasets/registry.py, line 183: iterations_per_epoch is calculated from the total number of samples, which is still 50K here, so step 2000 is placed in epoch 0 even though an epoch actually ends after 500 iterations. For now I solved it by hard-coding the value, but I do not know how to fix this automatically.
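To make the arithmetic above concrete, here is a minimal sketch of how a step count maps to an (epoch, iteration) pair under the two different values of iterations per epoch. The function names `iterations_per_epoch` and `step_to_epoch_iteration` are hypothetical helpers written for illustration; they are not the actual open_lth code, and rounding up the last partial batch is an assumption.

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    # Hypothetical helper: number of batches in one pass over the data,
    # assuming the final partial batch is kept (rounded up).
    return math.ceil(num_examples / batch_size)

def step_to_epoch_iteration(step, its_per_epoch):
    # Hypothetical helper: split an absolute step count into
    # (epoch index, iteration index within that epoch).
    return divmod(step, its_per_epoch)

# Default setup: 50K examples, batch size 128.
default_its = iterations_per_epoch(50_000, 128)

# Regrouped setup: 500 groups of 100 images, batch size 1.
regrouped_its = iterations_per_epoch(500, 1)

# Step 2000 with the true epoch length of the regrouped data.
print(step_to_epoch_iteration(2000, regrouped_its))

# Step 2000 if the epoch length is still derived from 50K examples.
stale_its = iterations_per_epoch(50_000, 1)
print(step_to_epoch_iteration(2000, stale_its))
```

Under these assumptions, the second call yields epoch 0, iteration 2000, which matches the missing file name model_ep0_it2000.pth, while the first call places the same step at the start of epoch 4. Deriving the epoch length from the number of batches the loader really yields, rather than from a fixed example count, would avoid the hard-coded value.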