decile-team / cords

Reduce end-to-end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude using coresets and data selection.
https://cords.readthedocs.io/en/latest/
MIT License

Possible bug when calculating "trn_loss" and "tst_loss" #85

Closed. wgcban closed this issue 1 year ago.

wgcban commented 1 year ago

Hello,

I have noticed a potential bug in the calculation of trn_loss and tst_loss. trn_loss is computed over the entire training set using train_eval_loader, whose batch size is 20 times larger than that of trainloader. Therefore, when trn_loss is computed with train_eval_loader, the batch size of train_eval_loader must be used, not the batch size of trainloader.

Likewise, tst_loss should be computed with the batch size of test_eval_loader instead of the batch size of testloader.

https://github.com/decile-team/cords/blob/a3d8dc3218e9d80b2b7ab8361c680e3de300905b/train_sl.py#L616

https://github.com/decile-team/cords/blob/a3d8dc3218e9d80b2b7ab8361c680e3de300905b/train_sl.py#L671
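The reduction being described can be sketched in plain Python. This sketch assumes, as the issue implies, that the evaluation loop scales each batch's mean loss by a batch size to recover that batch's summed loss and then divides by the total number of samples. The function names and numbers below are hypothetical and are not taken from train_sl.py. Weighting each batch by the number of samples it actually contains (in PyTorch, for example, targets.size(0)) also keeps a smaller final batch from being over-weighted.

```python
from typing import Iterable, List, Tuple

# Hypothetical summary of one batch: (mean loss over the batch, number of samples in it).
Batch = Tuple[float, int]


def dataset_mean_loss(batches: Iterable[Batch]) -> float:
    """Correct reduction (hypothetical helper): weight each batch mean by that batch's own size."""
    total_loss = 0.0
    total_samples = 0
    for batch_mean, n_samples in batches:
        total_loss += batch_mean * n_samples  # recovers the summed loss of this batch
        total_samples += n_samples
    return total_loss / total_samples


def mismatched_mean_loss(batches: Iterable[Batch], other_batch_size: int) -> float:
    """Buggy reduction (hypothetical helper): scale every batch mean by another loader's batch size."""
    total_loss = 0.0
    total_samples = 0
    for batch_mean, n_samples in batches:
        total_loss += batch_mean * other_batch_size  # wrong weight for this loader's batches
        total_samples += n_samples
    return total_loss / total_samples


if __name__ == "__main__":
    train_batch_size = 20                     # hypothetical trainloader batch size
    eval_batch_size = 20 * train_batch_size   # train_eval_loader is 20x larger
    # Hypothetical evaluation pass over 1000 samples: two full batches and one partial batch.
    batches: List[Batch] = [(0.5, eval_batch_size), (0.7, eval_batch_size), (0.9, 200)]
    print(round(dataset_mean_loss(batches), 4))                       # 0.66
    print(round(mismatched_mean_loss(batches, train_batch_size), 4))  # 0.042
```

With an evaluation batch 20 times larger, the mismatched version underestimates the mean loss by roughly that factor (here 0.042 instead of 0.66; the ratio is not exactly 20 because the final batch is partial).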

krishnatejakk commented 1 year ago

Thanks for pointing this out. I have resolved the issue.