MadryLab / robustness

A library for experimenting with, training and evaluating neural networks, with a focus on adversarial robustness.
MIT License

0 training loss #83

Closed santosh-b closed 3 years ago

santosh-b commented 3 years ago

For PyTorch 1.7 only:

Hi. I'm getting a UserWarning: Failed to calculate the accuracy. which results in 0 loss being reported for training (so, presumably, an error is being thrown during the gradient update or something). I'm running the following command (from the examples) on Colab:

python -m robustness.main --dataset cifar --data /path/to/cifar \
   --adv-train 0 --arch resnet18 --out-dir /logs/checkpoints/dir/

Any help? Thanks

Icxa commented 3 years ago

I have been encountering the same warning for the past couple of days. A script that successfully calculated the accuracy a few weeks back is now giving this UserWarning. P.S.: I re-installed the robustness package a few days ago.

andrewilyas commented 3 years ago

Thanks for bringing this up, looking into it right now. The good news is that this error is thrown while calculating the accuracy from the predictions, not during the actual gradient update, so the network should indeed be training.

Can you tell me what version of PyTorch and robustness you are running?

Edit: I just ran the command above using the current master branch version of robustness and PyTorch 1.6 and did not encounter any errors, so it may be a versioning problem.

santosh-b commented 3 years ago

Thanks for the quick update. You're absolutely correct, it's a version issue. PyTorch 1.7 causes accuracy not to be reported, likely due to some compatibility error within the metrics reporting (which is wrapped in a try/except block). I'll update the original issue accordingly.
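To illustrate how a guarded metrics block can leave the reported loss at 0 while training itself continues, here is a minimal sketch. It is not the library's actual code: the names top1_accuracy, log_step and stats are hypothetical, and the sketch assumes the loss bookkeeping sits inside the same try block as the accuracy call.

```python
import warnings


def top1_accuracy(predictions, targets):
    # Hypothetical metric that stands in for the incompatible call and always fails.
    raise RuntimeError("view size is not compatible with input tensor's size and stride")


def log_step(loss_value, predictions, targets, stats):
    # Assumption: loss and accuracy are recorded inside one guarded block,
    # so a failure in the accuracy call also skips the loss update.
    try:
        acc = top1_accuracy(predictions, targets)
        stats["loss"] = loss_value
        stats["acc"] = acc
    except Exception:
        warnings.warn("Failed to calculate the accuracy.")
    return stats


stats = {"loss": 0.0, "acc": 0.0}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The real loss (2.31 here) is never recorded, so the logged value stays at 0.
    log_step(2.31, [1, 0], [1, 1], stats)
```

Under that assumption, the only visible symptom is the warning plus zeroed statistics; the exception never reaches the optimizer step.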

Icxa commented 3 years ago

Yes, I can confirm that it works completely fine with PyTorch 1.6.0. Thank you very much for your response @andrewilyas :)

andrewilyas commented 3 years ago

Just to provide an update on this: I've figured out what's changed between 1.6 and 1.7 that causes this error, and will push a hotfix in the next few days. In the meantime, if you want to change your own local copy, it just requires changing view to reshape on this line:

https://github.com/MadryLab/robustness/blob/2dabf3bdd8057fdc0718b2f8d8d90d89b1a109df/robustness/tools/helpers.py#L75

That should suffice to fix the issue.
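For anyone wondering why swapping view for reshape helps, here is a minimal sketch of the general difference between the two calls. It does not reproduce the helpers.py code: the non-contiguous tensor is built by hand with a transpose, on the assumption that the tensor reaching that line under PyTorch 1.7 is similarly non-contiguous.

```python
import torch

# Transposing gives a non-contiguous tensor of shape (3, 2).
x = torch.arange(6).reshape(2, 3).t()

# view() needs a memory layout compatible with the requested shape,
# so it raises a RuntimeError on this tensor.
try:
    x.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() returns a view when it can and silently copies otherwise,
# so it works on both contiguous and non-contiguous tensors.
flat = x.reshape(-1)
```

Calling x.contiguous().view(-1) would also work; reshape is simply the shorter way to express the same thing.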

andrewilyas commented 3 years ago

This has now been fixed and pushed to PyPI, closing now! Feel free to open a new issue if anything else arises.