HesitantlyHuman / autoclip

Implementation of adaptive gradient clipping for base PyTorch
MIT License

Issue with torch.save() #8

Closed WhaleCoded closed 1 year ago

WhaleCoded commented 1 year ago

When saving with torch.save() and loading with torch.load(), the autoclipper hits the maximum recursion depth.

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=config['learning_rate'],
    weight_decay=config['weight_decay'],
)
optimizer = QuantileClip.as_optimizer(optimizer, config['clip_quantile'])
torch.save(optimizer, some_path)

optimizer = torch.load(some_path)
Traceback (most recent call last):
  File "/miniconda3/envs/experiments/lib/python3.9/site-packages/autoclip/torch/clipper.py", line 235, in __getattr__
    return getattr(self.optimizer, attr)
  File "/miniconda3/envs/experiments/lib/python3.9/site-packages/autoclip/torch/clipper.py", line 235, in __getattr__
    return getattr(self.optimizer, attr)
  File "/miniconda3/envs/experiments/lib/python3.9/site-packages/autoclip/torch/clipper.py", line 235, in __getattr__
    return getattr(self.optimizer, attr)
  [Previous line repeated 997 more times]
RecursionError: maximum recursion depth exceeded
HesitantlyHuman commented 1 year ago

Recreated on my machine with the following interactive session:

[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from autoclip.torch import QuantileClip
>>> model = torch.nn.Linear(5, 5)
>>> optimizer = torch.optim.Adam(
...     model.parameters(),
...     lr = 0.05,
...     weight_decay = 0.1,
... )
>>> optimizer = QuantileClip.as_optimizer(optimizer, 0.5)
>>> torch.save(optimizer, 'test_file.pth')
>>> optimizer = torch.load('test_file.pth')
>>> optimizer.lr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tanner/miniconda3/envs/branch/lib/python3.9/site-packages/autoclip/torch/clipper.py", line 235, in __getattr__
    return getattr(self.optimizer, attr)
  File "/home/tanner/miniconda3/envs/branch/lib/python3.9/site-packages/autoclip/torch/clipper.py", line 235, in __getattr__
    return getattr(self.optimizer, attr)
  File "/home/tanner/miniconda3/envs/branch/lib/python3.9/site-packages/autoclip/torch/clipper.py", line 235, in __getattr__
    return getattr(self.optimizer, attr)
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded
HesitantlyHuman commented 1 year ago

Looks like this problem only rears its head when the accessed attribute does not exist on the wrapped optimizer. For whatever reason, what would normally be a nice readable error like AttributeError: 'Adam' object has no attribute 'lr' becomes this ugly recursion problem instead.

As an example of the saving and loading working correctly:

Python 3.9.7 (default, Sep 16 2021, 13:09:58) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from autoclip.torch import QuantileClip
>>> model = torch.nn.Linear(5, 5)
>>> optimizer = torch.optim.Adam(model.parameters())
>>> optimizer = QuantileClip.as_optimizer(optimizer, 0.5)
>>> torch.save(optimizer, "test_file.pth")
>>> optimizer = torch.load("test_file.pth")
>>> print(optimizer.defaults)
{'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}
>>> 
HesitantlyHuman commented 1 year ago

Looks like it was the same problem as this Stack Overflow question. Since torch.save uses pickle under the hood, it naturally runs into the same __getattr__ pitfalls during unpickling.

The issue has been resolved with patch update 0.2.1.

HesitantlyHuman commented 1 year ago

For those who run into problems with torch.save or pickle in the future and find themselves on this thread: it is generally recommended to use the state_dict pattern for saving and checkpointing. See the README.md for more info.
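In plain PyTorch (without autoclip), the state_dict pattern looks like the sketch below. Assuming the clipper follows the optimizer API, its state should slot into the same checkpoint dict; check the README for the exact calls.

```python
import torch

model = torch.nn.Linear(5, 5)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

# Save plain state (tensors and dicts) rather than pickling whole objects.
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    "checkpoint.pth",
)

# To restore: rebuild the objects first, then load the saved state into them.
model = torch.nn.Linear(5, 5)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
checkpoint = torch.load("checkpoint.pth")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
```

Because only tensors and plain containers are serialized, nothing with a custom __getattr__ ever goes through pickle, so the recursion problem above cannot occur.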