dvgodoy / PyTorchStepByStep

Official repository of my book: "Deep Learning with PyTorch Step-by-Step: A Beginner's Guide"
https://pytorchstepbystep.com
MIT License
866 stars 332 forks

How to extract/save weights after training? #13

Open minertom opened 3 years ago

minertom commented 3 years ago

OK, here I am displaying my utter ignorance again. I did find a post on Towards Data Science entitled "Everything You Need To Know About Saving Weights In PyTorch":

https://towardsdatascience.com/everything-you-need-to-know-about-saving-weights-in-pytorch-572651f3f8de

Now I am stuck. Having saved the weights in the example project, I am aware that the file is not in a human-readable format.

So my question now becomes: is there a way to take this file of weights, which is in .pth format, and convert it to NumPy, which would be human-readable? I would like to do further manipulation of the weights in NumPy.

Thank you for your patience,
Tom

dvgodoy commented 3 years ago

Hi Tom,

Saved/loaded models are used to resume training or for deployment, so they are stored in a binary format; they are not intended to be read by humans. The save method is actually transforming the state dictionary into its binary representation for saving. If you want to do anything else, such as converting it to NumPy arrays or to human-readable text, you can go over the dictionary itself.
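To make that binary round trip concrete, here is a minimal sketch using `torch.save` and `torch.load` (the file name `model.pth` is just an example):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 1))

# torch.save writes the state dictionary in a binary (pickled) format
torch.save(model.state_dict(), 'model.pth')

# Loading requires a model with the same architecture to receive the weights
new_model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 1))
new_model.load_state_dict(torch.load('model.pth'))
```

After `load_state_dict`, the two models have identical weights, even though the file on disk is not human-readable.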

For example, let's say you have a simple sequential model:

```python
model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 1))
```

If you check its state dictionary, it looks as expected:

```python
OrderedDict([('0.weight', tensor([[ 0.0485,  0.3305],
                                  [ 0.6338,  0.4103],
                                  ...
                                  [ 0.3358, -0.3827],
                                  [-0.4230,  0.2328]])),
             ('0.bias', tensor([ 0.2907,  0.3352,  0.1105, -0.6123,  0.2566,
                                -0.4548,  0.4116,  0.4219, -0.4997,  0.0397])),
             ('2.weight', tensor([[-0.2709,  0.0192,  0.0961, -0.0101, -0.3044,
                                    0.2777,  0.0432,  0.0935, -0.2234, -0.0936]])),
             ('2.bias', tensor([-0.2365]))])
```

You can also get the state dictionary of any given layer if you wish: `model[2].state_dict()` will return only those weights corresponding to the last layer.

They are all tensors, but you can convert them all to NumPy arrays:

```python
state_dict = model.state_dict()
dict_numpy = {k: v.cpu().numpy() for k, v in state_dict.items()}
```

Or, if you want to have them in plain text, you can use JSON:

```python
import json

text_state = json.dumps({k: v.tolist() for k, v in state_dict.items()})
```
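And the JSON text can be turned back into tensors later on; a minimal sketch of the round trip (the `restored` name is just for illustration):

```python
import json

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 1))
state_dict = model.state_dict()

# Tensors -> nested Python lists -> plain-text JSON
text_state = json.dumps({k: v.tolist() for k, v in state_dict.items()})

# Plain-text JSON -> nested lists -> tensors again
restored = {k: torch.tensor(v) for k, v in json.loads(text_state).items()}
model.load_state_dict(restored)
```

Since `tolist()` keeps the float32 values exactly, the restored tensors match the originals.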

Does it help?

In Chapter 5 (which I will publish in a few days), I will introduce a method to visualize the filters (weights) of convolutional layers, and I will also introduce hooks, which you can use to capture the outputs produced by each layer. I think you'll like the next Chapter :-) I'll be looking forward to your feedback on it.

Best, Daniel

minertom commented 3 years ago

Daniel,

Thank you for the reply. Wow. Looks great. It will be a couple of days before I get a chance to "grok" this fully, but it makes sense.

Yes, I am really looking forward to the next chapter :-) .

Regards Tom
