ProGamerGov / neural-style-pt

PyTorch implementation of neural style transfer algorithm
MIT License

Problem with channel_pruning and other models #26

Closed: JaledMC closed this issue 4 years ago

JaledMC commented 4 years ago

Hello @ProGamerGov

Thanks for this great repo. The VGG and NIN models work like a charm, but using Voltax3 from u/vic8760, I ran into problems with channel_pruning and nyud-fcn32s-color-heavy. With channel_pruning, it returns this log:

NIN Architecture Detected
Traceback (most recent call last):
  File "neural_style.py", line 409, in <module>
    main()
  File "neural_style.py", line 56, in main
    cnn, layerList = loadCaffemodel(params.model_file, params.pooling, params.gpu)  
  File "/home/GitHub/style_transfer/pytorch_style/CaffeLoader.py", line 136, in loadCaffemodel
    cnn.load_state_dict(torch.load(model_file))
  File "/home/anaconda3/envs/style/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for NIN:
    Missing key(s) in state_dict: "features.4.weight", "features.4.bias", "features.9.weight", "features.9.bias", "features.11.weight", "features.11.bias", "features.16.weight", "features.16.bias", "features.18.weight", "features.18.bias", "features.22.weight", "features.22.bias". 
    Unexpected key(s) in state_dict: "classifier.0.weight", "classifier.0.bias", "classifier.3.weight", "classifier.3.bias", "features.5.weight", "features.5.bias", "features.10.weight", "features.10.bias", "features.12.weight", "features.12.bias", "features.17.weight", "features.17.bias", "features.19.weight", "features.19.bias", "features.21.weight", "features.21.bias", "features.28.weight", "features.28.bias". 
    size mismatch for features.0.weight: copying a param with shape torch.Size([24, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 3, 11, 11]).
    size mismatch for features.0.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([96]).
    size mismatch for features.2.weight: copying a param with shape torch.Size([22, 24, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 96, 1, 1]).
    size mismatch for features.2.bias: copying a param with shape torch.Size([22]) from checkpoint, the shape in current model is torch.Size([96]).
    size mismatch for features.7.weight: copying a param with shape torch.Size([51, 41, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 96, 5, 5]).
    size mismatch for features.7.bias: copying a param with shape torch.Size([51]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for features.14.weight: copying a param with shape torch.Size([111, 89, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 256, 3, 3]).
    size mismatch for features.14.bias: copying a param with shape torch.Size([111]) from checkpoint, the shape in current model is torch.Size([384]).
    size mismatch for features.24.weight: copying a param with shape torch.Size([512, 228, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1, 1]).
    size mismatch for features.24.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for features.26.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1000, 1024, 1, 1]).
    size mismatch for features.26.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1000]).

It looks like the saved model doesn't have the expected structure. And with nyud-fcn32s-color-heavy, the message isn't clear to me:

Traceback (most recent call last):
  File "neural_style.py", line 409, in <module>
    main()
  File "neural_style.py", line 56, in main
    cnn, layerList = loadCaffemodel(params.model_file, params.pooling, params.gpu)  
  File "/home/GitHub/style_transfer/pytorch_style/CaffeLoader.py", line 135, in loadCaffemodel
    cnn, layerList = modelSelector(str(model_file).lower(), pooling)
  File "/home/GitHub/style_transfer/pytorch_style/CaffeLoader.py", line 119, in modelSelector
    raise ValueError("Model architecture not recognized.")
ValueError: Model architecture not recognized.

The wiki says these models work. I tried different style and content layer names. Maybe I am doing something wrong?

Thanks in advance

ProGamerGov commented 4 years ago

@JaledMC For some reason my code is not detecting the correct model definition. I have these lines in CaffeLoader.py: https://github.com/ProGamerGov/neural-style-pt/blob/master/CaffeLoader.py#L172-L173, where I try to detect the model based on its name. I had to remake my own version of Loadcaffe from Torch7, and because PyTorch models lack a prototxt file, my code determines which model it is from the model's file name.
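For illustration, here is a simplified sketch of that kind of filename-based dispatch (not the actual CaffeLoader.py code; the builder functions are hypothetical stand-ins). It also shows why a file name the selector doesn't recognize fails with the "Model architecture not recognized." error from the second traceback, while a name mapped to the wrong definition gets loaded into that architecture and then fails in load_state_dict with mismatched keys and shapes, as in the first traceback:

import torch.nn as nn

def build_vgg():  # hypothetical stand-in for the real VGG builder
    return nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())

def build_nin():  # hypothetical stand-in for the real NIN builder
    return nn.Sequential(nn.Conv2d(3, 96, 11, stride=4), nn.ReLU())

def model_selector(model_file):
    # Infer the architecture from substrings of the checkpoint's file
    # name, since a .pth file carries no prototxt describing its layers.
    name = str(model_file).lower()
    if "vgg" in name:
        return build_vgg()
    elif "nin" in name:
        return build_nin()
    raise ValueError("Model architecture not recognized.")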

When I run:

python neural_style.py -backend cudnn -model_file models/channel_pruning.pth

It seems to work properly when I run it on Ubuntu or Windows, so could there be an issue with bash? I would have thought argparse would convert the parameter input to a string, though. What Python version are you using? And did you make sure you have the most up-to-date version of neural-style-pt?

JaledMC commented 4 years ago

Thanks! For some reason, it was searching for NIN keys no matter the input model. Anyway, I updated your repo as you suggested, and it now works on both Python 3.5 and 3.6.9.

P.S.: I have tried adding -normalize_gradients, but at the moment the output images don't change much with the style_weight.

100 iterations with style_weight 3000: [image out_100]

200 iterations with style_weight 3000: [image out_200]

If I find a solution, I'll make a pull request.

ProGamerGov commented 4 years ago

@JaledMC The original neural-style has gradient normalization in the backward pass, while your code had gradient normalization in the forward pass.

Justin Johnson's code also does the total variation denoising in the backward pass. That's why neural-style-pt's total variation denoising is different from neural-style's: I do it in the forward pass.
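For context, a minimal sketch of the forward-pass approach (illustrative; the repo's actual TVLoss module may differ in details): the penalty is computed inside forward(), so autograd derives the matching gradient automatically, instead of the module injecting a hand-written gradient in backward() as Torch7's neural-style does.

import torch
import torch.nn as nn

class TVLoss(nn.Module):
    def __init__(self, strength):
        super().__init__()
        self.strength = strength
        self.loss = 0

    def forward(self, input):
        # Penalize differences between neighboring pixels; autograd
        # produces the gradient from this forward computation.
        h_diff = input[:, :, 1:, :] - input[:, :, :-1, :]
        w_diff = input[:, :, :, 1:] - input[:, :, :, :-1]
        self.loss = self.strength * (h_diff.abs().sum() + w_diff.abs().sum())
        return input  # pass the image through unchanged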

JaledMC commented 4 years ago

You are totally right, and the code differences are clear. I only paid attention to the Gram matrix implementation and didn't notice it. My fault.

Thank you so much. Since there's no problem, I'm going to close this issue.

ProGamerGov commented 4 years ago

@JaledMC It may be possible to implement gradient normalization with backward hooks, but I was never able to get it working. Though I was also a lot worse at programming back when I tried.
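For reference, the backward-hook idea looks roughly like this (a sketch, not working code from the repo): Tensor.register_hook lets you replace the gradient flowing into a tensor, so L1-normalizing it there would mimic what neural-style does inside its loss modules' backward pass.

import torch

x = torch.randn(1, 3, 64, 64, requires_grad=True)

def normalize_grad(grad):
    # Replace the incoming gradient with its L1-normalized version.
    return grad / (grad.abs().sum() + 1e-8)

x.register_hook(normalize_grad)
loss = (x ** 2).sum()
loss.backward()
# x.grad now holds the normalized gradient.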

ProGamerGov commented 4 years ago

So, this is what one would expect to use to replicate neural-style's -normalize_gradients feature in PyTorch, as it's literally the exact same functions:

tensor.div(torch.norm(tensor, 1) + 1e-8)

Torch7's torch.norm() looks similar to PyTorch's torch.norm(), but I think there may be differences: https://pytorch.org/docs/stable/_modules/torch/functional.html#norm
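For what it's worth, in PyTorch torch.norm(tensor, 1) is the L1 norm of the flattened tensor (the sum of absolute values), so the snippet above divides the tensor by its L1 norm plus a small epsilon to avoid division by zero:

import torch

t = torch.tensor([[1.0, -2.0], [3.0, -4.0]])
assert torch.norm(t, 1) == t.abs().sum()  # both equal 10.0
normalized = t.div(torch.norm(t, 1) + 1e-8)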

It also looks like -normalize_gradients from neural-style may not actually be gradient normalization, but instead a form of gradient scaling: https://github.com/jcjohnson/neural-style/pull/374. Though @jcjohnson seems to have thought it was gradient normalization when he implemented it: https://github.com/jcjohnson/neural-style/commit/0c5d5d5c44d33b0e84733e3e75a7ee69dd8ee2cb

ProGamerGov commented 4 years ago

@JaledMC You can get somewhat similar results to neural-style's -normalize_gradients parameter if you set the content weight to 0 with -content_weight 0.
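For example (a hypothetical invocation; combine with your usual flags, with -style_weight 3000 matching the runs above):

python neural_style.py -content_weight 0 -style_weight 3000 -backend cudnn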

JaledMC commented 4 years ago

@ProGamerGov, excuse my delay; I didn't see your replies. My fault. PyTorch in my conda environment broke and refused to load caffe models. I fixed it today, so I can start new runs.

As you said, I used tensor.div(torch.norm(tensor, 1) + 1e-8), and it didn't work. Some changes to mimic the official repo only gave me errors :/ But I am going to try -content_weight 0.

As always, thanks for your amazing work

ProGamerGov commented 3 years ago

@JaledMC I finally figured out how to implement -normalize_gradients and I've added it to the master branch!

https://github.com/ProGamerGov/neural-style-pt/commit/cbcd023326a3487a2d75270ed1f3b3ddb4b72407
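The general technique (see the commit for the actual implementation) is a torch.autograd.Function that acts as the identity in the forward pass and rescales the gradient in the backward pass, which is where Torch7's neural-style applied it. A minimal sketch:

import torch

class NormalizeGradients(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return input  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # L1-normalize the gradient on the way back.
        return grad_output / (torch.norm(grad_output, 1) + 1e-8)

# Usage: apply wherever the gradient should be normalized.
x = torch.randn(4, requires_grad=True)
NormalizeGradients.apply(x).sum().backward()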

JaledMC commented 3 years ago

I'm subscribed to your repo, and your dedication always amazes me. Keep up the fantastic work!