anvoynov / GANLatentDiscovery

The authors' official implementation of Unsupervised Discovery of Interpretable Directions in the GAN Latent Space

KeyError: 'resolution' #13

Closed. woctezuma closed this issue 3 years ago.

woctezuma commented 3 years ago

Hello,

I am trying to run this command:

!python run_train.py \
    --gan_type StyleGAN2 \
    --gan_weights /content/stylegan2-pytorch/network-snapshot-005000.pt \
    --deformator ortho \
    --out rectification_results_dir

And I encounter this error:

Traceback (most recent call last):
  File "run_train.py", line 103, in <module>
    main()
  File "run_train.py", line 62, in main
    G = load_generator(args.__dict__, weights_path, args.w_shift)
  File "/content/GANLatentDiscovery/loading.py", line 19, in load_generator
    G = make_style_gan2(args['resolution'], G_weights, shift_in_w)
KeyError: 'resolution'

From the look of it, load_generator() is referenced in two places:

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/run_train.py#L62

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L44-L47

I think main() should have a similar block of code where resolution is defined. Or resolution should appear in:

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/trainer.py#L20-L25

so that the user can set it via the command line (a rough sketch follows below).
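
For illustration, here is a minimal sketch of exposing resolution on the command line; I assume run_train.py builds its arguments with argparse, and the flag name and the default of 256 below are my own choices, not something taken from the repo:

import argparse

parser = argparse.ArgumentParser(description='latent deformator training')
# ... the existing arguments (--gan_type, --gan_weights, --deformator, --out, ...)
parser.add_argument('--resolution', type=int, default=256,
                    help='generator output resolution, e.g. 256, 512 or 1024')

args = parser.parse_args()
# args.__dict__ now contains 'resolution', so the call
# make_style_gan2(args['resolution'], G_weights, shift_in_w) no longer raises a KeyError.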

woctezuma commented 3 years ago

I guess I am doing something wrong, because I then encounter another issue:

Traceback (most recent call last):
  File "run_train.py", line 103, in <module>
    main()
  File "run_train.py", line 62, in main
    G = load_generator(args.__dict__, weights_path, args.w_shift)
  File "/content/GANLatentDiscovery/loading.py", line 19, in load_generator
    G = make_style_gan2(args['resolution'], G_weights, shift_in_w)
  File "/content/GANLatentDiscovery/models/gan_load.py", line 111, in make_style_gan2
    G.load_state_dict(torch.load(weights)['g_ema'])
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fbfbb42c193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7fbfbe7bc9eb in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7fbfbe7bdc04 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x6c53a6 (0x7fc0066ed3a6 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x2961c4 (0x7fc0062be1c4 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #6: python3() [0x594b71]
frame #7: python3() [0x54a325]
frame #8: python3() [0x5517c1]
frame #10: python3() [0x50a783]
frame #12: python3() [0x507f24]
frame #14: python3() [0x594b01]
frame #15: python3() [0x54a17f]
frame #16: python3() [0x5517c1]
frame #18: python3() [0x50a783]
frame #20: python3() [0x507f24]
frame #21: python3() [0x509c50]
frame #22: python3() [0x50a64d]
frame #24: python3() [0x507f24]
frame #25: python3() [0x509c50]
frame #26: python3() [0x50a64d]
frame #28: python3() [0x509918]
frame #29: python3() [0x50a64d]
frame #31: python3() [0x509918]
frame #32: python3() [0x50a64d]
frame #34: python3() [0x507f24]
frame #36: python3() [0x634dd2]
frame #41: __libc_start_main + 0xe7 (0x7fc0118b2b97 in /lib/x86_64-linux-gnu/libc.so.6)

So, the key message is:

Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2.
Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
woctezuma commented 3 years ago

Alright, I fixed the second issue by upgrading my version of PyTorch.

%pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

(Screenshot: PyTorch upgrade)
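
A quick sanity check of the upgrade, using the standard PyTorch version attribute:

import torch

print(torch.__version__)          # should now report 1.6.0+cu101
print(torch.cuda.is_available())  # make sure the CUDA build is picked up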

However, I encounter another issue, which I suspect is due to the fact that my StyleGAN2 model uses config-e while your code likely expects config-f, so the channel counts differ by a factor of 2.

Traceback (most recent call last):
  File "run_train.py", line 103, in <module>
    main()
  File "run_train.py", line 62, in main
    G = load_generator(args.__dict__, weights_path, args.w_shift)
  File "/content/GANLatentDiscovery/loading.py", line 19, in load_generator
    G = make_style_gan2(512, G_weights, shift_in_w)
  File "/content/GANLatentDiscovery/models/gan_load.py", line 111, in make_style_gan2
    G.load_state_dict(torch.load(weights)['g_ema'])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
    Missing key(s) in state_dict: "convs.12.conv.weight", "convs.12.conv.blur.kernel", "convs.12.conv.modulation.weight", "convs.12.conv.modulation.bias", "convs.12.noise.weight", "convs.12.activate.bias", "convs.13.conv.weight", "convs.13.conv.modulation.weight", "convs.13.conv.modulation.bias", "convs.13.noise.weight", "convs.13.activate.bias", "to_rgbs.6.bias", "to_rgbs.6.upsample.kernel", "to_rgbs.6.conv.weight", "to_rgbs.6.conv.modulation.weight", "to_rgbs.6.conv.modulation.bias", "noises.noise_13", "noises.noise_14". 
    size mismatch for convs.6.conv.weight: copying a param with shape torch.Size([1, 256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 512, 512, 3, 3]).
    size mismatch for convs.6.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for convs.7.conv.weight: copying a param with shape torch.Size([1, 256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 512, 512, 3, 3]).
    size mismatch for convs.7.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
    size mismatch for convs.7.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for convs.7.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for convs.8.conv.weight: copying a param with shape torch.Size([1, 128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 512, 3, 3]).
    size mismatch for convs.8.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
    size mismatch for convs.8.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for convs.8.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.9.conv.weight: copying a param with shape torch.Size([1, 128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 256, 3, 3]).
    size mismatch for convs.9.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for convs.9.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.9.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.10.conv.weight: copying a param with shape torch.Size([1, 64, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 128, 256, 3, 3]).
    size mismatch for convs.10.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for convs.10.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.10.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for convs.11.conv.weight: copying a param with shape torch.Size([1, 64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 128, 128, 3, 3]).
    size mismatch for convs.11.conv.modulation.weight: copying a param with shape torch.Size([64, 512]) from checkpoint, the shape in current model is torch.Size([128, 512]).
    size mismatch for convs.11.conv.modulation.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for convs.11.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for to_rgbs.3.conv.weight: copying a param with shape torch.Size([1, 3, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 512, 1, 1]).
    size mismatch for to_rgbs.3.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
    size mismatch for to_rgbs.3.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for to_rgbs.4.conv.weight: copying a param with shape torch.Size([1, 3, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 256, 1, 1]).
    size mismatch for to_rgbs.4.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for to_rgbs.4.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for to_rgbs.5.conv.weight: copying a param with shape torch.Size([1, 3, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 128, 1, 1]).
    size mismatch for to_rgbs.5.conv.modulation.weight: copying a param with shape torch.Size([64, 512]) from checkpoint, the shape in current model is torch.Size([128, 512]).
    size mismatch for to_rgbs.5.conv.modulation.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
woctezuma commented 3 years ago

To fix this third issue (config-e vs. config-f), one has to change the channel_multiplier from 2 to 1.

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/models/StyleGAN2/model.py#L361-L371

Potentially, here too:

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/models/StyleGAN2/model.py#L611-L613
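
For illustration only, here is a sketch of loading a config-e checkpoint with channel_multiplier=1. I assume the Generator in models/StyleGAN2/model.py follows the rosinality stylegan2-pytorch constructor (size, style_dim, n_mlp, channel_multiplier); the checkpoint path and resolution are just my example values:

import torch

from models.StyleGAN2.model import Generator

resolution = 256  # my checkpoint was trained at 256x256
G = Generator(resolution, 512, 8, channel_multiplier=1)  # 1 for config-e, 2 (the default) for config-f

checkpoint = torch.load('/content/stylegan2-pytorch/network-snapshot-005000.pt')
G.load_state_dict(checkpoint['g_ema'])  # same 'g_ema' key as in gan_load.py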

However, I am now running out of memory on Google Colab:

Traceback (most recent call last):
  File "run_train.py", line 103, in <module>
    main()
  File "run_train.py", line 85, in main
    trainer.train(G, deformator, shift_predictor, multi_gpu=args.multi_gpu)
  File "/content/GANLatentDiscovery/trainer.py", line 207, in train
    imgs_shifted = G.gen_shifted(z, shift)
  File "/content/GANLatentDiscovery/models/gan_load.py", line 60, in gen_shifted
    return self.forward(w + shift, input_is_latent=True)
  File "/content/GANLatentDiscovery/models/gan_load.py", line 55, in forward
    return self.style_gan2([input], input_is_latent=input_is_latent)[0]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/GANLatentDiscovery/models/StyleGAN2/model.py", line 527, in forward
    out = conv2(out, latent[:, i + 1], noise=noise2)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/GANLatentDiscovery/models/StyleGAN2/model.py", line 332, in forward
    out = self.noise(out, noise=noise)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/GANLatentDiscovery/models/StyleGAN2/model.py", line 286, in forward
    return image + self.weight * noise
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 14.73 GiB total capacity; 13.20 GiB already allocated; 499.88 MiB free; 13.36 GiB reserved in total by PyTorch)
woctezuma commented 3 years ago

To summarize:

- KeyError: 'resolution': resolution is never defined before load_generator() is called, so it has to be provided (see my first comment above);
- unsupported PyTorch file version: fixed by upgrading PyTorch to 1.6.0;
- state_dict size mismatch (config-e vs. config-f): fixed by changing channel_multiplier from 2 to 1;
- still open on my side: CUDA out of memory on Google Colab.

anvoynov commented 3 years ago

Hi @woctezuma, thanks a lot for the detailed comments! Sorry for the late answer, I will add the fixes soon.

chi0tzp commented 3 years ago

Hi @anvoynov, it seems that this has not been resolved in the repo. What is the recommended way to fix it? @woctezuma, have you figured this out?

woctezuma commented 3 years ago

I cannot say, because I could not try it on Google Colab due to memory constraints.

anvoynov commented 3 years ago

There were some delays due to deadlines, sorry for that. I'm going to fix this issue this week.

chi0tzp commented 3 years ago

Thanks, both of you! But since you originally wrote about args['resolution'], and it is not there (in the arguments), I guess that some specific value should have worked. Could you please let me know what that default value would be? Many thanks again.

anvoynov commented 3 years ago

Sure! Apparently, you need the parameters for a StyleGAN2 model that was not considered in the paper?

chi0tzp commented 3 years ago

@anvoynov First of all, thanks for taking the time to respond. I'm trying to refactor your code in a way that ensures I understand it. So, first of all, I'm interested in making it work the way you had it, to reproduce the paper's results. Do you think that StyleGAN2 could be omitted for this purpose?

woctezuma commented 3 years ago

As far as I remember, the issue was not about a specific value, but the fact that args['resolution'] raised a KeyError because it was never defined before the following lines:

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L18-L19

That is what I mentioned here: https://github.com/anvoynov/GANLatentDiscovery/issues/13#issuecomment-695059349

My solution is here: https://github.com/anvoynov/GANLatentDiscovery/issues/13#issuecomment-695009800

chi0tzp commented 3 years ago

As far as I remember, the issue was not about a specific value, but the fact that args['resolution'] raised a KeyError because it was never defined before the following lines:

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L18-L19

Sure, that's true. I was thinking of adding an extra argument for this purpose, and I'm looking for a default value for it. So, you use 256, right?

woctezuma commented 3 years ago

I use 256 because my images are 256x256. It is not supposed to be a default value. It could be 512 or 1024 for others.

Maybe set it to 128 for consistency with: https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L44-L47

chi0tzp commented 3 years ago

I use 256 because my images are 256x256. It is not supposed to be a default value. It could be 512 or 1024 for others.

Maybe set it to 128 for consistency with:

https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L44-L47

I see, thanks!

anvoynov commented 3 years ago

@woctezuma once again, many thanks for the report!