Closed woctezuma closed 3 years ago
I guess I am doing something wrong, because I encounter another issue after adding resolution to Params and re-running the training command:
class Params(object):
    def __init__(self, **kwargs):
        self.resolution = None
        self.shift_scale = 6.0
!python run_train.py \
--gan_type StyleGAN2 \
--gan_weights /content/stylegan2-pytorch/network-snapshot-005000.pt \
--deformator ortho \
--out rectification_results_dir \
--resolution 256
Then:
Traceback (most recent call last):
File "run_train.py", line 103, in <module>
main()
File "run_train.py", line 62, in main
G = load_generator(args.__dict__, weights_path, args.w_shift)
File "/content/GANLatentDiscovery/loading.py", line 19, in load_generator
G = make_style_gan2(args['resolution'], G_weights, shift_in_w)
File "/content/GANLatentDiscovery/models/gan_load.py", line 111, in make_style_gan2
G.load_state_dict(torch.load(weights)['g_ema'])
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 527, in load
with _open_zipfile_reader(f) as opened_zipfile:
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 224, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fbfbb42c193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7fbfbe7bc9eb in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7fbfbe7bdc04 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x6c53a6 (0x7fc0066ed3a6 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x2961c4 (0x7fc0062be1c4 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #6: python3() [0x594b71]
frame #7: python3() [0x54a325]
frame #8: python3() [0x5517c1]
frame #10: python3() [0x50a783]
frame #12: python3() [0x507f24]
frame #14: python3() [0x594b01]
frame #15: python3() [0x54a17f]
frame #16: python3() [0x5517c1]
frame #18: python3() [0x50a783]
frame #20: python3() [0x507f24]
frame #21: python3() [0x509c50]
frame #22: python3() [0x50a64d]
frame #24: python3() [0x507f24]
frame #25: python3() [0x509c50]
frame #26: python3() [0x50a64d]
frame #28: python3() [0x509918]
frame #29: python3() [0x50a64d]
frame #31: python3() [0x509918]
frame #32: python3() [0x50a64d]
frame #34: python3() [0x507f24]
frame #36: python3() [0x634dd2]
frame #41: __libc_start_main + 0xe7 (0x7fc0118b2b97 in /lib/x86_64-linux-gnu/libc.so.6)
so:
Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2.
Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
Alright, I fixed the second issue by upgrading my version of PyTorch.
%pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
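For anyone else hitting the same assertion: as far as I understand, PyTorch 1.6 switched torch.save to a newer zip-based file format by default, which older releases refuse to read. A minimal sketch of a version guard follows; the helper names and the 1.5.1 version string are only illustrative, and the 1.6 threshold is my assumption based on the fix above.

```python
import re

# Assumption: checkpoints written by PyTorch 1.6 or later use the newer
# zip-based file format, so 1.6 is treated as the minimum version here.
MIN_TORCH = (1, 6)


def parse_version(version):
    """Turn a version string such as '1.6.0+cu101' into a tuple like (1, 6, 0)."""
    # Drop any local build suffix ('+cu101') before extracting the numbers.
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", public))


def can_read_new_checkpoints(version, minimum=MIN_TORCH):
    """Return True if the given PyTorch version is at least `minimum`."""
    return parse_version(version)[: len(minimum)] >= minimum


if __name__ == "__main__":
    # Hypothetical version strings; in practice pass torch.__version__.
    for installed in ("1.5.1+cu101", "1.6.0+cu101"):
        print(installed, can_read_new_checkpoints(installed))
```

In a notebook, one would pass torch.__version__ to can_read_new_checkpoints() before calling torch.load on the snapshot.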
However, I encounter another issue, which I suspect is because my StyleGAN2 model uses config-e while your code likely expects config-f, so the layer widths differ by a factor of 2.
Traceback (most recent call last):
File "run_train.py", line 103, in <module>
main()
File "run_train.py", line 62, in main
G = load_generator(args.__dict__, weights_path, args.w_shift)
File "/content/GANLatentDiscovery/loading.py", line 19, in load_generator
G = make_style_gan2(512, G_weights, shift_in_w)
File "/content/GANLatentDiscovery/models/gan_load.py", line 111, in make_style_gan2
G.load_state_dict(torch.load(weights)['g_ema'])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "convs.12.conv.weight", "convs.12.conv.blur.kernel", "convs.12.conv.modulation.weight", "convs.12.conv.modulation.bias", "convs.12.noise.weight", "convs.12.activate.bias", "convs.13.conv.weight", "convs.13.conv.modulation.weight", "convs.13.conv.modulation.bias", "convs.13.noise.weight", "convs.13.activate.bias", "to_rgbs.6.bias", "to_rgbs.6.upsample.kernel", "to_rgbs.6.conv.weight", "to_rgbs.6.conv.modulation.weight", "to_rgbs.6.conv.modulation.bias", "noises.noise_13", "noises.noise_14".
size mismatch for convs.6.conv.weight: copying a param with shape torch.Size([1, 256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 512, 512, 3, 3]).
size mismatch for convs.6.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for convs.7.conv.weight: copying a param with shape torch.Size([1, 256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 512, 512, 3, 3]).
size mismatch for convs.7.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for convs.7.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for convs.7.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for convs.8.conv.weight: copying a param with shape torch.Size([1, 128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 512, 3, 3]).
size mismatch for convs.8.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for convs.8.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for convs.8.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for convs.9.conv.weight: copying a param with shape torch.Size([1, 128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 256, 3, 3]).
size mismatch for convs.9.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
size mismatch for convs.9.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for convs.9.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for convs.10.conv.weight: copying a param with shape torch.Size([1, 64, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 128, 256, 3, 3]).
size mismatch for convs.10.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
size mismatch for convs.10.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for convs.10.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for convs.11.conv.weight: copying a param with shape torch.Size([1, 64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 128, 128, 3, 3]).
size mismatch for convs.11.conv.modulation.weight: copying a param with shape torch.Size([64, 512]) from checkpoint, the shape in current model is torch.Size([128, 512]).
size mismatch for convs.11.conv.modulation.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for convs.11.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for to_rgbs.3.conv.weight: copying a param with shape torch.Size([1, 3, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 512, 1, 1]).
size mismatch for to_rgbs.3.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for to_rgbs.3.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for to_rgbs.4.conv.weight: copying a param with shape torch.Size([1, 3, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 256, 1, 1]).
size mismatch for to_rgbs.4.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
size mismatch for to_rgbs.4.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for to_rgbs.5.conv.weight: copying a param with shape torch.Size([1, 3, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 128, 1, 1]).
size mismatch for to_rgbs.5.conv.modulation.weight: copying a param with shape torch.Size([64, 512]) from checkpoint, the shape in current model is torch.Size([128, 512]).
size mismatch for to_rgbs.5.conv.modulation.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
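The shapes in the traceback above can be reproduced with a small sketch. The channel table below is an assumption about how the StyleGAN2 PyTorch port sizes its layers (it is not copied from the repository), and the function names are hypothetical. It also suggests that the missing keys (convs.12, convs.13, to_rgbs.6) come from a separate mismatch: the model was built at 512 (see make_style_gan2(512, ...) in the traceback) while the checkpoint appears to be a 256x256 one.

```python
import math


def channel_widths(size, channel_multiplier):
    """Feature-map channels per resolution, up to `size`.

    Assumption: this mirrors the channel table of the StyleGAN2 PyTorch
    port that the traceback points at.
    """
    table = {
        4: 512, 8: 512, 16: 512, 32: 512,
        64: 256 * channel_multiplier,
        128: 128 * channel_multiplier,
        256: 64 * channel_multiplier,
        512: 32 * channel_multiplier,
        1024: 16 * channel_multiplier,
    }
    return {res: ch for res, ch in table.items() if res <= size}


def layer_counts(size):
    """Number of entries in `convs` and `to_rgbs` for a given output size."""
    log_size = int(math.log2(size))
    return {"convs": 2 * (log_size - 2), "to_rgbs": log_size - 2}


if __name__ == "__main__":
    checkpoint = channel_widths(256, channel_multiplier=1)  # config-e at 256
    model = channel_widths(512, channel_multiplier=2)       # config-f at 512
    for res in (64, 128, 256):
        print(res, checkpoint[res], model[res])
    print(layer_counts(256), layer_counts(512))
```

At resolutions 64, 128 and 256, the sketch gives 256/128/64 channels for the checkpoint versus 512/256/128 for the model, which is the factor of 2 reported for convs.6 through convs.11.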
To fix this third issue (config-e vs. config-f), one has to change the channel_multiplier from 2 to 1.
Potentially, here too:
However, I am now running out of memory on Google Colab:
Traceback (most recent call last):
File "run_train.py", line 103, in <module>
main()
File "run_train.py", line 85, in main
trainer.train(G, deformator, shift_predictor, multi_gpu=args.multi_gpu)
File "/content/GANLatentDiscovery/trainer.py", line 207, in train
imgs_shifted = G.gen_shifted(z, shift)
File "/content/GANLatentDiscovery/models/gan_load.py", line 60, in gen_shifted
return self.forward(w + shift, input_is_latent=True)
File "/content/GANLatentDiscovery/models/gan_load.py", line 55, in forward
return self.style_gan2([input], input_is_latent=input_is_latent)[0]
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/GANLatentDiscovery/models/StyleGAN2/model.py", line 527, in forward
out = conv2(out, latent[:, i + 1], noise=noise2)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/GANLatentDiscovery/models/StyleGAN2/model.py", line 332, in forward
out = self.noise(out, noise=noise)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/GANLatentDiscovery/models/StyleGAN2/model.py", line 286, in forward
return image + self.weight * noise
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 14.73 GiB total capacity; 13.20 GiB already allocated; 499.88 MiB free; 13.36 GiB reserved in total by PyTorch)
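A rough back-of-the-envelope check of that failed 512 MiB allocation, as a sketch. The batch size of 32 and the 64 channels at 256x256 are guesses on my part, not values read from the code; they are simply one combination that produces exactly 512 MiB for a float32 tensor.

```python
def tensor_mib(batch, channels, height, width, bytes_per_value=4):
    """Size in MiB of a dense tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_value / 2**20


if __name__ == "__main__":
    # Assumed shape: batch 32, 64 channels, 256x256 feature map.
    print(tensor_mib(32, 64, 256, 256))
    # Halving the (assumed) batch size halves the allocation.
    print(tensor_mib(16, 64, 256, 256))
```

If the guess is right, every full-resolution activation costs about half a gigabyte at that batch size, so lowering the batch size would be the first thing to try on a 15 GiB Colab GPU.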
To summarize:
- resolution is never defined, which can be fixed by adding an entry to Params, or by editing directly: https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L18-L19
- the code expects config-f, but can be used with config-e if channel_multiplier==1.
Hi @woctezuma, thanks a lot for the decent comments! Sorry for the late answer, I will add the fixes soon.
Hi @anvoynov, it seems that this has not been resolved in the repo. What is the recommended way to fix it? @woctezuma have you figured this out?
I cannot say, because I could not try on Google Colab due to memory constraints.
There were some delays due to deadlines, sorry for that. I'm going to fix this issue this week.
Thanks both of you! But since you originally wrote about args['resolution'] and this is not there (in the arguments), I guess that some specific value should have worked. Could you please let me know what this default value would be? Many thanks again.
Sure! Apparently, you need the parameters for the StyleGAN2 that was not considered in the paper?
@anvoynov First of all, thanks for taking the time to respond. I'm trying to refactor your code to make sure that I understand it. So I'm first of all interested in making it work the way you had it for reproducing the paper's results. Do you think that StyleGAN2 could be omitted for this purpose?
As far as I remember, the issue was not about a specific value, but the fact that args['resolution'] raised a KeyError because it was never defined before the following lines:
That is what I mentioned here: https://github.com/anvoynov/GANLatentDiscovery/issues/13#issuecomment-695059349
My solution is there: https://github.com/anvoynov/GANLatentDiscovery/issues/13#issuecomment-695009800
> As far as I remember, the issue was not about a specific value, but the fact that args['resolution'] raised a KeyError because it was never defined before the following lines:
Sure, that's true. I was thinking of adding an extra argument for this purpose, and I'm looking for a default value for it. So, you use 256, right?
I use 256 because my images are 256x256. It is not supposed to be a default value. It could be 512 or 1024 for others.
Maybe set it to 128 for consistency with: https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L44-L47
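A minimal sketch of what such a fallback could look like. The helper name is hypothetical, and 128 is only the value suggested above for consistency, not something the repository is known to enforce in main().

```python
DEFAULT_RESOLUTION = 128  # the fallback value suggested in this thread


def with_default_resolution(args, default=DEFAULT_RESOLUTION):
    """Return a copy of `args` in which 'resolution' is always defined."""
    resolved = dict(args)
    # A missing key and an explicit None both fall back to the default.
    if resolved.get("resolution") is None:
        resolved["resolution"] = default
    return resolved


if __name__ == "__main__":
    print(with_default_resolution({"gan_type": "StyleGAN2"}))
    print(with_default_resolution({"gan_type": "StyleGAN2", "resolution": 256}))
```

With this, args['resolution'] no longer raises a KeyError, and an explicit value such as 256 is left untouched.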
> I use 256 because my images are 256x256. It is not supposed to be a default value. It could be 512 or 1024 for others.
> Maybe set it to 128 for consistency with:
I see, thanks!
@woctezuma once again, many thanks for the report!
Hello,
I try to run this command:
And I encounter this error:
From the look of it, load_generator() is referenced in two places:
- main(), where the error occurs: https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/run_train.py#L62
- load_from_dir(), where resolution is defined as a dictionary key: https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/loading.py#L44-L47
I think main() should have a similar block of code where resolution is defined. Or resolution should appear in: https://github.com/anvoynov/GANLatentDiscovery/blob/36704fef8c8d179ec737968b8f7a64cd033af88c/trainer.py#L20-L25
so that the user can set it via the command-line.
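To illustrate the second option, here is a self-contained sketch of how a resolution entry in Params could surface as a command-line flag. The Params class below is a two-field stand-in, and the loop that turns its fields into argparse flags is my assumption about the pattern, not a copy of run_train.py.

```python
import argparse


class Params(object):
    """Stand-in for the trainer's Params; only two fields are shown."""

    def __init__(self, **kwargs):
        self.resolution = None  # hypothetical new entry, unset by default
        self.shift_scale = 6.0
        # Only override defaults with values that were actually provided.
        for key, val in kwargs.items():
            if val is not None:
                setattr(self, key, val)


def build_parser():
    parser = argparse.ArgumentParser(description="resolution flag sketch")
    # Expose every Params field as an optional flag; fields whose default
    # is None are assumed to be integers (which holds for resolution).
    for key, val in vars(Params()).items():
        arg_type = type(val) if val is not None else int
        parser.add_argument("--{}".format(key), type=arg_type, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--resolution", "256"])
    print(vars(args)["resolution"])
    print(Params(**vars(args)).shift_scale)
```

Note that when the flag is omitted, the key exists but holds None, so the code reading args['resolution'] would still need a fallback value.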