Open ghost opened 4 years ago
1) The only reliable method is to retrain on high-resolution videos.
2) You can also try an off-the-shelf video super-resolution method.
3) Since all the networks are fully convolutional, you can actually try to use the pretrained checkpoints trained on 256 images. To do this, change the size in https://github.com/AliaksandrSiarohin/first-order-model/blob/2ed57e0e7825717a966ea9eca95e7abd61edd78f/demo.py#L121 to the size that you want. It may also be beneficial to change the scale_factor parameter in the config at https://github.com/AliaksandrSiarohin/first-order-model/blob/2ed57e0e7825717a966ea9eca95e7abd61edd78f/config/vox-256.yaml#L26 and https://github.com/AliaksandrSiarohin/first-order-model/blob/2ed57e0e7825717a966ea9eca95e7abd61edd78f/config/vox-256.yaml#L38. For example, if you want 512-resolution images, change it to 0.125, so that the input resolution for these networks is always 64.
If you have any luck with these, please share your findings.
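As a concrete illustration of point 3, here is a minimal sketch (not the author's exact code) of the demo.py input changes for 512x512, assuming the two scale_factor entries in config/vox-256.yaml are set to 0.125; the driving-video resize is something later replies in this thread found necessary as well:

```python
# Minimal sketch of the demo.py input changes for 512x512 inference with the
# pretrained 256 checkpoint. Assumes both scale_factor entries in
# config/vox-256.yaml have been changed to 0.125, so the kp_detector and
# dense_motion networks still receive 64x64 inputs.
import imageio
from skimage.transform import resize

source_image = imageio.imread("source.png")                     # placeholder path
driving_video = imageio.mimread("driving.mp4", memtest=False)   # placeholder path

# Resize the source image to the target resolution; later replies in this
# thread found the driving frames need the same treatment.
source_image = resize(source_image, (512, 512))[..., :3]
driving_video = [resize(frame, (512, 512))[..., :3] for frame in driving_video]
```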
@AliaksandrSiarohin thanks for the feedback.
Notice, however, that point 3 doesn't work out of the box. If I change the scale factors as you describe, I get an error about incompatible shapes.
Also, as I'm planning to try out some super-resolution methods for this, I'm curious what you mean by an "off-the-shelf video super-resolution method"?
Can you post the error message you got? I mean some video super-resolution method, like the ones listed at https://paperswithcode.com/task/video-super-resolution
@AliaksandrSiarohin
Error(s) in loading state_dict for OcclusionAwareGenerator:
size mismatch for dense_motion_network.down.weight: copying a param with shape torch.Size([3, 1, 13, 13]) from checkpoint, the shape in current model is torch.Size([3, 1, 29, 29]).
Ah yes, you are right. Can you try hard-coding sigma=1.5 in https://github.com/AliaksandrSiarohin/first-order-model/blob/2ed57e0e7825717a966ea9eca95e7abd61edd78f/modules/util.py#L205?
Cool, that worked! Could it be generalized for other resolutions? I'll do some tests and comparisons using super-resolution.
What do you mean? Generalized?
Is the scale factor proportional to image size? Like, if I wanted to try 1024x1024, should I use scale_factor = 0.0625?
Also, is the fixed sigma (1.5) valid only for size 512? What about size 1024?
I'm interested in generalizing my setup so that these values can be derived automatically from the given image size.
Yes, you should use scale_factor = 0.0625. In other words, kp_detector and dense_motion should always operate on the same 64x64 resolution. The sigma is a parameter of the anti-aliasing filter used for downsampling; in principle any value could be used, I just selected the one that scikit-image uses by default. So sigma=1.5 is the default for 256x256, but I don't think it affects the results that much. You can leave it equal to 1.5, or you can avoid loading the dense_motion_network.down.weight parameter altogether by removing it from the state_dict.
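To make that concrete, here is a small sketch (under the assumptions above, not the exact repository code) of deriving scale_factor from the target size and of dropping the incompatible anti-aliasing kernel from the checkpoint instead of hard-coding sigma; the "generator" key and the parameter name are the ones that appear elsewhere in this thread:

```python
# Sketch based on the explanation above: kp_detector / dense_motion should
# always see 64x64 inputs, so scale_factor follows from the target size, and
# the checkpoint's anti-aliasing kernel can simply be skipped when loading.
import torch

target_size = 512                    # e.g. 512 or 1024
scale_factor = 64 / target_size      # 0.125 for 512, 0.0625 for 1024
default_sigma = (1 / 0.25 - 1) / 2   # = 1.5, the value baked into the 256 checkpoint

checkpoint = torch.load("vox-cpk.pth.tar", map_location="cpu")  # checkpoint name used later in this thread
generator_state = checkpoint["generator"]

# Drop the downsampling kernel whose shape depends on sigma, then load the
# remaining weights into a model built with the new scale_factor via
# generator.load_state_dict(generator_state, strict=False).
generator_state.pop("dense_motion_network.down.weight", None)
```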
Thanks so much for the support, really valuable info here!
Hi, have you retrained on high-resolution videos? If I do not retrain on new datasets and instead just follow point 3, can I get a good result?
See https://github.com/tg-bomze/Face-Image-Motion-Model for point 2.
@AliaksandrSiarohin @5agado I have run some tests using the method detailed in point 2.
Generally the result looks like this: https://user-images.githubusercontent.com/37964292/78800976-fda86580-79b3-11ea-866e-6dfe046b6a20.gif
It would be good to get your thoughts on whether this is an issue of using a checkpoint trained on 256x256 images, or if I am doing something wrong...
Many thanks for your excellent work.
@AliaksandrSiarohin sigma=1.5 does not work for 1024x1024 source images (with a scale factor of 0.0625). I get the following error:
File "C:\Users\admin\git\first-order-model\modules\util.py", line 180, in forward
out = torch.cat([out, skip], dim=1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 1 and 2 in dimension 2 at c:\a\w\1\s\tmp_conda_3.6_061433\conda\conda-bld\pytorch_1544163532679\work\aten\src\thc\generic/THCTensorMath.cu:83
But I can confirm that hard-coding sigma=1.5 works only for 512x512 images (with a scale factor of 0.125).
Can you please let us know the correct setting for 1024x1024 images? Thank you for your wonderful work.
@pidginred can you provide the full stack trace and your configs?
@AliaksandrSiarohin Certainly! Here are the changes I made (for 1024x1024 / 0.0625) & the full error stack:
diff --git a/config/vox-256.yaml b/config/vox-256.yaml
index abfe9a2..10fce42 100644
--- a/config/vox-256.yaml
+++ b/config/vox-256.yaml
@@ -23,7 +23,7 @@ model_params:
temperature: 0.1
block_expansion: 32
max_features: 1024
- scale_factor: 0.25
+ scale_factor: 0.0625
num_blocks: 5
generator_params:
block_expansion: 64
@@ -35,7 +35,7 @@ model_params:
block_expansion: 64
max_features: 1024
num_blocks: 5
- scale_factor: 0.25
+ scale_factor: 0.0625
discriminator_params:
scales: [1]
block_expansion: 32
diff --git a/demo.py b/demo.py
index 848b3df..28bea70 100644
--- a/demo.py
+++ b/demo.py
@@ -134,7 +134,7 @@ if __name__ == "__main__":
reader.close()
driving_video = imageio.mimread(opt.driving_video, memtest=False)
- source_image = resize(source_image, (256, 256))[..., :3]
+ source_image = resize(source_image, (1024, 1024))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
diff --git a/modules/util.py b/modules/util.py
index 8ec1d25..cb8b149 100644
--- a/modules/util.py
+++ b/modules/util.py
@@ -202,7 +202,7 @@ class AntiAliasInterpolation2d(nn.Module):
"""
def __init__(self, channels, scale):
super(AntiAliasInterpolation2d, self).__init__()
- sigma = (1 / scale - 1) / 2
+ sigma = 1.5 # Hard coded as per issues/20#issuecomment-600784060
kernel_size = 2 * round(sigma * 4) + 1
self.ka = kernel_size // 2
self.kb = self.ka - 1 if kernel_size % 2 == 0 else self.ka
(base) C:\Users\admin\git\first-order-model-1024>python demo.py --config config/vox-256.yaml --driving_video driving.mp4 --source_image source.jpg --checkpoint "C:\Users\admin\Downloads\vox-cpk.pth.tar" --relative --adapt_scale
demo.py:27: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Traceback (most recent call last):
File "demo.py", line 150, in <module>
predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
File "demo.py", line 65, in make_animation
kp_driving_initial = kp_detector(driving[:, :, 0])
File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\git\first-order-model-1024\modules\keypoint_detector.py", line 53, in forward
feature_map = self.predictor(x)
File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\git\first-order-model-1024\modules\util.py", line 196, in forward
return self.decoder(self.encoder(x))
File "C:\Users\admin\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\git\first-order-model-1024\modules\util.py", line 180, in forward
out = torch.cat([out, skip], dim=1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 1 and 2 in dimension 2 at c:\a\w\1\s\tmp_conda_3.6_061433\conda\conda-bld\pytorch_1544163532679\work\aten\src\thc\generic/THCTensorMath.cu:83
@pidginred the fixed sigma worked on my side for any resolution, including 1024x1024. It's not the cause of your problem.
@eps696 What was your scale factor for 1024x1024? And did you get a proper output?
@pidginred same as yours, 0.0625. But I also resize driving_video, not only source_image (which I see you don't).
@eps696 Confirmed that worked. However, I lost almost all eye & mouth tracking (compared to 256x256), and it results in lots of weird artifacts and very poor quality output.
Are you getting good quality results (in terms of animation) using 1024x1024 compared to 256x256?
@pidginred I've used it for rather artistic purposes (applying it to face-like imagery), so I cannot confirm 100%. It definitely behaved very similarly at 1024 and 256 resolutions, though. Speaking of animation quality, quite a lot has been said here about the necessity of similarity in pose (or facial expression) between the source image and the starting video frame. I think you may want to check that first.
I had the same problem with the results from point 2.
@eps696 Can you share the revised files? After I followed the above steps, the facial movements were normal, but the mouth could not open.
@zpeiguo that project is not released yet, sorry. Also, this topic is about high-res images; check other issues regarding the 'normality' of movements.
Same here, the mouth won't open. I believe the best option is to retrain everything at 512 resolution.
I have also tested the third method at 512; the animation quality is lower than at 256. I have no judgement as to why; I expected the quality to be the same, since the keypoint networks still operate at the same 64x64 resolution.
I got method 3 working on Windows 10 following the steps above and successfully output a 512 version. However, the results are of much lower quality animation-wise. Hoping we can get a 512 or higher checkpoint trained soon.
I also followed method 3 and the animation is not acceptable :-( The mouth does not open at all and the face is distorted all the time. Maybe we have to use AI to upscale the 256 video to 512 :-)
Yes, in theory upscaling the 256 output with AI could work. It depends on the video output quality, I suppose. I have tried Topaz Labs software and it also enhances the distortions.
Which super-resolution network did you end up using for the point 2 tests? :)
In demo.py, I tried also resizing driving_video, and it works:
driving_video = [resize(frame, (512, 512))[..., :3] for frame in driving_video]
Yes, it ran. But my result (animation) was terrible.
How can I change the blending mask size?
Hi all,
I was wondering if anyone has succeeded in retraining the network to support 512x512 (or higher) images? Before attempting this myself, I thought it might be a good idea to check if anyone has succeeded, and if so, whether that person would be kind enough to share the checkpoints/configuration with the community? 🙏
Kind regards
hi @LopsidedJoaw, which super-resolution method did you use to get the 320x320 size result from 256x256 input, as your gif shows?
I used the same method described in the first 10 or so entries on this post.
Got that, thank you. :)
I'm going to train a 512x512 face model and release it to the public domain.
Can't wait. Please also share the process. I think many people are interested. Thanks.
It's going to take 5 days to train on an RTX 3090. I'm also going to train a 512x512 motion-cosegmentation model and release it to the public domain as well.
Legend
I need these models for a project I'm working on, so I might as well release them to the public.
I got it trained; I will be uploading it shortly.
Here it is: https://github.com/adeptflax/motion-models, with any additional info you might want to know. I uploaded the model to MediaFire; hopefully that doesn't cause any issues.
@adeptflax Thank you so much for your hard work. I managed to run your 512 version. Just for comparison, here are my old 256 footage and the new 512 version:
When trying to run the 512 model with this command:
python demo.py --config config/vox-512.yaml --driving_video videos/2.mp4 --source_image images/4.jpg --checkpoint checkpoints/first-order-model-checkpoint-94.pth.tar --relative --adapt_scale --cpu
I get the following error:
/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/imageio/core/format.py:403: UserWarning: Could not read last frame of /home/USER/General/Creating animated characters/First order motion model/first-order-model/videos/2.mp4.
warn('Could not read last frame of %s.' % uri)
/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
demo.py:27: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Traceback (most recent call last):
File "demo.py", line 144, in <module>
generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)
File "demo.py", line 44, in load_checkpoints
generator.load_state_dict(checkpoint['generator'])
File "/home/USER/miniconda3/envs/first/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for OcclusionAwareGenerator:
size mismatch for dense_motion_network.down.weight: copying a param with shape torch.Size([3, 1, 13, 13]) from checkpoint, the shape in current model is torch.Size([3, 1, 29, 29]).
It runs fine with the 256 model. Has anyone run into the same problem or does anyone know how it could be fixed?
Update: I've fixed the problem. I had to change sigma to 1.5 as described here: https://github.com/adeptflax/motion-models and https://github.com/AliaksandrSiarohin/first-order-model/issues/20#issuecomment-600784060 (it also describes how to change 256 to 512 in the demo.py file).
Steps to fix:
- In demo.py, change everything from 256 to 512 around this line: source_image = resize(source_image, (256, 256))[..., :3]
- In modules/util.py, change sigma = (1 / scale - 1) / 2 to sigma = 1.5
I have the same issue
@adeptflax First off, thanks for doing this :)
I'm having an issue:
_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
from here
File "demo.py", line 42, in load_checkpoints checkpoint = torch.load(checkpoint_path)
I think it has something to do with the file format of the checkpoint? Any ideas?
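A guess, not confirmed in this thread: this UnpicklingError usually appears when a checkpoint saved with PyTorch >= 1.6 (zipfile serialization) is loaded by an older PyTorch or by plain pickle. Here is a minimal sketch of re-saving the file in the legacy format from an environment with a recent PyTorch so that older versions can read it; the file name is taken from the command earlier in the thread:

```python
# Sketch, assuming the cause is the newer zipfile checkpoint format: re-save
# the checkpoint with legacy serialization using PyTorch >= 1.6, then load the
# re-saved file from the older environment.
import torch

ckpt_path = "first-order-model-checkpoint-94.pth.tar"  # name from the command above
checkpoint = torch.load(ckpt_path, map_location="cpu")
torch.save(checkpoint, "first-order-model-checkpoint-94-legacy.pth.tar",
           _use_new_zipfile_serialization=False)
```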
Same error here "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified."
@bigboss97 did you do anything to the 512 checkpoint from @adeptflax to get it to work?
Is there a way to support high resolution?