AliaksandrSiarohin / first-order-model

This repository contains the source code for the paper First Order Motion Model for Image Animation
https://aliaksandrsiarohin.github.io/first-order-model-website/
MIT License

How to change KP_detector and dense_motion parameters to train on Higher resolution? #81

Open stark-akib opened 4 years ago

stark-akib commented 4 years ago

Hello @AliaksandrSiarohin . First of all, congratulations on the great work and thank you for sharing the repository.

I'm planning to train the model to generate higher resolution output (such as 512x512, 1024x1024). I would really appreciate your insight on my approach.

You mentioned in #14:

"Currently keypoint detector and dense-motion net operate on 64x64 images"

Do I need to change this behavior for better motion transfer performance (while training on higher resolution)? How would you suggest doing it?

Looking forward to hearing from you. :)

alessiapacca commented 4 years ago

If I use the pretrained model with hard-set sigma at 512x512, it runs, but the results are very bad, which is why I was trying to retrain. So your suggestion is to retrain with scale_factor = 0.125 but with hard-set sigma? The previous 2 experiments I did were using the original sigma. @AliaksandrSiarohin
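To make the arithmetic behind that value explicit: the thread quotes the keypoint detector and dense-motion net as operating on 64x64 images, and 512 x 0.125 = 64. The sketch below assumes that scale_factor is a plain multiplier applied to the input side length. The helper name scale_factor_for is hypothetical and not part of the repository.

```python
def scale_factor_for(input_size: int, internal_size: int = 64) -> float:
    """Return the factor that maps input_size down to internal_size.

    Assumption: scale_factor simply multiplies the input side length,
    and 64 is the internal resolution quoted earlier in this thread.
    """
    if input_size <= 0 or internal_size <= 0:
        raise ValueError("sizes must be positive")
    return internal_size / input_size


for size in (256, 512, 1024):
    # Prints 0.25, 0.125 and 0.0625 respectively.
    print(size, scale_factor_for(size))
```

Whether keeping the internal resolution at 64x64 is actually the best choice for higher-resolution training is the open question of this issue; the snippet only shows which factor preserves it.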

alessiapacca commented 4 years ago

I mean, I seriously did the same stuff that @stark-akib did:

So the possibilities here are three:

AliaksandrSiarohin commented 4 years ago

I thought you were using png format; that is why it is so slow for you. Yes, you should use .png format.

alessiapacca commented 4 years ago

Hey @AliaksandrSiarohin, I downloaded the dataset in png format. Now I have the test and train folders. They contain one subfolder per video, named after that video, and each subfolder holds all of its frames in png format. However, when I try to start the training, I get this error:

TypeError: Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 155, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 115, in __getitem__
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/first-order-model/frames_dataset.py", line 115, in <listcomp>
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

What could be the reason for this?

AliaksandrSiarohin commented 4 years ago

This I don't know. I guess you can fix this by replacing frames[idx] with frames[idx].decode("utf-8")
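A minimal sketch of that decoding step, assuming the frame names arrive as bytes (one way this can happen is that os.listdir returns bytes names when it is given a bytes path). The helper frame_path is hypothetical and not part of frames_dataset.py. os.fsdecode is the standard-library call that decodes bytes with the filesystem encoding and returns str input unchanged, so the helper is safe for both cases.

```python
import os


def frame_path(path, frame_name):
    """Join a directory and a frame name, accepting str or bytes for either.

    os.path.join raises "Can't mix strings and bytes in path components"
    when one argument is str and the other is bytes, so both are
    normalised to str first.
    """
    return os.path.join(os.fsdecode(path), os.fsdecode(frame_name))


# Works whether the name is bytes or str.
print(frame_path("/data/video.mp4", b"0000000.png"))
print(frame_path("/data/video.mp4", "0000001.png"))
```

Normalising once, right where the directory listing is read, avoids having to sprinkle .decode("utf-8") through the rest of the loader.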

alessiapacca commented 4 years ago

@AliaksandrSiarohin I tried that. Now it gives this error:

ValueError: Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 155, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 115, in __getitem__
    video_array = [img_as_float32(io.imread(os.path.join(path, (frames[idx]).decode("utf-8")))) for idx in frame_idx]
  File "/first-order-model/frames_dataset.py", line 115, in <listcomp>
    video_array = [img_as_float32(io.imread(os.path.join(path, (frames[idx]).decode("utf-8")))) for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_plugins/imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 182, in get_reader
    "Could not find a format to read the specified file in %s mode" % modename
ValueError: Could not find a format to read the specified file in single-image mode

Maybe it's a problem with the folder names? They are saved with the name of the video and the mp4 extension. For example, one folder is named id10001#7w0IBEWc9Qw#000993#001143.mp4

AliaksandrSiarohin commented 4 years ago

No, this should not be a problem if you are on Linux. Check what is inside the folder id10001#7w0IBEWc9Qw#000993#001143.mp4 and send the filenames and some of the files from there.

alessiapacca commented 4 years ago

Yes, I am on Linux. Inside that folder there are 150 png frames, going from 0000000.png to 0000149.png (screenshot: photo_2020-11-06_10-40-55). An example frame is this one (image: photo_2020-11-06_10-41-52).

There are other folders with more frames inside, but their frames are always numbered either starting from 0000000.png or continuing from the previous frame number (when another folder holds part of the same video).

alessiapacca commented 4 years ago

Now I replaced that line with video_array = [img_as_float32(io.imread(path + '/' + frames[idx].decode('utf-8')) )for idx in frame_idx]

The training starts, but after a while I get:

Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 155, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 115, in __getitem__
    video_array = [img_as_float32(io.imread(path + '/' + frames[idx].decode('utf-8')) )for idx in frame_idx]
  File "/first-oder-model/frames_dataset.py", line 115, in <listcomp>
    video_array = [img_as_float32(io.imread(path + '/' + frames[idx].decode('utf-8')) )for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_plugins/imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 186, in get_reader
    return format.get_reader(request)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/format.py", line 170, in get_reader
    return self.Reader(self, request)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/format.py", line 221, in __init__
    self._open(**self.request.kwargs.copy())
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 298, in _open
    return PillowFormat.Reader._open(self, pilmode=pilmode, as_gray=as_gray)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 135, in _open
    pil_try_read(self._im)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 680, in pil_try_read
    raise ValueError(error_message)
ValueError: Could not load "" 
Reason: "image file is truncated"
Please see documentation at: http://pillow.readthedocs.io/en/latest/installation.html#external-libraries

It seems like there is a problem with the images, as if it cannot read them.

AliaksandrSiarohin commented 4 years ago

I guess you are right. Try printing the names of the images that cause the error and inspect them manually.

alessiapacca commented 4 years ago

@AliaksandrSiarohin I did that.

It was printing them as bytes, so something like b'0000000.png'

So I changed that by converting the paths to strings, and the names were correct: it could identify the correct number of frames and the correct frame names. However, a bit after the training started, I got this:

ValueError: Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 161, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 121, in __getitem__
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/first-order-model/frames_dataset.py", line 121, in <listcomp>
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_plugins/imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 182, in get_reader
    "Could not find a format to read the specified file in %s mode" % modename
ValueError: Could not find a format to read the specified file in single-image mode

It is strange because I never had problems with the mp4 format. Maybe it's the "animation" format in the config file that should be .png?

If I print num_frames, frames and path in frames_dataset.py, I get the correct names: path: /media/user/hdd/vox/train/id10068#5M2EGef.0f4#001945#002065.mp4 num frames: 140 frames: ['0000067.png', '0000000.png', '0000128.png', '0000042.png', '0000024.png', '0000080.png', '0000081.png', '0000087.png', '0000122.png', '0000139.png', '0000025.png', '0000079.png', '0000015.png', '0000021.png', '0000012.png', '0000010.png', '0000044.png', '0000022.png', '0000055.png', '0000125.png', '0000070.png', '0000033.png', '0000065.png', '0000101.png', '0000132.png', '0000103.png', '0000026.png', '0000085.png', '0000074.png', '0000089.png', '0000083.png', '0000001.png', '0000061.png', '0000088.png', '0000041.png', '0000131.png', '0000105.png', '0000097.png', '0000073.png', '0000077.png', '0000110.png', '0000082.png', '0000071.png', '0000109.png', '0000095.png', '0000058.png', '0000098.png', '0000049.png', '0000027.png', '0000048.png', '0000078.png', '0000059.png', '0000066.png', '0000126.png', '0000134.png', '0000005.png', '0000069.png', '0000037.png', '0000057.png', '0000115.png', '0000002.png', '0000031.png', '0000052.png', '0000060.png', '0000117.png', '0000034.png', '0000113.png', '0000006.png', '0000090.png', '0000068.png', '0000133.png', '0000072.png', '0000091.png', '0000019.png', '0000118.png', '0000028.png', '0000045.png', '0000040.png', '0000102.png', '0000023.png', '0000018.png', '0000130.png', '0000029.png', '0000137.png', '0000011.png', '0000035.png', '0000093.png', '0000111.png', '0000106.png', '0000036.png', '0000084.png', '0000053.png', '0000016.png', '0000032.png', '0000136.png', '0000124.png', '0000050.png', '0000020.png', '0000051.png', '0000064.png', '0000100.png', '0000123.png', '0000094.png', '0000039.png', '0000054.png', '0000116.png', '0000121.png', '0000008.png', '0000017.png', '0000099.png', '0000092.png', '0000076.png', '0000063.png', '0000104.png', '0000047.png', '0000138.png', '0000003.png', '0000043.png', '0000129.png', '0000127.png', '0000046.png', 
'0000108.png', '0000004.png', '0000014.png', '0000096.png', '0000007.png', '0000056.png', '0000114.png', '0000086.png', '0000120.png', '0000038.png', '0000075.png', '0000013.png', '0000135.png', '0000112.png', '0000107.png', '0000030.png', '0000119.png', '0000062.png', '0000009.png']

I even printed the os.path.join output for some of them, and it looks correct:
joining : /media/user/hdd/vox/train/id10098#8f2ReesQMrs#001291#001410.mp4/0000080.png
joining : /media/user/hdd/vox/train/id10098#8f2ReesQMrs#001291#001410.mp4/0000050.png
joining : /media/user/hdd/vox/train/id10748#XaQk7W-ySMo#005166#005379.mp4/0000033.png
joining : /media/user/hdd/vox/train/id10748#XaQk7W-ySMo#005166#005379.mp4/0000148.png
joining : /media/user/hdd/vox/train/id10909#M3rfGq1-lXg#008731#009013.mp4/0000212.png
joining : /media/user/hdd/vox/train/id10909#M3rfGq1-lXg#008731#009013.mp4/0000052.png

AliaksandrSiarohin commented 4 years ago

Yes, yes, this I get. Can you check specifically which image is producing the error and verify that it is a valid image?

alessiapacca commented 4 years ago

OK, I made it work. It was a problem with some corrupted files. It will take a long time to train, but I will see whether the results are better this way than with the mp4 version.
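For anyone hitting the same "image file is truncated" error, here is a rough sketch of how such files could be located before training starts. It is not the script used in this thread. It relies only on the standard library and checks the PNG container: the 8-byte signature, the CRC of each chunk, and the presence of the final IEND chunk. It does not decode pixel data, so a file can pass this check and still fail in the image loader. The function names png_is_complete and find_bad_frames are hypothetical.

```python
import os
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_is_complete(path):
    """Return True if the file has a PNG signature and an intact chunk chain ending in IEND."""
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            return False
        while True:
            header = f.read(8)  # 4-byte big-endian length + 4-byte chunk type
            if len(header) < 8:
                return False  # file ended before an IEND chunk was seen
            length, chunk_type = struct.unpack(">I4s", header)
            data = f.read(length)
            crc_bytes = f.read(4)
            if len(data) < length or len(crc_bytes) < 4:
                return False  # chunk is cut off
            (stored_crc,) = struct.unpack(">I", crc_bytes)
            # The PNG CRC covers the chunk type and the chunk data.
            if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != stored_crc:
                return False
            if chunk_type == b"IEND":
                return True


def find_bad_frames(root_dir):
    """Walk root_dir and return the paths of all .png files that fail the check."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if name.lower().endswith(".png"):
                full = os.path.join(dirpath, name)
                if not png_is_complete(full):
                    bad.append(full)
    return bad
```

Calling find_bad_frames on the train and test folders returns a list of suspect files, which can then be inspected manually or downloaded again.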

Aaron2286 commented 4 years ago

@alessiapacca Hi, I also want to get a higher resolution and would like to know how your results turned out. Can you tell me? Thank you.

alessiapacca commented 4 years ago

@Aaron2286 The training is extremely slow, so I still don't know whether the result will be good or not. I am still training it, though.

Aaron2286 commented 4 years ago

@alessiapacca Yes, I know, thank you very much. Actually, I am not very good at this, but I think this is very important to my grandmother, so I am studying hard. If there are results, can you share some information? Thank you.

sicilyliu commented 3 years ago

Thank you. I'll give that a try.

@stark-akib Can you share your checkpoints/model weights? Thanks.

chloejihye commented 3 years ago

@stark-akib @alessiapacca Hello, can you share the results of your training? :) I'm really curious about the video quality after training at 512x512 because I'm trying to do the same. Your answer would be much appreciated. Thank you!!

celikmustafa89 commented 2 years ago

TO SUM UP THE WHOLE ISSUE:

There is no model for higher resolution (e.g., 512x512). Am I right? If you have one, can you share it?