ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
https://ziqiaopeng.github.io/synctalk/

unstable head result with my own model #25

Closed lokvke closed 2 months ago

lokvke commented 4 months ago

I tried to preprocess my own video data; the bs.npy was generated with the EmoTalk project.

lokvke commented 4 months ago

https://github.com/ZiqiaoPeng/SyncTalk/assets/95479496/32c95529-8a80-4c9d-b07e-7045c3a4e41f

kike-0304 commented 4 months ago

> result.mp4

Did you use smooth_path?

lokvke commented 4 months ago

@kike-0304 I just trained with the default config.

kike-0304 commented 4 months ago

Using smooth_path at test time may give a more stable head. Also, would it be convenient for you to share your data processing code?
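For reference, on the ER-NeRF-style trainer that SyncTalk builds on, the flag would be passed at inference time roughly as `python main.py data/<your_id> --workspace model/<your_id> -O --test --smooth_path`. The exact flag set is an assumption carried over from ER-NeRF, where `--smooth_path` smooths the camera pose trajectory over a sliding window, and the paths are placeholders for your own data and workspace.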

kike-0304 commented 4 months ago

> I tried to preprocess my own video data; the bs.npy was generated with the EmoTalk project.

Hello, the EmoTalk project does not provide data processing code. How did you generate the bs.npy file?

lokvke commented 4 months ago

> I tried to preprocess my own video data; the bs.npy was generated with the EmoTalk project.
>
> Hello, the EmoTalk project does not provide data processing code. How did you generate the bs.npy file?

https://github.com/psyai-net/EmoTalk_release/blob/5179b27b2fdd1ca27fcbfa6a3264a5ecfd51d524/demo.py#L55

I don't know if it is right; maybe you can try it.
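For anyone else trying this route, here is a minimal sketch of what that demo.py line amounts to, loosely reconstructed from the EmoTalk release (the `model` import, the `predict()` signature, and the `args` namespace are assumptions taken from that repo, not SyncTalk code):

```python
# A hedged sketch following EmoTalk_release's demo.py: run the audio through
# EmoTalk and save the per-frame 52-dim blendshape coefficients as bs.npy.
import librosa
import numpy as np
import torch

from model import EmoTalk  # EmoTalk_release's model definition (assumed layout)

speech, _ = librosa.load("data/your_id/aud.wav", sr=16000)  # EmoTalk expects 16 kHz mono
speech = torch.from_numpy(speech).float().unsqueeze(0).cuda()

model = EmoTalk(args).cuda()  # args: the argparse namespace demo.py builds
model.load_state_dict(torch.load("pretrain_model/EmoTalk.pth"), strict=False)
model.eval()

level = torch.tensor([1]).cuda()   # emotion-level input, as in demo.py (assumed)
person = torch.tensor([0]).cuda()  # speaker-style input, as in demo.py (assumed)
with torch.no_grad():
    bs = model.predict(speech, level, person)  # (1, T, 52) blendshape coefficients
np.save("data/your_id/bs.npy", bs.squeeze(0).cpu().numpy())
```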

kike-0304 commented 4 months ago

> I tried to preprocess my own video data; the bs.npy was generated with the EmoTalk project.
>
> Hello, the EmoTalk project does not provide data processing code. How did you generate the bs.npy file?
>
> https://github.com/psyai-net/EmoTalk_release/blob/5179b27b2fdd1ca27fcbfa6a3264a5ecfd51d524/demo.py#L55
>
> I don't know if it is right; maybe you can try it.

This bs.npy is obtained from audio; we may need to obtain bs.npy from video frames instead.
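One possible video-driven route (my assumption, not the authors' pipeline): MediaPipe's FaceLandmarker task can emit 52 ARKit-style blendshape scores per frame, which could be stacked into a (num_frames, 52) array. Whether those scores are interchangeable with EmoTalk's coefficients is an open question.

```python
# A hedged sketch: per-frame blendshape scores from video via MediaPipe's
# FaceLandmarker (requires its face_landmarker.task model file with
# blendshape output enabled).
import cv2
import numpy as np
import mediapipe as mp
from mediapipe.tasks.python import BaseOptions, vision

options = vision.FaceLandmarkerOptions(
    base_options=BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,
    running_mode=vision.RunningMode.VIDEO,
)
detector = vision.FaceLandmarker.create_from_options(options)

cap = cv2.VideoCapture("data/your_id/your_id.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
scores, idx = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    image = mp.Image(image_format=mp.ImageFormat.SRGB,
                     data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    result = detector.detect_for_video(image, int(idx * 1000 / fps))
    if result.face_blendshapes:  # one entry per detected face
        scores.append([c.score for c in result.face_blendshapes[0]])
    else:  # no face this frame: repeat the previous values
        scores.append(scores[-1] if scores else [0.0] * 52)
    idx += 1
np.save("data/your_id/bs.npy", np.asarray(scores, dtype=np.float32))
```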

kike-0304 commented 4 months ago

> I tried to preprocess my own video data; the bs.npy was generated with the EmoTalk project.

`"face_rect": [61, 0, 384, 468]` means [xmin, ymin, w, h]?

jinqiupeter commented 4 months ago

Hi @lokvke, may I ask which asr_model you used to train your own video? If it's ave, how did you generate the aud_ave.npy file? Thanks!

lokvke commented 4 months ago

> Hi @lokvke, may I ask which asr_model you used to train your own video? If it's ave, how did you generate the aud_ave.npy file? Thanks!

I just used the provided audio_visual_encoder.pth.
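For anyone following along, the shape of that pipeline is roughly the sketch below. The names `AudioVisualEncoder` and `mel_windows_per_frame` are hypothetical placeholders, since the real extraction code ships with the repo (see nerf_triplane/provider) and consumes the provided audio_visual_encoder.pth checkpoint:

```python
# A heavily hedged sketch; AudioVisualEncoder and mel_windows_per_frame are
# HYPOTHETICAL placeholders, not SyncTalk's real API. The idea is one sliding
# audio window encoded per video frame, saved as aud_ave.npy.
import numpy as np
import torch

encoder = AudioVisualEncoder()  # hypothetical class name
encoder.load_state_dict(torch.load("checkpoints/audio_visual_encoder.pth"))
encoder.eval()

windows = mel_windows_per_frame("data/your_id/aud.wav", fps=25)  # hypothetical helper
with torch.no_grad():
    feats = torch.stack([encoder(w) for w in windows])  # (num_frames, feat_dim)
np.save("data/your_id/aud_ave.npy", feats.numpy())
```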

jinqiupeter commented 4 months ago

Thanks, I managed to train my own video, though the result is not as good as May's.

lokvke commented 4 months ago

> Thanks, I managed to train my own video, though the result is not as good as May's.

Can you show your result here?

xiaoxiongzhg commented 3 months ago

> Thanks, I managed to train my own video, though the result is not as good as May's.
>
> Can you show your result here?

bs.npy is closely related to the blinking action. Will your character blink when you replace bs.npy? In your example video, the characters do not blink.

yangzhidao commented 3 months ago

> 结果.mp4 (result.mp4)

Great! Can you tell me how to get the aud_ave.npy file?

jinqiupeter commented 3 months ago

I finally got it working:

https://github.com/ZiqiaoPeng/SyncTalk/assets/12045814/5e9e8a01-6c92-4827-b4aa-1bcb6db2499c

jinqiupeter commented 3 months ago

> Thanks, I managed to train my own video, though the result is not as good as May's.
>
> Can you show your result here?
>
> bs.npy is closely related to the blinking action. Will your character blink when you replace bs.npy? In your example video, the characters do not blink.

I updated the code to mask only the lower part of the original video, so the character's upper face stays the same as in the original video. See my sample above.
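The idea (not jinqiupeter's actual patch, which isn't shared here) amounts to compositing the rendered frame back onto the original with a feathered lower-face mask. A minimal sketch:

```python
# A minimal sketch of lower-face-only compositing: everything above `split`
# comes from the original frame, everything below from the rendered frame,
# with a blurred seam in between. Paths and the split ratio are placeholders.
import cv2
import numpy as np

orig = cv2.imread("orig_frame.jpg").astype(np.float32)
rend = cv2.imread("rendered_frame.jpg").astype(np.float32)  # same resolution assumed

h, w = orig.shape[:2]
split = int(0.45 * h)  # hypothetical split; better derived from face landmarks
mask = np.zeros((h, w), np.float32)
mask[split:] = 1.0
mask = cv2.GaussianBlur(mask, (0, 0), 9)[..., None]  # feather the seam

out = rend * mask + orig * (1.0 - mask)
cv2.imwrite("composited.jpg", out.astype(np.uint8))
```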

HenryKang1 commented 3 months ago

> Thanks, I managed to train my own video, though the result is not as good as May's.
>
> Can you show your result here?
>
> bs.npy is closely related to the blinking action. Will your character blink when you replace bs.npy? In your example video, the characters do not blink.
>
> I updated the code to mask only the lower part of the original video, so the character's upper face stays the same as in the original video. See my sample above.

Could you share which part you adjusted for this? It looks awesome.

jryebread commented 3 months ago

@jinqiupeter can you share the code adjustments you made for this? It looks great.

Hoping someone can create a short guide on how to preprocess your own video data.

Edit: you're welcome to join my Discord server to discuss AI avatar stuff and share SyncTalk tips: https://discord.gg/jETUMmUD6h

redstarxz commented 3 months ago

> @jinqiupeter can you share the code adjustments you made for this? It looks great.
>
> Hoping someone can create a short guide on how to preprocess your own video data.
>
> Edit: you're welcome to join my Discord server to discuss AI avatar stuff and share SyncTalk tips: https://discord.gg/jETUMmUD6h

+1

jinqiupeter commented 3 months ago

> @jinqiupeter can you share the code adjustments you made for this? It looks great.
>
> Hoping someone can create a short guide on how to preprocess your own video data.
>
> Edit: you're welcome to join my Discord server to discuss AI avatar stuff and share SyncTalk tips: https://discord.gg/jETUMmUD6h

The preprocessing code is mostly the same as ER-NeRF's.

HenryKang1 commented 3 months ago

> ER-NeRF

What audio feature extractor did you use?

jinqiupeter commented 3 months ago

ave; the code is already available in nerf_triplane/provider.

hungho77 commented 3 months ago

> I tried to preprocess my own video data; the bs.npy was generated with the EmoTalk project.
>
> Hello, the EmoTalk project does not provide data processing code. How did you generate the bs.npy file?
>
> https://github.com/psyai-net/EmoTalk_release/blob/5179b27b2fdd1ca27fcbfa6a3264a5ecfd51d524/demo.py#L55 I don't know if it is right; maybe you can try it.
>
> This bs.npy is obtained from audio; we may need to obtain bs.npy from video frames instead.

Hi, I used https://github.com/psyai-net/EmoTalk_release/blob/main/demo.py to create the bs.npy file, but the output shape does not match the audio feature shape. Did you run into this problem? Please give me some advice, thanks.
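One plausible fix for the mismatch (an assumption; it depends on how the loader aligns the two tracks): EmoTalk emits blendshapes at its own frame rate, so the sequence can be linearly resampled onto the video-frame timeline, as sketched below.

```python
# A hedged sketch: resample the EmoTalk blendshape track to one row per
# video frame so its length matches the audio features. num_frames is a
# placeholder; take it from the video or the aud_ave.npy length.
import numpy as np

bs = np.load("data/your_id/bs.npy")  # (T_emotalk, 52)
num_frames = 7500                    # placeholder: the video's frame count

src_t = np.linspace(0.0, 1.0, len(bs))
dst_t = np.linspace(0.0, 1.0, num_frames)
bs_resampled = np.stack(
    [np.interp(dst_t, src_t, bs[:, k]) for k in range(bs.shape[1])], axis=1
)
np.save("data/your_id/bs.npy", bs_resampled.astype(np.float32))
```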

christopherohit commented 3 months ago

> Thanks, I managed to train my own video, though the result is not as good as May's.
>
> Can you show your result here?

Hi, can you share the preprocessing code? Thanks for all your work!

foxprocn commented 3 months ago

> result.mp4

@lokvke Hello, could you tell me how to train on my own video? Could you share your preprocessing module? I'm training on my own video and will keep sharing subsequent improvements with you. You can add me on WeChat: wenfeng071555. Looking forward to talking with you.

khaidq97 commented 2 months ago

> I finally got it working:
>
> kh_kr.mp4

Looks great!! Can you share your preprocessing code?

G-force78 commented 2 months ago

> @jinqiupeter can you share the code adjustments you made for this? It looks great. Hoping someone can create a short guide on how to preprocess your own video data. Edit: you're welcome to join my Discord server to discuss AI avatar stuff and share SyncTalk tips: https://discord.gg/jETUMmUD6h
>
> The preprocessing code is mostly the same as ER-NeRF's.

Amazing result. What are the differences?