Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try Sync Labs
https://synclabs.so

Is there a way to fix "90 Degrees Only" for better results? #323

Open AlonDan opened 2 years ago

AlonDan commented 2 years ago

Does Wav2Lip already have a way to correct the mouth if the source face is not exactly at 90 degrees?

Usually, if the face is at around 80 degrees or less, the mouth comes out wrong and doesn't fit the way it does when the source (mouth) is at 90 degrees, facing the camera.

I wonder if somebody has solved this already, or if it's a known issue that will be fixed soon?

burning846 commented 2 years ago

You could try some face-alignment methods.

AlonDan commented 2 years ago

Unfortunately I'm not a programmer. Is there such a parameter in the current version, or another way to fix this issue?

instant-high commented 2 years ago

I've just made some major changes to inference.py

The code works, but the programming style is bad...

This version additionally needs imutils, res10_300x300_ssd_iter_140000.caffemodel + deploy.prototxt.txt, and face-alignment by Adrian Bulat.

AlonDan commented 2 years ago

Thanks @instant-high. How can we test your latest improvement? Where do we download it from, and should I just overwrite the old inference.py, or is it a different installation? (I'm still a newbie; I use Windows 10 + Anaconda.)

instant-high commented 2 years ago

For me it runs in the same Anaconda environment. There are some additional files; nothing will be overwritten. I can't remember if there was anything to install except imutils. All the other files just need to be copied to the root of your cloned Wav2Lip repo.

You can try it if you like. I'll just post a link for downloading the package: https://drive.google.com/file/d/1y2zZyzjAllz-1WxyLyKqG8_s0F9YAZvM/view?usp=sharing

AlonDan commented 2 years ago

> For me it runs in the same Anaconda environment. There are some additional files; nothing will be overwritten. I can't remember if there was anything to install except imutils. All the other files just need to be copied to the root of your cloned Wav2Lip repo.
>
> You can try it if you like. I'll just post a link for downloading the package: https://drive.google.com/file/d/1y2zZyzjAllz-1WxyLyKqG8_s0F9YAZvM/view?usp=sharing

Thanks for sharing this! :) Considering that I (and maybe others) am still a newbie, can you please explain how to use it? Example commands, so we can see the differences between your improvements and the original old Wav2Lip?

UPDATE: (this may help other newbies like myself, or just follow the included TXT file)

I used the same batch file (.bat) I used with the original, just replaced the script with "inference_cut_caffe.py", and it works...

WOW! It's FAST! I really like that it also shows a preview while making the magic! :)

I've tried the new "--align_face", but it skewed and the face didn't sit on the original; I actually noticed 2 heads at the end. I believe you're still working on it, but it was interesting to try, thanks! I hope you'll be able to make it work.

If there are any commands or parameters we should try with your improvement, please share. Thank you for the great upgrade!

-

QUESTION / SUGGESTION: 1 - Can you add an on/off parameter so it also works on a short video with longer audio, like the original Wav2Lip? Sometimes it's really cool for short loopable videos; it would be great if we could choose both options! 2 - I wonder if you can add, in the future, the option to choose among multiple faces via the first frame of the video. I believe they do this in the SimSwap GUI version; they call it "specific". It could be very useful.

The only thing I believe is not up to you (unless you can surprise all of us) is using a HIGHER-RESOLUTION pre-trained model (since training is still a mystery and hard to understand how to accomplish). Maybe it's possible with StyleGAN 3 and GPEN as a combined process? I have no idea. I like each individual tool, but... Wav2Lip is very low resolution...

OFF TOPIC: If I could make a GUI, I would totally make a new branch dedicated to all these processes, like they did with the GPEN Windows version and the SimSwap GUI version, which are so simple to use. GPEN Windows is great because it downloads everything, so you don't even need Anaconda; it's ready-to-use batch files like DeepFaceLab, while the SimSwap GUI is pure buttons but sits on top of the original SimSwap install.

Just some ideas, don't worry about it too much. Maybe one day it will inspire somebody to take these wonderful tools and combine them into an easy-to-use GUI :)

instant-high commented 2 years ago

Yes, you're right. Alignment only works, in theory, if the face is tilted within a range of -89 to +89 degrees; I think it works safely at smaller angles. If you get a weird result, the face is not being detected correctly. Maybe you can try pre-rotating your video using the --rotate parameter (but that only rotates 90 degrees clockwise... could be changed).

In any case, you should see a second small preview window with the detected and aligned original face next to the result preview during the second step of inference.

This version does the alignment after detecting the face. In another project I rotate the full frame before cropping the face, but that is much slower. Maybe I will fix that.

There seems to be a commercial HQ pre-trained model, but we will not get it.

Off topic: of course I made a Wav2Lip GUI, but as usual in VB6 ;-) Have you tried my SimSwap Tkinter version?

instant-high commented 2 years ago

Added parameter --loop_on

AlonDan commented 2 years ago

WOW! You're awesome!! I actually really like SimSwap, GPEN, and of course Wav2Lip. I've been manually doing some weird tests combining them; unfortunately, with Wav2Lip's low-res pre-trained model, even GPEN can't really make the lips or the lip-syncing in general look decent, and it kind of ruins the effect.

It's really cool that you're doing these great improvements!

I believe the SimSwap GUI I tried is called "SimSwap 512 GUI". I didn't find the URL, but I remember it only works on top of a pre-installed SimSwap, unlike GPEN, which is very much like DeepFaceLab, with all the batch files ready to run and a sub-directory structure for placing the source and getting the output. Very easy (even for noobs like me!)

But I never got a decent 512-res result from SimSwap; it always looks smudged, so I'm kind of curious whether I'll ever get it right. It's just a few clicks and the pre-trained 512-res model is there and all; I think it's just not working for me, while the smaller (default, older) pre-trained model works perfectly fine.

I'm sure that I and others would like to try your GUI versions and give feedback! I don't think I've tried your versions of the SimSwap and Wav2Lip GUIs (unless the versions I tried are yours; sorry, I just don't remember what I installed, it's a total mess on my side). That's why I LOVE GUIs, especially portable ones without complicated installations. :)

-

Please share your link, or feel free to private-message me if you like. I'll be happy to give feedback, and as you've already noticed, I have a lot of IDEAS for improving things that usually aren't in the original. It seems like you keep adding cool stuff with your programming skills; that's really cool!

Until I try your latest changes: if you could update the files you shared before on Google Drive, or post any updated link, that would be REALLY awesome. I don't want to mess with your code manually. 😅

Keep up the good work, you're doing amazing stuff!

instant-high commented 2 years ago

The caffemodel works best with higher-resolution input.

For fine-tuning the results, play with the --pads parameter. --pads 0 10 0 0 is the default.

Changing it to e.g. 0 20 0 0 will move the mouth in the result down a bit: a positive bottom (second) value moves it down, a negative value moves it up.

Make sure the face does not leave, or partially leave, your source frame.
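For reference, this is roughly how the --pads values (top, bottom, left, right) grow the detected box in the original inference.py; a simplified sketch, and the function name is made up here:

```python
def apply_pads(box, pads, frame_h, frame_w):
    """Grow a detected face box by the --pads values (top, bottom, left,
    right), clamped to the frame edges, as the inference script does."""
    x1, y1, x2, y2 = box
    pad_top, pad_bottom, pad_left, pad_right = pads
    y1 = max(0, y1 - pad_top)
    y2 = min(frame_h, y2 + pad_bottom)  # larger bottom pad -> mouth lower
    x1 = max(0, x1 - pad_left)
    x2 = min(frame_w, x2 + pad_right)
    return x1, y1, x2, y2
```

With the default 0 10 0 0, only the bottom edge grows by 10 pixels; raising the second value to 20 extends the crop further below the chin, which is why the generated mouth lands lower in the result.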

AlonDan commented 2 years ago

I already use these; usually I tweak them in my batch file and run. I also use --nosmooth, so in most of my tries it catches the mouth and doesn't miss frames or produce weird in-between gaps.

I tried your caffemodel version with stills (.PNG and .JPG), but it was a mess: a big image with very low res, and the mouth wasn't in the correct place either. I'm just sharing this in case you didn't try images.

I'm still amazed how FAST it is with this model, that's really cool!

But videos are the main goal in most cases anyway. I just hope the correct angle becomes a "thing", since it definitely puts the mouth in weird places if the face isn't at around 90 degrees, but you already know that, hehe.

instant-high commented 2 years ago

Yes. I don't think it's usable for much head movement across 360 degrees; it's just intended for slightly better results on a newsreader or similar. I'll send you the updated script with --loop_on via e-mail, I think...

instant-high commented 2 years ago

I can't reproduce your problems using images, with alignment either on or off. I tried some images with the face aligned and with the face tilted ~45 degrees. Everything was OK.

AlonDan commented 2 years ago

Thanks for the update! I'll give it a try later on; it will be great to have these 2 options now :)

AlonDan commented 2 years ago

I just had a chance to try the new --loop_on parameter, but it's not working exactly like the original Wav2Lip. I did some tests with an approximately 32-second video and a longer 1:36 audio file, and from around 1:04 the audio continues in the output video but the lip-sync does not...

UPDATE: I think the lip-sync still works, but considering the audio stays at the same loudness, starting around 1:04 it's VERY hard to notice lip movement. I tried another audio file and got the same weird problem; it's like the "power" that controls the amount of lip movement decreases as the video goes on, like it's fading or something. That's just a guess, I have no idea what actually happens. I tried --pads 0 6 0 0 and --pads 0 20 0 0 (which is waaay too much), but it still kind of fades even though the audio is still very loud.

At the moment I'm just playing around with random videos, so maybe it's not the best way to test it.

I'll try to reproduce it with other files; maybe I'm doing something wrong... I'm not sure. It could be a false alarm, sorry if it is.

Maybe you can have a look and see if you can make it behave like the original Wav2Lip by default, so we can choose to enable or disable it? Thanks ahead; I hope my information helps.

instant-high commented 2 years ago

Yes, I noticed that effect. It does not appear on still images; I think it is a Wav2Lip problem. I split longer videos into 15-second parts and later join them together. I tried unloading and reloading the checkpoints every 300 frames of inference, but that had no effect. Ask the authors...
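That split/join workaround can be scripted with ffmpeg's segment and concat muxers. A sketch assuming ffmpeg is on PATH (the helper names are made up; stream copy cuts at keyframes, so part lengths are only approximate):

```python
import subprocess

def split_commands(src, seconds=15, out_pattern="part_%03d.mp4"):
    """ffmpeg segment muxer: cut src into ~15-second parts without
    re-encoding (stream copy cuts at keyframes, so lengths are approximate)."""
    return ["ffmpeg", "-i", src, "-c", "copy", "-f", "segment",
            "-segment_time", str(seconds), "-reset_timestamps", "1",
            out_pattern]

def concat_command(list_file, out):
    """ffmpeg concat demuxer: list_file holds lines like: file 'part_000.mp4'"""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", out]

# e.g. subprocess.run(split_commands("input.mp4"), check=True)
```

Each part is then run through the inference script separately, and the lip-synced parts are joined back with the concat command.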

AlonDan commented 2 years ago

Thanks for the quick reply. I just tested it with the default Wav2Lip version; it did work on a 1:40 video, but unfortunately it's not as good as your improvements in terms of speed and the really nice smooth cut around the head / chin.

instant-high commented 2 years ago

@AlonDan You're right. I've located the problem. I'm trying to solve it

instant-high commented 2 years ago

It's definitely not the faster face detection that causes the problem; it's the soft blending (seamless cloning). It skips frames if the cropped face box has an odd height or width. I've made some changes; it's much better now, but not perfect, so I'm not done with it yet...

instant-high commented 2 years ago

@AlonDan and everyone else who's interested: here's version two. https://drive.google.com/file/d/1i-dq_x4gk4C42Qj0tTYZQg-had28WOEQ/view?usp=sharing Successfully tested on longer videos (1:40).

Alignment tested working at +60/-60 degrees (it may nevertheless produce strange results, depending on the source video). It should be done a different way, but that means many more code changes...

It can be run with the original command line. Additional parameters are: --align_face --loop_on --preview

Hope it works...

AlonDan commented 2 years ago

Thanks for another update!! :)

Just to be clear: English isn't my native language, so I hope you can follow. Also, I'm just trying to help with my tests, so please don't take anything I type as negative in any way; I'm doing my best to give you feedback, and hopefully it will help. Actually, I admire what you do. I see code as... MAGIC!

I'm doing some tests with your latest version.

  1. Testing the alignment / tilt (I was really curious about this one) using --align_face: You did a REALLY GOOD JOB HERE! I tried some videos at roughly 40 - 50 degrees so far, and the mouth sits EXACTLY where the source mouth is!

The alignment (tilt) sits exactly on the original source mouth, but the WHOLE face was "dancing" some 1-4 degrees left and right around the center of the original face, jaggy and not stable. I'm guessing you're still working on it, because if that were stable it would look GREAT!!

The last thing is the mask / edge; it seems like it's not there yet. The mask is blurry on the edge of the face on one side (the side closer to the mouth, of course); it looks like a blurry, jaggy smudge effect that keeps moving like "water" ripples trying to fix the background. I think it will look better once the alignment is more stable, as mentioned above, but that's just a guess.

The same test with the original Wav2Lip (no alignment): well... not even close. The mouth appeared in weird places like the wall, and it kept trying to detect a face elsewhere in each frame. It was REALLY bad, a big fail on ALL frames! I'm really impressed with what you've done so far. GREAT JOB! 👍

If you can take this further somehow and improve it with wider angles + stable alignment + a cleaner mask on the edges, or whatever accuracy magic you can think of, that will be AWESOME!

-

  2. Test with --preview --align_face (caffe): I noticed that in the stage after detection is done, it stops every 3% or 4% for a few seconds (not sure how long), then does another few percent, stops, and so on. Maybe that's just how it works, in chunks (I'm not a programmer, I have no clue about the magic behind it).

I did the same test without --preview because I thought that was causing the slow-down stops; it's probably calculating the alignment in the background, so it only "looks" like it stops while it's actually processing. But a progress bar would be a great indication, because it feels like a freeze when that happens; my Task Manager was fine, though, so it wasn't a freeze, just hidden I guess.

I'm curious whether it could run without the stops, more like one long preview progress bar? Or add another progress bar for whatever it's computing as well? Maybe it's not possible, but if it is, it would be much nicer to follow all the stages while processing.

-

  3. Same test with --pads 0 6 0 0 (after trying 0 0 0 0, of course): From my tests, this number is more "gentle" on videos where heads are smaller or far away in the frame; if the face is closer, I usually go up to 0 20 0 0. Same with the default Wav2Lip: the mouth exactly replaced the original and was 100% accurate on all frames.

I did the same with your version (caffe), same video: the mouth wasn't placed accurately on the source video's mouth but further down, and in some frames it looked more like an extra mouth above the original (2 mouths), though some frames were fine.

If it helps: I use --nosmooth because in most cases it tracks much more accurately without any weird frame lags. I've never run into issues, even on fast videos / bigger differences between frames.

-

I'll do more tests on other random videos and keep you updated if I find something else.

It seems like every time you use your magic touch, it gets MUCH better than the original Wav2Lip, which hasn't been updated for a long time now. Everyone should give it a try!

Thanks once again for your wonderful work. YOU ROCK! 👍

instant-high commented 2 years ago

Thanks for your patience and testing. I'm not a native English speaker either...

Face alignment wasn't my initial intention when changing the code; the main goal was to make it faster and stable when frames without a face occur. Every face detection gives slightly different bounding-box sizes, so please try other --pads values. Fast-moving faces are also difficult to detect correctly, or not detected at all, because of blurred frames. The way I align the face here is not the best way: I detect the face and calculate the angle in every full frame before the low-res Wav2Lip model resizes each found face to its internal size of 96x96 pixels. My method rotates that small image and rotates it back after inference. Finally, the result is resized to the original size and put back into the full frame.

A better way would be to detect and rotate the full frame and then crop the face for inference, but that would require more changes to the code. That, and a higher-resolution model, could make the whole thing easier.

I've tested alignment on a 'normally moving, speaking person' by making the video clip rotate slowly left and right.

Another problem is when the head looks up or down. This also affects detection and the relative position of the mouth in the cropped face, and it leads to two mouths being visible in the final result.

Maybe I try to optimize that again...

Edit: one thing that could be implemented is setting a specific angle for the whole video duration, instead of only 90 degrees clockwise.

instant-high commented 2 years ago

@AlonDan "But the ALL face was dancing on some 1-4 degrees left and right ..." -> please set the parameter --wav2lip_batch_size 1 and it should be solved. This even solves the "stopping" during inference; overall inference time is the same. I have to use this because I only have a 2GB GPU.

I've added one more parameter now: you can rotate the input video between -180 and +180 degrees instead of only 90 degrees clockwise. The next update will come.

AlonDan commented 2 years ago

> @AlonDan "But the ALL face was dancing on some 1-4 degrees left and right ..." -> please set the parameter --wav2lip_batch_size 1 and it should be solved. This even solves the "stopping" during inference; overall inference time is the same. I have to use this because I only have a 2GB GPU.
>
> I've added one more parameter now: you can rotate the input video between -180 and +180 degrees instead of only 90 degrees clockwise. The next update will come.

NICE! I just tried it: --wav2lip_batch_size 1 definitely solved the 2 problems you mentioned. I believe things are slower, or maybe it just looks that way since it shows every frame.

Another question: unlike --wav2lip_batch_size 1, what exactly does --face_det_batch_size do? I haven't played with it much because I never felt any difference, but can you please explain and tell me what numbers I should try? Maybe that could help improve results / speed?

-

WOW! -180 / +180 degrees will sure be interesting. I'll be happy to test your next update and help with my info as I've done so far. You sure are doing a GREAT JOB!

instant-high commented 2 years ago

Back to the initial topic, "...90 degrees only": the next update displays the first video frame, so you can draw a line parallel to the eyes, from the left eye to the right eye (always clockwise, in case the head is upside down...), for free alignment of the input within +/- 180 degrees.

Additional face alignment during inference is still not that good. See 'wav2lip_cut_caffe.txt' for further instructions and additional parameters.

Download: https://drive.google.com/file/d/1Ppvj0sKiKoHWZALnKZHHXYvZC2p-NRWR/view?usp=sharing

AlonDan commented 2 years ago

Thanks for the update! After updating everything with the latest version's files,

I tried --no_rotation, but it just starts running with the preview (showing progress frame by frame); I never got a first frame to draw on clockwise.

Another question: if I use --no_rotation, should I NOT use --align_face, or can I use both? I would like to run some tests with any new options you add :)

instant-high commented 2 years ago

--no_rotation turns off this option; by default it is on. If you don't use this parameter, it shows the first frame...

Setting the angle is only for video.

I think you can use both, because 'drawing' the angle rotates the whole input and rotates it back at the end, so face alignment is just additional to that.

--wav2lip_batch_size and --face_det_batch_size are not used anymore (both set to 1). --rotate also has no function anymore.

Hope this helps you

instant-high commented 2 years ago

"But it just start running with preview (shows me progress with frame by frame)" -> I can't reproduce this.

AlonDan commented 2 years ago

Oh! So it's on by default. Sorry, my bad...

So I'm trying to understand what you mean by drawing clockwise (I know what clockwise is, that's not the issue). What I'm drawing is not a BOX but 3 separate lines, with 2 points per line, so 3 lines (6 points) in total; it's not even a box. I guess the idea is to make it parallel, like you mentioned, but I'm not sure exactly what limits I should go for (eyes to nose, to mouth?).

What exactly am I drawing? Left eye to right eye? Then nose, then mouth? Or do I just draw a "BOX" clockwise?

I just want to be sure I get the idea, sorry for the confusion.

-

The frame-by-frame in preview is fine, I mean that it's how it was before which is good to see in a preview :)

instant-high commented 2 years ago

No, sorry. You just have to draw one line from the (looking at the face) left eye to the right eye, or parallel to the eyes, then press Enter. That's all. If you don't need to set the angle of the face, you can press Enter without drawing. Clockwise means draw from left to right, or from top to bottom, or from right to left if the face is upside down.

AlonDan commented 2 years ago

Thanks! that's a great example, I'll do some tests :)

UPDATE: After a quick test drawing one line, I get the mouth on the cheek, sometimes outside the head... I'll do tests with other videos; so far it seems like it's not sitting on the face.

It seems like the angle of the mouth is correct, but it shifts to the right. I made sure to draw clockwise as you explained, so the angle is not the issue now; the position of the mouth is.

instant-high commented 2 years ago

I've made several tests on different videos; everything is OK so far. I use the same function in my part-swap project. Not sure if you know my YouTube channel (different name).

AlonDan commented 2 years ago

I tried some more now... If I use --align_face and draw 3 lines (one on the eyebrows, one at mid-face, one on the mouth or chin), then it's ALMOST placed on the original mouth. I tried again with 1 line... the mouth wasn't even close to where it should be.

I'll keep trying on different videos :)

instant-high commented 2 years ago

Don't laugh... maybe it's a problem with your mouse?

Seriously, just click and hold, draw and release, then press Enter.

If you double-click, or your mouse button gives a false signal (I've had that with various mouse types; it drives me crazy), you get a false angle.

AlonDan commented 2 years ago

Hehe, nuhhh, it's not my mouse (a very accurate high-DPI trackball), and it's the same with my Wacom Cintiq stylus. I simply hold and drag to create the line (2 dots, like in your example) and release, clockwise of course.

It's not working for me with 1 line, so I make 3 lines as I explained, but it's still not accurate. I've tried over 7 different videos; I keep doing it on so many random videos that I don't have a specific one (also different resolutions and different face close-ups in the frame).

I'll just keep trying, maybe I'll get it to work with 1 line like you explain.

instant-high commented 2 years ago

You only have 3 tries to draw; I forgot to mention that...

Because I forgot to remove 1 line of code, you should see the calculated angle appear in the command window when you release the mouse button. Drawing, for example, from bottom-left to top-right should give a positive value; from top-left to bottom-right it should be negative. A straight line from left to right should give 0, and from right to left -180.
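The printed angle presumably comes from the two endpoints of the drawn line. A sketch of that calculation, in image coordinates where y grows downward (the function name and the -180 fixup are assumptions made to match the convention described above):

```python
import math

def line_angle(p1, p2):
    """Angle of the drawn line in degrees, in image coordinates (y grows
    downward). Left-to-right horizontal -> 0; bottom-left to top-right ->
    positive; top-left to bottom-right -> negative."""
    (x1, y1), (x2, y2) = p1, p2
    ang = math.degrees(math.atan2(y1 - y2, x2 - x1))
    if ang == 180.0:
        ang = -180.0  # assumed fixup so a right-to-left line reads -180
    return ang
```

So a line drawn along tilted eyes directly gives the rotation needed to bring the face upright before inference.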

Maybe someone else will test and reply.

AlonDan commented 2 years ago

Hi @instant-high! Sorry for the off-topic, but I didn't know how to contact you directly, so I just dropped a long yet interesting comment on your YouTube. It's not related to Wav2Lip (yet 😉) but to your SimSwap GUI.

I didn't find any way to contact you via your GitHub / YouTube profiles. How can I message / email you? (Not spam, only if you're interested, of course.)

I have more and more ideas (UI/UX design-wise); things can evolve from there :)

instant-high commented 2 years ago

@AlonDan I've changed my profile settings. Take a look

AlonDan commented 2 years ago

> @AlonDan I've changed my profile settings. Take a look.

Thanks! I just sent you an email :)

instant-high commented 2 years ago

I've removed the above download links because they were outdated/buggy versions.

The latest version, wav2lip_cut_caffe_6, is here for download. Read the included 'wav2lip_cut_caffe.txt' for usage.

https://drive.google.com/file/d/1iS_V50R4n1P3vvXLxLQIBf6O9S6mu6f_/view?usp=sharing

I also have an updated GUI for it, written in old VB6 ;-)

Please let me know if it works for you and whether you're interested in getting the GUI.

AlonDan commented 2 years ago

Your versions are always very user-friendly improvements! Keep up the good work! 👍