anothermartz / Easy-Wav2Lip

Colab for making Wav2Lip high quality and easy to use

Discord? Discuss ONNX implementation #72

Open catselectro opened 4 weeks ago

catselectro commented 4 weeks ago

Hi,

I just found this project and the repository at https://github.com/instant-high/wav2lip-onnx-HQ. I think that combining both may make this even faster.

I ran some quick tests using the ONNX model from this repository (with the help of ChatGPT), and it seems I get about 15%-20% faster generation times. However, I've never implemented something like this before, so I might be doing something wrong.
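A comparison like this is easy to skew with one-off runs, so here is a minimal timing sketch for checking it. The helper names (`time_call`, `speedup_percent`) are hypothetical and not part of either repository; the two callables stand in for whatever runs the .pth and ONNX inference paths.

```python
import statistics
import time


def time_call(fn, warmup=2, repeats=5):
    """Return the median wall-clock time of fn() in seconds.

    Warm-up runs are discarded so that one-off costs (model loading,
    CUDA initialisation) do not distort the comparison.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # Note: if fn launches GPU work asynchronously, it should
        # synchronise before returning, or the timing will be too low.
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup_percent(baseline_s, candidate_s):
    """Percentage by which candidate_s is faster than baseline_s."""
    return 100 * (baseline_s - candidate_s) / baseline_s
```

Usage would look like `speedup_percent(time_call(run_pth), time_call(run_onnx))`, where `run_pth` and `run_onnx` are placeholders for the two inference paths on the same input.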

Your Discord link seems to be down, so I couldn't contact you there. Do you have another link or another way to chat?

Thanks for this awesome project.

Best.

anothermartz commented 4 weeks ago

Interesting, I'll have to give this ONNX project a try, and then perhaps I can implement an easy install/GUI for it.

I'm spending less time at my computer at the moment, though, because I'm about to move home and there's a lot of planning and busyness going on for me.

Here's the DeepFaceLab discord with a wav2lip channel that's good for discussing all this stuff:

https://discord.com/invite/9scUkmcf8V

catselectro commented 4 weeks ago

Thanks, I'll take a look. If you want my quick implementation, I can send you the file; I just changed inference.py to use the ONNX model. Good luck with your projects!

Echolink50 commented 4 weeks ago

Are there any other improvements besides the speed increase? Thanks

catselectro commented 4 weeks ago

I noticed a slight increase in VRAM usage when using the ONNX model, from 0.3 GB to 0.7 GB, so there's no improvement in that aspect. The model's file size is reduced to a quarter of the original. There might be potential for further improvements in VRAM, but I'm not sure.

Echolink50 commented 4 weeks ago

OK, the VRAM increase is not too bad. Did you also use the "new" face detection and alignment mentioned, or any of the "new" face enhancers mentioned? Any improvements in the quality of the lip sync? Thanks

catselectro commented 4 weeks ago

I used all the functionality of this repo. I just swapped the model for the ONNX version from the repo I cited, so the quality is the same, and I used the "improved" method of this repo.

Echolink50 commented 4 weeks ago

Oh ok. I saw that the onnx repo had some other features like different face restoration models and different detection and alignment. I will check it out. Thanks

anothermartz commented 4 weeks ago

> I used all the functionality of this repo. I just swapped the model for the ONNX version from the repo I cited, so the quality is the same, and I used the "improved" method of this repo.

So you mean you just used the wav2lip.onnx file instead of the Wav2Lip.pth file?

I see no difference in speed between the two in my own tests, but I think GPEN is faster than GFPGAN, at least according to tests I did for that using the ONNX project.

I'm more interested in the improved face tracking and also the cool little crop feature where you select the face location to make things faster that way.

But making an easy installer for that project would take more work than I'm willing to do at the moment. It's still Wav2Lip after all, so while there are improvements, they're not groundbreaking enough for me to adopt at this time.

Echolink50 commented 4 weeks ago

> > I used all the functionality of this repo. I just swapped the model for the ONNX version from the repo I cited, so the quality is the same, and I used the "improved" method of this repo.
>
> So you mean you just used the wav2lip.onnx file instead of the Wav2Lip.pth file?
>
> I see no difference in speed between the two in my own tests, but I think GPEN is faster than GFPGAN, at least according to tests I did for that using the ONNX project.
>
> I'm more interested in the improved face tracking and also the cool little crop feature where you select the face location to make things faster that way.
>
> But making an easy installer for that project would take more work than I'm willing to do at the moment. It's still Wav2Lip after all, so while there are improvements, they're not groundbreaking enough for me to adopt at this time.

Can you release the onnx implementation and the new features you tested for manual install? Thanks

catselectro commented 4 weeks ago

> > I used all the functionality of this repo. I just swapped the model for the ONNX version from the repo I cited, so the quality is the same, and I used the "improved" method of this repo.
>
> So you mean you just used the wav2lip.onnx file instead of the Wav2Lip.pth file?

Yes, I modified inference.py to load it instead of the .pth file. I noticed this slight speed improvement only when using the ONNX model in this project, not when using the ONNX project by itself. I didn't do extensive testing there, though, because the mouth quality in this project seems better, at least for the example I was testing. This is how I modified inference.py: https://gist.github.com/catselectro/90627227b93c92eb0909d2392fa1239a#file-inference_onnx_new-py
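For readers who don't want to open the gist, the kind of swap described above can be sketched roughly as follows. This is an illustrative sketch using ONNX Runtime, not the contents of the gist. The function names `load_session` and `predict` are hypothetical, and it assumes the ONNX export takes its two inputs in the same order as the PyTorch Wav2Lip model (audio first, then face frames) and that the batches are laid out as in the original Wav2Lip inference.py (channels last).

```python
import numpy as np


def load_session(model_path, use_gpu=True):
    """Create an ONNX Runtime session, falling back to CPU if CUDA is unavailable."""
    import onnxruntime as ort  # imported lazily so predict() can be used without it

    wanted = ["CPUExecutionProvider"]
    if use_gpu:
        wanted.insert(0, "CUDAExecutionProvider")
    available = ort.get_available_providers()
    providers = [p for p in wanted if p in available]
    return ort.InferenceSession(model_path, providers=providers)


def predict(session, mel_batch, img_batch):
    """Run one batch through the ONNX model in place of the PyTorch forward pass.

    mel_batch: array shaped (N, 80, 16, 1); img_batch: array shaped (N, H, W, 6).
    Returns frames shaped (N, H, W, 3) scaled to 0..255, matching what the
    PyTorch code path produces after its own transpose.
    """
    # Convert channels-last to channels-first, as the PyTorch path does.
    mel = np.transpose(mel_batch, (0, 3, 1, 2)).astype(np.float32)
    img = np.transpose(img_batch, (0, 3, 1, 2)).astype(np.float32)

    # Assumption: input order is (audio, frames). Check session.get_inputs()
    # against your exported model before relying on this.
    names = [node.name for node in session.get_inputs()]
    pred = session.run(None, {names[0]: mel, names[1]: img})[0]

    return np.transpose(pred, (0, 2, 3, 1)) * 255.0
```

In the batch loop of inference.py, the call to the PyTorch model and the `.cpu().numpy()` conversion would then be replaced by a single `predict(session, mel_batch, img_batch)` call, with the rest of the pipeline unchanged.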