Elsaam2y / DINet_optimized

An optimized pipeline for DINet, reducing inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.

My tests show the character has a mask on his face #16

Open · zhanghongyong123456 opened this issue 7 months ago

zhanghongyong123456 commented 7 months ago
  1. Result: image

  2. Source: image

What causes this, and how can I get rid of it?

Elsaam2y commented 7 months ago

This is probably caused by the color distribution in your video. The face mask is detected correctly, but the merged lip-sync output simply shows no detail in the lips, since the model wasn't trained on skin colors similar to the ones in this video. It would be better if you tried changing the skin color in this video.
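
For anyone who wants to try that, here is a minimal sketch of shifting a frame's color distribution toward a reference face (a Reinhard-style mean/std transfer in LAB space). This is only an illustration, and all file paths are placeholders:

```python
import cv2
import numpy as np

def transfer_color(frame_path, reference_path, out_path):
    # Reinhard-style color transfer: match the frame's LAB mean/std
    # to those of a reference face the model handles well.
    src = cv2.cvtColor(cv2.imread(frame_path), cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(cv2.imread(reference_path), cv2.COLOR_BGR2LAB).astype(np.float32)

    s_mean, s_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    r_mean, r_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))

    out = (src - s_mean) / (s_std + 1e-6) * r_std + r_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    cv2.imwrite(out_path, cv2.cvtColor(out, cv2.COLOR_LAB2BGR))

# Hypothetical file names, for illustration only.
transfer_color("problem_frame.png", "reference_face.png", "recolored_frame.png")
```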

zhanghongyong123456 commented 6 months ago

> This is probably caused by the color distribution in your video. The face mask is detected correctly, but the merged lip-sync output simply shows no detail in the lips, since the model wasn't trained on skin colors similar to the ones in this video. It would be better if you tried changing the skin color in this video.

Is there any other good driving method?

Inferencer commented 6 months ago

> > This is probably caused by the color distribution in your video. The face mask is detected correctly, but the merged lip-sync output simply shows no detail in the lips, since the model wasn't trained on skin colors similar to the ones in this video. It would be better if you tried changing the skin color in this video.
>
> Is there any other good driving method?

If you want to animate a single image like the one shown above, D-ID would be your best bet; you can get free credits with every email you sign up with, and I'm pretty sure they allow facial hair and skin discoloration. Other than that, there is wav2lip + upscaling, but the results would be poor. The rest of the tools out there are trained on real people without facial hair and with normal skin, so you would get results similar to the above, if an error didn't appear first. Possibly SadTalker would work too, and that's open source. If you need to animate it, you could look into putting the results onto the original video with masking in a video editor, although I'm pretty sure SadTalker can copy the head movements.
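
To illustrate the masking idea, here is a minimal per-frame compositing sketch (a feathered alpha blend in OpenCV, not any particular editor's feature); the file names are placeholders:

```python
import cv2
import numpy as np

# Hypothetical frame and mask paths, for illustration only.
original = cv2.imread("original_frame.png").astype(np.float32)
synced = cv2.imread("lipsync_frame.png").astype(np.float32)

# White where the generated mouth region should replace the original.
mask = cv2.imread("mouth_mask.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
# Feather the mask edge so the seam between the two frames is invisible.
mask = cv2.GaussianBlur(mask, (31, 31), 0)[..., None]

# Alpha-blend: generated pixels inside the mask, original pixels outside.
out = mask * synced + (1.0 - mask) * original
cv2.imwrite("composited_frame.png", out.astype(np.uint8))
```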

zhanghongyong123456 commented 6 months ago

> D-ID

  1. Thank you very much for your reply. Now I know that it is because of the hair and skin color; I always thought the face was too blurry to be detected. I wonder if D-ID is based on SadTalker training: I find that D-ID and SadTalker get similar results (some head movement is allowed, the rest is forbidden). Or does D-ID just use SadTalker's way of processing video but get better results than SadTalker?
  2. I found a project that seems to work very well; have you looked at it? [https://stylelipsync.github.io/] I could get the predicted mesh but not the subsequent mouth mask. Can you give some advice on how to get the subsequent mouth mask image?
Inferencer commented 6 months ago

> D-ID
>
> 1. Thank you very much for your reply. Now I know that it is because of the hair and skin color; I always thought the face was too blurry to be detected. I wonder if D-ID is based on SadTalker training: I find that D-ID and SadTalker get similar results (some head movement is allowed, the rest is forbidden). Or does D-ID just use SadTalker's way of processing video but get better results than SadTalker?
>
> 2. I found a project that seems to work very well; have you looked at it? [https://stylelipsync.github.io/] I could get the predicted mesh but not the subsequent mouth mask. Can you give some advice on how to get the subsequent mouth mask?
>    ![image](https://private-user-images.githubusercontent.com/48466610/290057095-28723965-ee1c-4ed5-8d75-11cb959a45e6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDI0ODYwNzgsIm5iZiI6MTcwMjQ4NTc3OCwicGF0aCI6Ii80ODQ2NjYxMC8yOTAwNTcwOTUtMjg3MjM5NjUtZWUxYy00ZWQ1LThkNzUtMTFjYjk1OWE0NWU2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFJV05KWUFYNENTVkVINTNBJTJGMjAyMzEyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjMxMjEzVDE2NDI1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQyZTkwNjkyNDdiMzZiYjY3MmRkMDA1NmYyMWI4NDUyZGM5OWE0Yjg1MDBkNzQxM2M1YjIxNDQ0NzdhOWUwMWEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.suS330aUn-7gsD1SqPJFoTeGesMyX-2otSNsc486dSM)

I also downloaded that repo. It's a shame they deleted their code and didn't release their training scripts; there's no point looking into it further if they are not continuing (I'm assuming they plan to keep it closed now). And yeah, D-ID is based off of SadTalker; they just did some magic to it.
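
As an aside on the mouth-mask question: here is a minimal sketch of rasterizing a lip mask from detected landmarks, assuming MediaPipe Face Mesh rather than the StyleLipSync pipeline itself; the file names are placeholders:

```python
import cv2
import numpy as np
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

img = cv2.imread("face.png")  # hypothetical input image
h, w = img.shape[:2]

with mp_face_mesh.FaceMesh(static_image_mode=True, refine_landmarks=True) as fm:
    res = fm.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

# Assumes exactly one face was detected.
lm = res.multi_face_landmarks[0].landmark
# FACEMESH_LIPS is a set of landmark-index pairs; collect the unique indices.
lip_idx = {i for edge in mp_face_mesh.FACEMESH_LIPS for i in edge}
pts = np.array([(int(lm[i].x * w), int(lm[i].y * h)) for i in lip_idx], np.int32)

# Fill the convex hull of the lip landmarks, then dilate a little
# so the mask covers the surrounding mouth area.
mask = np.zeros((h, w), np.uint8)
cv2.fillConvexPoly(mask, cv2.convexHull(pts), 255)
mask = cv2.dilate(mask, np.ones((15, 15), np.uint8))
cv2.imwrite("mouth_mask.png", mask)
```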

davidmartinrius commented 6 months ago

The other magic is https://github.com/thygate/stable-diffusion-webui-depthmap-script

You can use depth maps to create videos from images (or the new Stable Video Diffusion, SVD).

Depth maps + SadTalker and you get awesome talking faces with animated videos.

Actually, you can do everything with automatic1111, centralized in just one app.
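
If it helps, here is a minimal sketch of the depth-map step using the MiDaS model via torch.hub (not the webui script itself); the image path is a placeholder:

```python
import cv2
import torch

# MiDaS via torch.hub, following the intel-isl/MiDaS hub documentation.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("portrait.png"), cv2.COLOR_BGR2RGB)  # placeholder path
batch = midas_transforms.small_transform(img)

with torch.no_grad():
    pred = midas(batch)
    # Resize the prediction back to the input resolution.
    pred = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze()

# Normalize to 0-255 and save as a grayscale depth map.
depth = cv2.normalize(pred.numpy(), None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("depth.png", depth)
```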

You are welcome.

David Martin Rius

davidmartinrius commented 6 months ago

> I also downloaded that repo. It's a shame they deleted their code and didn't release their training scripts; there's no point looking into it further if they are not continuing (I'm assuming they plan to keep it closed now). And yeah, D-ID is based off of SadTalker; they just did some magic to it.

By the way, I think the code was never released. There was just a README, but they finally deleted the repo.

Inferencer commented 6 months ago

> By the way, I think the code was never released. There was just a README, but they finally deleted the repo.

Yeah, the code was released (not the training code); I downloaded the pretrained model + code, then they disappeared two days later. Unfortunately, the model is person-specific, not generalized. I've seen a couple of repos do that now, especially when they decide to switch to a B2C/paid model or have concerns about the use of their work in the wild, so now I grab them regardless of whether they're complete.

zhanghongyong123456 commented 6 months ago

> Yeah, the code was released (not the training code); I downloaded the pretrained model + code, then they disappeared two days later. Unfortunately, the model is person-specific, not generalized. I've seen a couple of repos do that now, especially when they decide to switch to a B2C/paid model or have concerns about the use of their work in the wild, so now I grab them regardless of whether they're complete.

Thank you very much for your reply. I found this project; the author said the results are too good, so they did not publish the inference model: https://hangz-nju-cuhk.github.io/projects/StyleSync

And the latest project, GAIA: Zero-shot Talking Avatar Generation. Unfortunately, the project homepage is also down. image

zhanghongyong123456 commented 6 months ago

> The other magic is https://github.com/thygate/stable-diffusion-webui-depthmap-script
>
> You can use depth maps to create videos from images (or the new Stable Video Diffusion, SVD).
>
> Depth maps + SadTalker and you get awesome talking faces with animated videos.
>
> Actually, you can do everything with automatic1111, centralized in just one app.

  1. I want to achieve voice-driven mouth speech; how do I combine depth maps and SadTalker? I have only briefly tested SadTalker.
  2. About SVD: it just moves the whole picture; it can't make only the mouth move.
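
For point 1, a single-image, audio-driven SadTalker run looks roughly like the sketch below (flags as documented in the SadTalker README; the paths are placeholders, so verify against your checkout):

```python
import subprocess

# Run SadTalker's inference script on a single portrait, driven by audio.
# All file paths here are placeholders.
subprocess.run([
    "python", "inference.py",
    "--driven_audio", "speech.wav",    # audio that drives the lips
    "--source_image", "portrait.png",  # single image to animate
    "--result_dir", "./results",
    "--still",                         # keep head motion minimal
    "--enhancer", "gfpgan",            # optional face restoration
], check=True)
```
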
flipkast commented 5 months ago

Hi @Inferencer, you said that you downloaded the pretrained model and code of StyleLipSync. Can you share it privately? We can figure out how to train new models, etc. I run a service related to lip sync, and we have a team that will help us with this.

Inferencer commented 5 months ago

> Hi @Inferencer, you said that you downloaded the pretrained model and code of StyleLipSync. Can you share it privately? We can figure out how to train new models, etc. I run a service related to lip sync, and we have a team that will help us with this.

https://drive.google.com/drive/folders/1W9RAyqu2hwrieaWtGG19GmSjkkhreyIA?usp=sharing

Inferencer commented 5 months ago

> Thank you very much for your reply. I found this project; the author said the results are too good, so they did not publish the inference model: https://hangz-nju-cuhk.github.io/projects/StyleSync
>
> And the latest project, GAIA: Zero-shot Talking Avatar Generation. Unfortunately, the project homepage is also down.

I recently found the new homepage for GAIA: https://gaiavatar.github.io/gaia/

paulovasconcellos-hotmart commented 3 months ago

Did you guys manage to reproduce the StyleLipSync training algorithm?

Inferencer commented 3 months ago

> Did you guys manage to reproduce the StyleLipSync training algorithm?

Nope, but this one is coming next month; you could drive it with a 3DMM or something so that it is controlled with audio rather than a driving video: https://yudeng.github.io/Portrait4D-v2/

paulovasconcellos-hotmart commented 3 months ago

Do you have any recommendations for open-source models for lip sync that can be used commercially? All the ones I'm finding have (1) no code or (2) a non-commercial license.