Open zhanghongyong123456 opened 7 months ago
This is probably caused by the color distribution in your video. The face mask is defined correctly, but the merged lip-sync output simply shows no detail in the lips, since the model wasn't trained on skin colors similar to those in this video. It would be better if you tried changing the skin color in this video.
Is there any other good driving method?
If you want to animate a single image as shown above, then D-ID would be your best bet; you can get free credits with every email you sign up with, and I'm pretty sure they allow facial hair and skin discoloration. Other than that, there is Wav2Lip plus upscaling, but the results would be poor. The rest of the tools out there are trained on real people without facial hair and with typical skin, so you would get results similar to the above whenever an error didn't appear. Possibly SadTalker would work too, and it's open source. If you need to animate it, you could look into compositing the results onto the original video with masking in a video editor, although I'm pretty sure SadTalker can copy the head movements.
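The masking step described above can also be scripted instead of done in a video editor. Below is a minimal NumPy sketch of the idea, assuming you already have per-frame face masks; the `composite_face` helper and the box-blur feathering are illustrative, not taken from any of the tools mentioned:

```python
import numpy as np

def composite_face(original, generated, mask, feather=2):
    """Blend a generated face region back onto the original frame.

    original, generated: HxWx3 uint8 frames of the same size.
    mask: HxW uint8 mask, 255 inside the region to replace.
    feather: number of cheap box-blur passes used to soften the mask edge
             (np.roll wraps at the borders, which is fine for a centered face).
    """
    soft = mask.astype(np.float32) / 255.0
    for _ in range(feather):
        soft = (np.roll(soft, 1, axis=0) + soft + np.roll(soft, -1, axis=0)) / 3.0
        soft = (np.roll(soft, 1, axis=1) + soft + np.roll(soft, -1, axis=1)) / 3.0
    soft = soft[..., None]  # broadcast the mask over the color channels
    out = soft * generated + (1.0 - soft) * original
    return np.clip(out, 0, 255).astype(np.uint8)
```

Running this per frame, reading the lip-synced and original videos in parallel, keeps every pixel outside the mask identical to the source footage, which usually hides any quality drop outside the mouth region.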
D-ID
- Thank you very much for your reply. Now I know it is because of the hair and skin color; I always thought the face was too blurry to be detected. I wonder if D-ID is based on SadTalker training: I find that D-ID and SadTalker get similar results (some head movement is allowed, the rest is forbidden). Or does D-ID just use SadTalker's way of processing video, but get better results than SadTalker?
- I found a project that seems to work very well; have you looked at it? [https://stylelipsync.github.io/] I can get the predicted mesh but not the subsequent mouth mask. Can you give some advice on how to obtain the mouth mask?
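On the mouth-mask question, one common approach is to project the mesh's lip vertices into image coordinates and rasterize their filled convex hull. This is a hedged sketch of that general technique, not StyleLipSync's actual method (their mask code was never public); `mouth_mask` and its inputs are hypothetical:

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, np.asarray(points, dtype=np.float64)))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def mouth_mask(lip_points, height, width):
    """Rasterize a filled mouth mask (255 inside) from 2D lip landmarks,
    e.g. mesh lip vertices projected to (x, y) pixel coordinates."""
    hull = convex_hull(lip_points)
    ys, xs = np.mgrid[0:height, 0:width]
    inside = np.ones((height, width), dtype=bool)
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        # keep pixels on the inner side of each counter-clockwise hull edge
        inside &= (x2 - x1) * (ys - y1) - (y2 - y1) * (xs - x1) >= 0
    return np.where(inside, 255, 0).astype(np.uint8)
```

In practice you would dilate or feather the resulting mask a little so the blend boundary doesn't cut through the lips.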
I also downloaded that repo. It's a shame they deleted their code and didn't release their training scripts; there's no point looking into it further if they are not continuing (I'm assuming they plan to keep it closed now). And yeah, D-ID is based off of SadTalker; they just did some magic to it.
The other magic is https://github.com/thygate/stable-diffusion-webui-depthmap-script
You can use depth maps to create videos from images (or the new Stable Video Diffusion, SVD).
With depth maps + SadTalker you get awesome talking faces with animated videos.
Actually, you can do everything with AUTOMATIC1111, centralized in just one app.
You are welcome.
David Martin Rius
By the way, I think the code was never released. There was just a README, and they eventually deleted the repo.
Yeah, the code was released (not the training code). I downloaded the pretrained model + code, then they disappeared two days later. Unfortunately the model is person-specific, not generalized. I've seen a couple of repos do that now, especially when they decide to switch to a B2C/paid model or have concerns about the use of their work in the wild, so now I grab them regardless of whether they're complete.
Thank you very much for your reply. I found this project; the author said the results were too good, and did not publish the inference model: https://hangz-nju-cuhk.github.io/projects/StyleSync
And the latest project, GAIA: Zero-shot Talking Avatar Generation. Unfortunately, its project homepage is also down.
Hi @Inferencer, you said that you downloaded the pretrained model and code of StyleLipSync. Can you share it privately? We can figure out how to train new models, etc. I run a service related to lip sync, and we have a team that will help us with this.
https://drive.google.com/drive/folders/1W9RAyqu2hwrieaWtGG19GmSjkkhreyIA?usp=sharing
I recently found the new homepage for GAIA: https://gaiavatar.github.io/gaia/
Did you guys manage to reproduce the StyleLipSync training algorithm?
Nope, but this is coming next month; you could drive it with a 3DMM or something, so it is controlled with audio rather than a driving video: https://yudeng.github.io/Portrait4D-v2/
Do you have any recommendations for open-source models for lip sync that can be used commercially? All the ones I'm finding have either (1) no code or (2) a non-commercial license.
result:![image](https://github.com/Elsaam2y/DINet_optimized/assets/48466610/4b326bf3-020e-467f-903d-8f3b04be6523)
src:
What causes this, and how can I get rid of it?