facebookresearch / VisualVoice

Audio-Visual Speech Separation with Cross-Modal Consistency

Can you release speech enhancement models? #11

Closed (MessyPaste closed 3 years ago)

MessyPaste commented 3 years ago

Hello! Thanks for sharing this code with us.

When testing your pretrained two-speaker speech separation models, I found that performance deteriorates when extracting a single specific speaker. I only get a satisfactory separation result when I feed both speakers' mouth RoIs and faces into the model at the same time. I think this deterioration arises because these are separation models, not enhancement models.

In a real scene, the number of speakers is unknown, and only one specific person needs to be extracted. Could you provide a speech enhancement model for testing, such as the model structure or a pretrained model?

We would appreciate it if you could provide one.

Thanks again for your contribution.

MessyPaste commented 3 years ago

I have solved the problem.

luhuijun666 commented 2 years ago

@MessyPaste I have run into the same problem. Could you share how you solved it?

ZhengRachel commented 1 year ago

> I have solved the problem.

@MessyPaste Hi, I have run into the same problem while modifying the code for single-speaker speech enhancement. Could you share how you solved it?

ZhengRachel commented 1 year ago

> @MessyPaste I have run into the same problem. Could you share how you solved it?

@luhuijun666 Hi! Did you solve the problem?