facebookresearch / VisualVoice

Audio-Visual Speech Separation with Cross-Modal Consistency

The pre-processed mouth ROIs #26

Open dengyuanjie opened 1 year ago

dengyuanjie commented 1 year ago

Hello, I would like to ask a question.

The mouth ROI data in the dataset is stored as h5 files.

Could you please explain how it was generated? Is there a pre-trained model available?

If I want to replace VoxCeleb2 with a different dataset, how can I generate the mouth h5 files?

Looking forward to your answer! Thank you very much!!

wcycqjy commented 6 months ago

I have the same question. I would like to use LRS2 instead.