Sxjdwang / TalkLip


hello, how to create the --word_root data #27

Open DeepMakerAi opened 1 year ago

Sxjdwang commented 1 year ago

If you use LRS2, the word_root is the same as the video_root. You can read the 268th-271th lines in the train.py for more details.
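For context, here is a minimal sketch of what "word_root is the same as video_root" means in practice, assuming LRS2's layout where each sample's video (.mp4) and text annotation (.txt) share the same relative path and stem; the directory and sample ID below are hypothetical, not from the repo:

```python
from pathlib import Path

# Minimal sketch (not the repo's exact code): in LRS2 a sample's video and
# its text annotation share the same relative path and stem, so word_root
# can simply point at the same directory as video_root.
video_root = Path("/data/LRS2/main")   # hypothetical location
word_root = video_root                 # identical for LRS2

sample_id = "6330311066473698535/00011"        # hypothetical sample ID
video_path = video_root / f"{sample_id}.mp4"   # the speech video
text_path = word_root / f"{sample_id}.txt"     # the matching text annotation
```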

DeepMakerAi commented 1 year ago

> If you use LRS2, the word_root is the same as the video_root. You can read the 268th-271th lines in the train.py for more details.

The LRS2 data request cannot be submitted, so I collected the videos myself. Is there a simple way to train on my own videos? Thanks.

Sxjdwang commented 11 months ago

Yes. But you need text annotations for the speeches. I put an example of the text annotation for a speech sample in LRS2 below:

Text: IT MAY TAKE SOME TIME Conf: 6
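For anyone preparing annotations for custom videos, a rough sketch of writing an LRS2-style annotation file with the Text and Conf fields shown above follows; the helper name and file layout are illustrative assumptions, and train.py may expect additional fields, so check lines 268-271 of train.py before relying on it:

```python
# Rough sketch (an assumption, not an official format spec): write an
# LRS2-style annotation file carrying the transcript and a confidence value,
# mirroring the "Text: ... Conf: ..." example above.
def write_annotation(txt_path, transcript, conf=6):
    with open(txt_path, "w") as f:
        f.write(f"Text:  {transcript}\n")
        f.write(f"Conf:  {conf}\n")

# Hypothetical usage for a custom clip named clip_0001.mp4:
write_annotation("my_videos/clip_0001.txt", "IT MAY TAKE SOME TIME")
```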

DeepMakerAi commented 11 months ago

> Yes. But you need text annotations for the speeches. I put an example of the text annotation for a speech sample in LRS2 below:
>
> Text: IT MAY TAKE SOME TIME Conf: 6

thanks

shengyuting commented 9 months ago

@Sxjdwang May I train the model without text annotations? It's difficult to obtain text annotations for my custom videos.

Sxjdwang commented 9 months ago

I have tried training with only the contrastive loss, and it also reduces WER. However, I recommend that you verify the performance of the lip-reading expert on your dataset, following https://github.com/Sxjdwang/TalkLip/issues/9#issuecomment-1844903453. Note that the lip-reading expert takes 88*88 images as input, so you may need to resize the faces in your dataset.
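As a side note on the 88*88 requirement, here is a minimal sketch of resizing cropped face frames before feeding them to the lip-reading expert; the grayscale conversion and interpolation mode are assumptions, so align them with the expert's actual preprocessing:

```python
import cv2

# Minimal sketch: resize a cropped face/mouth frame to the 88x88 input size
# of the lip-reading expert. Grayscale conversion and interpolation mode are
# assumptions; match them to the expert's own preprocessing pipeline.
def resize_for_lipreader(frame, size=88):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (size, size), interpolation=cv2.INTER_LINEAR)
```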