Open Ironieser opened 1 year ago
Following the paper, I implemented two audio encoders. The local encoder is preprocessed with the parameters given in the paper, but the global encoder provided in this GitHub repo follows the parameters from AV-HuBERT.
Get it, and thx for your work.
And there is another question about fine-tuning. The paper indicates that only the last three transformer blocks of the audio encoder are fine-tuned during TFG training, but I can't find this in train.py. The code looks more like it freezes the full audio encoder when `self.ft == True`: https://github.com/Sxjdwang/TalkLip/blob/main/models/talklip.py#L122-L123
Could you give me any hint? Thx :D
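For reference, partial fine-tuning of a transformer stack is usually done by toggling `requires_grad` per block. This is only a minimal PyTorch sketch of that idea, not the repo's actual code: `ToyAudioEncoder` and `freeze_all_but_last_k` are hypothetical names, and I'm assuming the encoder exposes its blocks as a `layers` list the way AV-HuBERT-style encoders typically do.

```python
import torch.nn as nn

class ToyAudioEncoder(nn.Module):
    # Hypothetical stand-in for the audio encoder: `layers` mimics a
    # stack of transformer blocks (plain Linear layers keep it minimal).
    def __init__(self, num_layers=12, dim=8):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

def freeze_all_but_last_k(encoder, k=3):
    # Freeze every block except the last k, matching the paper's
    # "fine-tune only the last three transformer blocks" description.
    n = len(encoder.layers)
    for i, layer in enumerate(encoder.layers):
        for p in layer.parameters():
            p.requires_grad = i >= n - k

enc = ToyAudioEncoder()
freeze_all_but_last_k(enc, k=3)
trainable = [i for i, layer in enumerate(enc.layers)
             if all(p.requires_grad for p in layer.parameters())]
print(trainable)  # [9, 10, 11]
```

If `talklip.py` instead detaches or `eval()`s the whole encoder when `self.ft == True`, that would indeed freeze all blocks rather than all but the last three, which is the mismatch being asked about.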
In the paper, the implementation details indicate that
But the hop length, window length, and number of mel bins are 10 ms, 25 ms, and 26 in the function `fre_audio` of "inf_demo.py" and in "class Talklipdata".
The code uses the default values.
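For anyone checking the same thing: the quoted millisecond values translate to sample counts once a sampling rate is fixed. A small sketch, assuming 16 kHz audio (the rate AV-HuBERT-style pipelines typically use; that rate is my assumption, not something stated above):

```python
# Convert the quoted frame parameters (10 ms hop, 25 ms window, 26 mel bins)
# into sample counts, assuming a 16 kHz sampling rate.
SR = 16000                             # assumed sampling rate in Hz
hop_ms, win_ms, n_mels = 10, 25, 26    # values quoted from the paper

hop_length = SR * hop_ms // 1000       # samples per hop
win_length = SR * win_ms // 1000       # samples per window
print(hop_length, win_length, n_mels)  # 160 400 26
```

Comparing these numbers against whatever defaults the mel-spectrogram call in `fre_audio` falls back to would show exactly where the code diverges from the paper.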