AjianIronSide opened this issue 3 years ago
Hmm, sorry, but I didn't do this type of research. In theory the method should work, but the general problem, I believe, is that the trained students such as sre are likely overfitted to their own data, i.e., to some type of English speech. So I would still recommend the method described in the paper (use the teacher to estimate the speech labels).
Btw what do you mean by performance drops? Drops against the student/teacher?
Yes, the fine-tuned models drop against the student/teacher model you provided. Your model is very good at rejecting noise; if speech comes with complicated background noise, it is very likely to be rejected.
The hurdle is that I cannot train the teacher model myself, because I do not have the 527-label data. Do you have any idea how to train the teacher model?
Yeah, I tried. Sadly, it was not good after tuning.
Seems weird to me to be honest.
At least I did experiments even on Chinese after training with teacher t2, and got good results; the student usually still outperforms the teacher in every way.
Also, the loss during my training usually does not decrease by much. Generally I start at ~0.61 and the final loss is around ~0.5.
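One possible reason the loss plateaus around ~0.5 rather than going toward zero: when training against *soft* teacher labels, binary cross-entropy is bounded below by the average entropy of the targets themselves, so even a student that matches the teacher exactly keeps a nonzero loss. A tiny sketch (the beta-distributed targets are just an illustrative stand-in, not the repo's actual labels):

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy between predictions p and targets y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

rng = np.random.default_rng(0)
# Hypothetical soft labels from a fairly confident teacher (mass near 0.2).
y = np.clip(rng.beta(2, 8, size=10000), 1e-6, 1 - 1e-6)

# Even a perfect student (p == y) cannot go below the targets' entropy:
floor = bce(y, y)
```

So a final loss near ~0.5 can be entirely normal with soft targets; only hard 0/1 labels have a floor of zero.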
Hi sir, could you share how to do this?
Just as described in the Readme. First estimate soft labels from a teacher and then train the new student.
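The two-step recipe (estimate soft labels with the teacher, then train a fresh student on them) can be sketched roughly like this. This is a toy numpy stand-in with made-up models and variable names, not the repo's actual training script:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy against (possibly soft) targets."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Stand-in "teacher": a fixed linear model over per-frame features.
# In the real setup this would be the pretrained teacher, reduced to
# a binary speech/non-speech posterior per frame.
W_teacher = rng.normal(size=(16,))

# Unlabeled target-domain data (e.g. your own speech recordings).
feats = rng.normal(size=(2000, 16))

# Step 1: use the teacher to estimate soft speech labels.
soft_y = sigmoid(feats @ W_teacher)

# Step 2: train a new student from scratch against the soft labels.
w = np.zeros(16)
lr = 0.5
for _ in range(200):
    p = sigmoid(feats @ w)
    grad = feats.T @ (p - soft_y) / len(feats)
    w -= lr * grad

loss = bce(sigmoid(feats @ w), soft_y)
```

The point is only the data flow: the teacher never needs the 527-label training set at inference time, it just produces targets for whatever unlabeled audio you have.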
Hi,
Do you have any idea about fine-tuning the pretrained model (such as sre) to a more complicated scenario using a small, related dataset? I tried to use the teacher model to label the new dataset and trained for a few epochs with a very small learning rate. However, the performance drops drastically. Quite sad.
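For reference, the fine-tuning recipe described above (teacher-labeled data, few epochs, very small learning rate) might look like the following sketch. All weights and data here are synthetic stand-ins; the real setup loads a pretrained checkpoint instead:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Pretend "pretrained student" weights (in reality: loaded from checkpoint).
w = rng.normal(size=(16,))

# Small new-domain dataset, with soft targets estimated by the teacher.
feats = rng.normal(size=(300, 16))
shifted_w = w + 0.3 * rng.normal(size=(16,))  # new domain = mild shift
soft_y = sigmoid(feats @ shifted_w)           # teacher-estimated labels

loss_before = bce(sigmoid(feats @ w), soft_y)

# Fine-tune: very small learning rate, only a few epochs, so the
# pretrained weights move gently instead of being overwritten.
lr, epochs = 1e-2, 5
for _ in range(epochs):
    p = sigmoid(feats @ w)
    w -= lr * (feats.T @ (p - soft_y)) / len(feats)

loss_after = bce(sigmoid(feats @ w), soft_y)
```

If even this gentle regime hurts performance, the earlier comments in the thread suggest the problem is more likely the student's overfitting to English speech than the fine-tuning procedure itself.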