Open LAnCeBabY opened 2 months ago
Hi! You can use a single linear layer, or a CNN followed by a linear layer, as the CIF model. The input is the features extracted from the Whisper encoder, the label is the number of words in the reference transcript, and the loss is RMSE. Code implementation and training methods for CIF models can be found here.
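For readers who want to see the shape of that setup, here is a minimal, framework-free sketch of the single-linear-layer variant. It is not the code from the linked implementation. It assumes the usual CIF formulation, where each encoder frame gets a sigmoid weight and the weights are summed into a predicted word count; the reply above does not spell out that step. The names `frame_weights`, `predicted_count`, and `rmse`, the 2-dimensional toy features, and the parameter values are all hypothetical. In practice the features would come from the Whisper encoder and the layer would be trained in a deep learning framework.

```python
import math


def frame_weights(features, w, b):
    """Map each encoder frame (a list of floats) to a weight in (0, 1).

    This is a single linear layer followed by a sigmoid (assumed activation).
    """
    weights = []
    for frame in features:
        logit = sum(wi * xi for wi, xi in zip(w, frame)) + b
        weights.append(1.0 / (1.0 + math.exp(-logit)))
    return weights


def predicted_count(features, w, b):
    """Sum the per-frame weights to get the predicted number of words."""
    return sum(frame_weights(features, w, b))


def rmse(predictions, targets):
    """Root mean squared error between predicted and labeled word counts."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Toy batch: two utterances with 4 and 2 frames of 2-dimensional features.
# Real inputs would be Whisper encoder outputs with a much larger dimension.
batch = [
    [[0.2, -0.1], [1.0, 0.3], [-0.5, 0.8], [0.0, 0.0]],
    [[0.7, 0.7], [-1.2, 0.4]],
]
word_counts = [2, 1]  # labels: number of words in each transcript

w, b = [0.5, -0.5], 0.0  # hypothetical, untrained parameters

preds = [predicted_count(utt, w, b) for utt in batch]
loss = rmse(preds, word_counts)
print(preds, loss)
```

Training would then adjust `w` and `b` to minimize `loss` over the dataset. The CNN + linear variant would only change how the per-frame logit is computed.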
@backspacetg
How much training data did you use to train the CIF model?
We used the 100-hour LibriSpeech train-clean data.
Hello! Your work is very good, but I can't find the large-v3 version of the CIF model. How can I get or train a large-v3 version of the CIF model?