Open LAnCeBabY opened 1 month ago
Hi! You can use a single linear layer, or a CNN followed by a linear layer, as the CIF model. The input is the features extracted from the Whisper encoder, the label is the number of words in the labeled text, and the loss is RMSE. The code and training methods for CIF models can be found here.
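To make the setup above concrete, here is a minimal PyTorch sketch, not the repository's actual code. It assumes the usual CIF formulation in which the model emits one weight in (0, 1) per encoder frame and the sum of the weights is the predicted word count. `CifLengthPredictor` and `rmse_count_loss` are hypothetical names, the feature size 1280 is assumed for large-v3, and the random tensor only stands in for real Whisper encoder output. Masking of padded frames is left out for brevity.

```python
import torch
import torch.nn as nn


class CifLengthPredictor(nn.Module):
    """Hypothetical CIF-style model: predicts one weight per encoder frame."""

    def __init__(self, feat_dim: int = 1280, use_cnn: bool = False):
        super().__init__()
        # Optional 1-D convolution over time (the "CNN + linear" variant).
        self.conv = (
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)
            if use_cnn
            else None
        )
        self.proj = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim), taken from the Whisper encoder.
        if self.conv is not None:
            # Conv1d expects (batch, channels, frames), so transpose around it.
            feats = torch.relu(self.conv(feats.transpose(1, 2))).transpose(1, 2)
        # Sigmoid keeps every per-frame weight in (0, 1).
        return torch.sigmoid(self.proj(feats)).squeeze(-1)  # (batch, frames)


def rmse_count_loss(alphas: torch.Tensor, word_counts: torch.Tensor) -> torch.Tensor:
    # Predicted count is the sum of the per-frame weights.
    predicted = alphas.sum(dim=1)
    # Small epsilon keeps the sqrt gradient finite when the error is zero.
    return torch.sqrt(torch.mean((predicted - word_counts) ** 2) + 1e-8)


torch.manual_seed(0)
model = CifLengthPredictor(feat_dim=1280, use_cnn=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch: 2 utterances, 50 frames each (real features would come
# from the Whisper encoder, with word counts from the transcripts).
feats = torch.randn(2, 50, 1280)
word_counts = torch.tensor([12.0, 7.0])

# One training step.
alphas = model(feats)
loss = rmse_count_loss(alphas, word_counts)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a real run you would keep the Whisper encoder frozen, feed its output in place of `feats`, and loop over your dataset. Setting `use_cnn=False` gives the single linear layer variant.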
Hello! Your work is very good, but I can't find the large-v3 version of the CIF model. How can I get or train a large-v3 version of the CIF model?