backspacetg / simul_whisper

Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Large-v3 CIF model #3

Open LAnCeBabY opened 1 month ago

LAnCeBabY commented 1 month ago

Hello! Your work is very good, but I can't find the large-v3 version of the CIF model. How can I get or train a large-v3 CIF model?

backspacetg commented 1 month ago

Hi! You can use a single linear layer, or a CNN followed by a linear layer, as the CIF model. The input is the features extracted from the Whisper encoder, the label is the number of words in the labeled text, and the loss is RMSE. Code and training methods for CIF models can be found here.
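For readers who want to see the shape of that recipe, here is a minimal sketch in plain Python. It is not the repository's training code. It assumes the common CIF formulation in which a linear layer plus a sigmoid gives one weight per encoder frame and the weights are summed into a predicted word count; the reply above only specifies the linear layer, the word-count label, and the RMSE loss. All names (`frame_weights`, `predicted_count`, `train_step`, `demo`) and the toy data are hypothetical, and a real setup would use a deep-learning framework with actual Whisper encoder features.

```python
import math
import random

def frame_weights(feats, w, b):
    """Per-frame weights: sigmoid of a single linear layer (assumed CIF form)."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * hi for wi, hi in zip(w, h)) + b)))
            for h in feats]

def predicted_count(feats, w, b):
    """Predicted number of words = sum of the per-frame weights."""
    return sum(frame_weights(feats, w, b))

def rmse(preds, targets):
    """Root-mean-square error between predicted and labeled word counts."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))

def train_step(batch, w, b, lr):
    """One plain gradient-descent step on the RMSE loss.

    batch: list of (feats, word_count); feats is a list of frame vectors.
    Returns the updated (w, b) and the loss measured before the update.
    """
    preds = [predicted_count(f, w, b) for f, _ in batch]
    targets = [n for _, n in batch]
    loss = rmse(preds, targets)
    grad_w, grad_b = [0.0] * len(w), 0.0
    for (feats, _), p, t in zip(batch, preds, targets):
        # d(RMSE)/d(pred) = (pred - target) / (batch_size * RMSE)
        d_pred = (p - t) / (len(batch) * max(loss, 1e-8))
        for h, a in zip(feats, frame_weights(feats, w, b)):
            d_logit = d_pred * a * (1.0 - a)  # derivative of the sigmoid
            for i, hi in enumerate(h):
                grad_w[i] += d_logit * hi
            grad_b += d_logit
    w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return w, b - lr * grad_b, loss

def demo(steps=100, dim=4, frames=10, samples=8, lr=0.02, seed=0):
    """Train on synthetic stand-ins for encoder features; returns the loss history."""
    rng = random.Random(seed)
    true_w, true_b = [2.0, -1.0, 0.5, 0.0], -1.0  # hypothetical generator of labels
    batch = []
    for _ in range(samples):
        feats = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(frames)]
        batch.append((feats, round(predicted_count(feats, true_w, true_b))))
    w, b, losses = [0.0] * dim, 0.0, []
    for _ in range(steps):
        w, b, loss = train_step(batch, w, b, lr)
        losses.append(loss)
    return losses
```

Calling `demo()` returns the per-step RMSE on the toy batch. To adapt the idea, replace the synthetic `feats` with encoder outputs for each utterance and the synthetic labels with the word counts of the transcripts.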