backspacetg / simul_whisper

Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Large-v3 CIF model #3

Open LAnCeBabY opened 2 months ago

LAnCeBabY commented 2 months ago

Hello! Your work is very good, but I can't find a large-v3 version of the CIF model. How can I get or train one?

backspacetg commented 2 months ago

Hi! You can use a single linear layer, or a CNN followed by a linear layer, as the CIF model. The input is the features extracted by the Whisper encoder, the label is the number of words in the reference transcript, and the loss is RMSE. The code and training procedure for CIF models can be found here.
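
A minimal sketch of the objective described above, written in plain Python for readability rather than with a tensor library. It assumes the usual CIF formulation, where a linear layer followed by a sigmoid yields one weight per encoder frame and the weights are summed to predict the word count. The thread does not spell out the activation or the summation, so both are assumptions, and the names `frame_weights`, `rmse`, and `train_step` are hypothetical. The CNN variant is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_weights(frames, w, b):
    """One weight in (0, 1) per encoder frame: sigmoid(w . h_t + b).

    The sigmoid is an assumption taken from the usual CIF setup.
    """
    return [sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in frames]


def rmse(preds, targets):
    """Root-mean-square error between predicted and true word counts."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    )


def train_step(batch, counts, w, b, lr=0.01):
    """One gradient-descent step on the RMSE of the predicted word counts.

    batch:  list of utterances, each a list of per-frame feature vectors
    counts: number of words in each utterance's transcript (the label)
    Returns the updated (w, b) and the loss before the update.
    """
    alphas = [frame_weights(frames, w, b) for frames in batch]
    preds = [sum(a) for a in alphas]  # predicted count = sum of frame weights
    loss = rmse(preds, counts)
    if loss == 0.0:
        return w, b, loss  # already exact; RMSE gradient is undefined here

    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for frames, a, p, y in zip(batch, alphas, preds, counts):
        d_pred = (p - y) / (len(batch) * loss)  # dRMSE / d(predicted count)
        for h, a_t in zip(frames, a):
            d_logit = d_pred * a_t * (1.0 - a_t)  # chain rule through sigmoid
            for k, h_k in enumerate(h):
                grad_w[k] += d_logit * h_k
            grad_b += d_logit

    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_w, new_b, loss


# Toy usage: one utterance of 4 two-dimensional frames labeled with 4 words.
toy_batch = [[[1.0, 0.0]] * 4]
toy_counts = [4.0]
toy_w, toy_b, toy_loss = train_step(toy_batch, toy_counts, [0.0, 0.0], 0.0)
```

With Whisper large-v3 the feature vectors would be 1280-dimensional encoder outputs, and a real training run would rely on an autograd framework instead of the hand-written gradient used here.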

twmht commented 1 week ago

@backspacetg

How much training data did you use to train the CIF model?

backspacetg commented 1 week ago

We used the 100-hour LibriSpeech train-clean data.