Vaibhavs10 / insanely-fast-whisper

Apache License 2.0
6.94k stars 505 forks

About "condition_on_previous_text" #158

Closed: yumianhuli1 closed this issue 5 months ago

yumianhuli1 commented 5 months ago

"condition_on_previous_text" is a Whisper transcription parameter that controls whether the model takes the previously transcribed text into account when decoding the next segment of audio. When set to True, the text decoded so far is fed to the model as a prompt for the next window, which can help it keep track of the context and produce more coherent output. This is particularly useful for long, continuous recordings such as conversations, where each segment follows on from the one before it.

So, how can I use the "condition_on_previous_text" parameter through the Python API to increase recognition accuracy? Thank you!
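To make the question concrete: in the reference openai-whisper implementation, long audio is decoded one 30-second window at a time, and with condition_on_previous_text=True the text decoded so far is passed in as a prompt for the next window. The sketch below mimics only that control flow. transcribe_windows and decode_window are hypothetical names, and decode_window stands in for a real Whisper decoder; this is not the code of any library.

```python
from typing import Callable, List

# Hypothetical decoder: takes one audio window (raw samples) and a text prompt
# and returns the text decoded for that window. Real back ends work on log-mel
# features and token IDs; plain strings keep the sketch readable.
DecodeFn = Callable[[List[float], str], str]


def transcribe_windows(
    windows: List[List[float]],
    decode_window: DecodeFn,
    condition_on_previous_text: bool = True,
) -> List[str]:
    """Decode consecutive audio windows in order (hypothetical helper).

    If condition_on_previous_text is True, the text decoded so far is passed
    as the prompt for the next window; if False, every window gets an empty
    prompt and is decoded independently of what came before.
    """
    outputs: List[str] = []
    for window in windows:
        prompt = " ".join(outputs) if condition_on_previous_text else ""
        outputs.append(decode_window(window, prompt))
    return outputs


# Usage with a toy decoder that records the prompt it was given.
seen_prompts: List[str] = []


def toy_decoder(window: List[float], prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"segment{len(seen_prompts)}"


print(transcribe_windows([[0.0], [0.1], [0.2]], toy_decoder))
# ['segment1', 'segment2', 'segment3']
print(seen_prompts)
# ['', 'segment1', 'segment1 segment2']
```

In openai-whisper and faster-whisper the flag is a keyword argument of transcribe(), for example model.transcribe(audio, condition_on_previous_text=False). The real openai-whisper loop also truncates the prompt to the most recent tokens and drops it after a high-temperature fallback; both details are left out of this sketch. Its documentation notes that disabling the flag can make the text less consistent across windows but makes the model less prone to failure loops such as repetition, so True is not automatically the more accurate setting.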

asusdisciple commented 5 months ago

If you use the Hugging Face implementation, there is some overlap between chunks, but condition_on_previous is not implemented per se. In fact, in many cases it makes the model work worse; see:

https://github.com/huggingface/transformers/issues/21467
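The "overlap between chunks" refers to the chunked mode of the transformers speech-recognition pipeline (its chunk_length_s and stride_length_s options). Below is a simplified sketch of how overlapping chunk boundaries can be laid out, assuming a fixed stride on each side; chunk_bounds is a hypothetical helper, not the library's actual chunking code, and the 30 s / 5 s values are illustrative.

```python
from typing import List, Tuple


def chunk_bounds(
    total_s: float, chunk_s: float = 30.0, stride_s: float = 5.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times of overlapping chunks covering total_s seconds.

    Consecutive chunks overlap by 2 * stride_s seconds (stride_s on each side),
    so the step between chunk starts is chunk_s - 2 * stride_s. Each chunk is
    decoded on its own: no text is carried from one chunk to the next, and the
    overlap is only used afterwards to stitch the outputs together.
    """
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break  # the last chunk already reaches the end of the audio
        start += step
    return bounds


print(chunk_bounds(70.0))
# [(0.0, 30.0), (20.0, 50.0), (40.0, 70.0)]
```

Because every chunk starts from an empty context, this scheme behaves roughly like decoding with condition_on_previous_text=False, with the overlap compensating for words cut at chunk edges.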

To get back to the topic: I would actually like to see this hyperparameter implemented as well, but for the opposite reason to the OP. I want to disable condition_on_previous, and this is not possible in the Hugging Face transformers pipeline, as shown in the link above.

My benchmarks have consistently shown that Whisper models run through the transformers pipeline score worse than, for example, faster-whisper. This is due to this parameter, which can decrease model performance.
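The comment does not say which score was used; word error rate (WER) is the usual metric for comparing ASR back ends, so here is a minimal sketch of it, assuming WER is what "scoring" means. It uses a word-level Levenshtein distance with only lower-casing as normalisation; packages such as jiwer provide a ready-made implementation.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Text is only lower-cased and split on whitespace; real benchmarks usually
    apply a fuller normaliser (punctuation, numbers, spelling) first.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))
# 0.333
```

Running the same reference transcripts through each back end and comparing WER is one way to check whether conditioning on previous text helps or hurts on a given dataset.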

yumianhuli1 commented 5 months ago


Oh, thanks.