sebsilas closed this issue 1 year ago.
Is this intuition wrong?
I guess so. You'll have to dig deeper into the model's training data to find out. This question is probably better directed to the authors of the model at https://github.com/openai/whisper.
Closing, as this is not really an issue but rather a question about the scope of the model.
Rather surprisingly to me, the base model detects only music in a WAV file I am trying to transcribe lyrics from (the file does contain singing), whereas the tiny model detects and transcribes the lyrics. I would expect the more capable model to be more likely to detect lyrics/words. Is this intuition wrong?
The file in question is https://file.io/nxRo3WDAkR0u (yes, it is a 16-bit .wav).
Thanks, Seb
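If you want to double-check the bit depth of a file before comparing models, a minimal sketch using only Python's standard `wave` module could look like this. The function name `wav_format` is hypothetical, and the module only reads PCM WAV files; other encodings (for example, 32-bit float) raise `wave.Error`.

```python
import wave


def wav_format(path):
    """Return (channels, bits_per_sample, sample_rate) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        # getsampwidth() reports bytes per sample; 2 bytes means 16-bit PCM
        return wf.getnchannels(), wf.getsampwidth() * 8, wf.getframerate()
```

For example, `wav_format("song.wav")` (a placeholder filename) returns a tuple such as `(2, 16, 44100)` for 16-bit stereo audio at 44.1 kHz. Note that openai/whisper decodes its input through ffmpeg and resamples it to 16 kHz mono before inference, and the tiny and base models share that preprocessing, so the original bit depth alone should not explain why they behave differently.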