
bnosac / audio.whisper

Transcribe audio files using the "Whisper" Automatic Speech Recognition model from R

Other

113 stars, 13 forks

Base model detects music whereas tiny detects lyrics? #17

Closed: sebsilas closed this issue 1 year ago

sebsilas commented 1 year ago

Somewhat surprisingly to me, the base model detects music in a WAV file I am trying to transcribe lyrics from (it does contain singing), whereas the tiny model detects and transcribes the lyrics. I would expect the more capable model to be more likely to pick up lyrics and words. Is this intuition wrong?
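When Whisper treats a stretch of audio as music rather than speech, the transcript can contain only a non-speech tag such as "[Music]" or music-note symbols instead of words. Below is a minimal Python sketch for comparing two models' outputs on the same file. It assumes the segment texts have already been exported as plain strings, and that such tags look like "[...]", "(...)" or ♪/♫; the exact tags vary by model and audio. `is_non_speech` and `non_speech_fraction` are hypothetical helpers, not part of audio.whisper.

```python
import re

# Assumption: non-speech segments consist only of bracketed tags like "[Music]",
# parenthesised tags like "(music)", music-note symbols, and whitespace.
NON_SPEECH = re.compile(r"(?:\[[^\]]*\]|\([^)]*\)|[♪♫\s])+")


def is_non_speech(segment: str) -> bool:
    """Return True if a segment holds no words, only tags or note symbols."""
    stripped = segment.strip()
    return stripped == "" or NON_SPEECH.fullmatch(stripped) is not None


def non_speech_fraction(segments: list[str]) -> float:
    """Share of segments that are non-speech; 0.0 for an empty transcript."""
    if not segments:
        return 0.0
    return sum(is_non_speech(s) for s in segments) / len(segments)
```

Running `non_speech_fraction` on the tiny and base transcripts of the same file gives a quick, rough measure of how much of each output is tags rather than lyrics.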

The file in question is https://file.io/nxRo3WDAkR0u (yes, it is a 16-bit .wav).

Thanks, Seb
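Before comparing models, it can help to confirm the input format. Below is a minimal sketch using Python's standard `wave` module. It assumes the target is 16-bit PCM, mono, 16 kHz, the format whisper.cpp-based tools such as audio.whisper generally expect; check the package documentation for the exact requirement. `check_whisper_wav` and `silent_wav` are hypothetical helpers written for this example.

```python
import io
import wave


def check_whisper_wav(source) -> list[str]:
    """Return a list of format problems; an empty list means the file looks usable.

    `source` may be a path or a binary file-like object.
    Assumption: expected format is 16-bit PCM, mono, 16000 Hz.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        width_bits = 8 * wav.getsampwidth()
        if width_bits != 16:
            problems.append(f"sample width is {width_bits}-bit, expected 16-bit")
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems


def silent_wav(rate: int, channels: int, sampwidth: int, seconds: float = 0.1) -> io.BytesIO:
    """Build an in-memory WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        # One zero byte per sample per channel per byte of sample width.
        w.writeframes(b"\x00" * int(rate * seconds) * channels * sampwidth)
    buf.seek(0)
    return buf


# Example: a 44.1 kHz stereo file is flagged twice (rate and channel count).
print(check_whisper_wav(silent_wav(44100, 2, 2)))
```

`wave.open` raises `wave.Error` for encodings it cannot parse, such as 32-bit float WAV files, which is itself a sign the file needs converting before transcription.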

jwijffels commented 1 year ago

> Is this intuition wrong?

I guess so. You'll have to dig deeper into the model's training data to find out. This question is probably better asked of the model's authors at https://github.com/openai/whisper

jwijffels commented 1 year ago

Closing, as this is not really an issue but rather a question about the scope of the model.