I'm doing a fun project to transcribe radio broadcasts.
I'm taking 15-minute recordings and feeding them to WhisperCPP.
More often than not, I've noticed, it marks talking as [Music], and other times it doesn't output [Music] at all and the actual lyrics come out.
I don't mind lyrics showing up in the output, but I do mind talking being skipped and output as [Music].
Is there something I can do to improve the results?
I'm guessing maybe something along the lines of using the prompt argument to feed something in.
Any suggestions? Thanks.