ChetanXpro / nodejs-whisper

NodeJS Bindings for Whisper - the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
https://npmjs.com/nodejs-whisper
MIT License

Do not apply .wav to .vtt file #115

Open binarykitchen opened 1 month ago

binarykitchen commented 1 month ago

Once whisper has transcribed the audio and generated the subtitle file, your code leaves .wav in the subtitle file's name, which is a bug.

For example, it produces videomail.wav.vtt when it should be just videomail.vtt

It would also be good if the output file name could be configured with a new option.

Thanks

timkrins commented 1 month ago

How is this a bug? It just appends .vtt to whatever the source filename was. This is the default behavior of whisper.cpp, and this library does not change it. This script does have an additional feature that may convert your input sound file into a .wav file if it is not one already, which might be where your confusion comes from.

binarykitchen commented 1 month ago

This script does have an additional feature that may convert your input sound file into a .wav file if it is not one already, which might be where your confusion comes from.

I am not using it anymore, yet it always appends .vtt, no matter what the input is. A name like filename.wav.vtt is confusing; the file should simply be named filename.vtt, without wav in it.

But when you say it's Whisper's default behaviour, should I raise this in the Whisper repository instead? If so, which one is it?

ChetanXpro commented 1 month ago

@binarykitchen The underlying whisper.cpp project controls most of this behavior, but we have two options. We can add code to rename the generated file at the end of our wrapper. We could also add an extra option to this npm library for a custom output file name. I can implement these changes if they're important to you.
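
For illustration only, here is a minimal sketch of what such a rename step could look like in a wrapper. The helper names `stripAudioExtension` and `renameSubtitle` are hypothetical and are not part of nodejs-whisper or whisper.cpp. The sketch assumes the generated file is named after the input plus a subtitle extension, as in videomail.wav.vtt.

```typescript
import { rename } from "node:fs/promises";
import * as path from "node:path";

// Hypothetical helper (not part of nodejs-whisper): computes the cleaned
// subtitle path, e.g. "videomail.wav.vtt" -> "videomail.vtt".
// Paths without a recognised inner audio extension are returned unchanged.
export function stripAudioExtension(
  subtitlePath: string,
  audioExts: string[] = [".wav"],
): string {
  // For "videomail.wav.vtt": name = "videomail.wav", ext = ".vtt"
  const { dir, name, ext } = path.parse(subtitlePath);
  const inner = path.extname(name); // ".wav", or "" if there is none
  if (inner === "" || !audioExts.includes(inner.toLowerCase())) {
    return subtitlePath;
  }
  const base = name.slice(0, -inner.length); // "videomail"
  return path.format({ dir, name: base, ext });
}

// Hypothetical wrapper step: renames the generated file on disk and
// returns the path the subtitle file ends up at.
export async function renameSubtitle(subtitlePath: string): Promise<string> {
  const target = stripAudioExtension(subtitlePath);
  if (target !== subtitlePath) {
    await rename(subtitlePath, target);
  }
  return target;
}
```

A caller would run `await renameSubtitle("videomail.wav.vtt")` after transcription finishes. Note that `rename` may overwrite an existing videomail.vtt on some platforms, so a real implementation would probably want to check for that first.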

binarykitchen commented 1 month ago

@ChetanXpro I've already fixed this in my project temporarily.

Adding more options to your library doesn't feel right. We should avoid adding too many options. I think the problem should be raised with the "whisper.cpp" project. Do you have a link?

(because using two file extensions like abc.wav.vtt feels like an antipattern)