abdeladim-s / pywhispercpp

Python bindings for whisper.cpp
https://abdeladim-s.github.io/pywhispercpp/
MIT License
180 stars · 26 forks

Removed unnecessary channel crushing #48

Closed · UsernamesLame closed this 2 months ago

UsernamesLame commented 2 months ago

Calling `set_channels(1)` already converts the audio to one channel, so there was no need to call `split_to_mono`; instead we can just call `get_array_of_samples`.

This should have a negligible positive impact on performance. Can someone test this quickly for me? I'm not in a position to test at the moment.
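For reference, this is roughly what the simplified path looks like with pydub (just a sketch; the helper name and the float32 scaling are my assumptions about what whisper.cpp expects, i.e. 16 kHz mono samples in [-1.0, 1.0]):

```python
import numpy as np
from pydub import AudioSegment

def load_audio(path: str) -> np.ndarray:
    """Decode any media file into 16 kHz mono float32 samples."""
    sound = AudioSegment.from_file(path)
    # set_channels(1) already downmixes to mono, so split_to_mono() is redundant
    sound = sound.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    samples = np.array(sound.get_array_of_samples()).astype(np.float32)
    return samples / 32768.0  # scale 16-bit PCM into [-1.0, 1.0]
```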

UsernamesLame commented 2 months ago

@abdeladim-s Please review this. In the meantime I'm going to write a helper script to auto convert various types of files into something whisper can accept.

UsernamesLame commented 2 months ago

> @UsernamesLame, Approved 👍
>
> Looking forward to your helper script :)

https://github.com/UsernamesLame/WhisperWav :)

It's not remotely done, but it works. I'm thinking of making it spit out pickled numpy arrays of the audio for faster transcription. As of now, it spits out wav files that whisper.cpp can handle.
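Something like this is what I have in mind for the pickled output (just a sketch, the helper names are hypothetical):

```python
import pickle
import numpy as np

def cache_samples(samples: np.ndarray, cache_path: str) -> None:
    # Persist the already-converted float32 samples so a later transcription
    # run can skip the decode/resample step entirely.
    with open(cache_path, "wb") as f:
        pickle.dump(samples, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_cached_samples(cache_path: str) -> np.ndarray:
    with open(cache_path, "rb") as f:
        return pickle.load(f)
```

(`np.save` / `np.load` would do the same job and avoid unpickling untrusted files.)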

abdeladim-s commented 2 months ago

The script looks nice, but if I understand it correctly, it's similar to what I am already doing! There is no need to convert the files to wav, as we are calling the cpp functions directly rather than invoking whisper.cpp through the CLI. A numpy array of the audio is what we need.
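In other words, something like this (a rough sketch; it assumes `Model.transcribe` accepts a float32 numpy array as well as a file path, and reuses the `load_audio` helper sketched above):

```python
from pywhispercpp.model import Model

model = Model('base.en')

# Decode/resample once and keep the array; no intermediate wav file is written.
samples = load_audio('interview.mp3')  # 16 kHz mono float32

segments = model.transcribe(samples)
for segment in segments:
    print(segment.text)
```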

UsernamesLame commented 2 months ago

> The script looks nice, but if I understand it correctly, it's similar to what I am already doing!
>
> There is no need to convert the files to wav, as we are calling the cpp functions directly rather than invoking whisper.cpp through the CLI.
>
> A numpy array of the audio is what we need.

No, I know. But my understanding is you're doing some pre-processing before transcription.

I want to do all the pre-processing on large batches of data while transcription is actively running, to take one more task out of the processing pipeline.
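Roughly the shape of the pipeline I'm picturing (just a sketch; the file names are made up and it reuses the `load_audio` helper and `model` from the earlier sketches):

```python
from concurrent.futures import ThreadPoolExecutor

files = ['ep1.mp3', 'ep2.m4a', 'ep3.ogg']

# Decode/resample upcoming files in worker threads while the main thread
# transcribes whatever has already been converted.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(load_audio, f) for f in files]
    for path, future in zip(files, futures):
        samples = future.result()  # usually ready by the time we get here
        segments = model.transcribe(samples)
        print(path, '->', len(segments), 'segments')
```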

Also, if I'm understanding correctly, you're not actually converting the audio to 16-bit, and that may harm transcription.

I'll look over it again, but I also want to remove the need to convert the wav files to numpy arrays before transcribing.

That would help with performance, at least according to my CPython internals book, which recommends a lot of things to avoid.

I'm aiming to make the slowest part loading the model into memory. And even then, I want to explore making the model `deepcopy`-able, so we can load it once, clone it in memory, and change the clone's settings from the defaults as needed using `copy.deepcopy`.

This way multiple instances can be loaded 100% independently of each other, versus the current situation with static methods.
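Roughly the workflow I'm after (a sketch of the goal, not something that works today given the PyBind11 issues mentioned below):

```python
import copy
from pywhispercpp.model import Model

base = Model('base.en')        # pay the model-loading cost once

variant = copy.deepcopy(base)  # hypothetical: an independent in-memory clone
# The clone could then be reconfigured (e.g. different decoding settings)
# without touching `base` or any shared static state.
```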

I explored making everything non-static and deep-copy friendly, but ran into some issues with PyBind11.

It's late here, so I might have to advise on that front tomorrow.

abdeladim-s commented 2 months ago

• Yes, pre-processing is necessary to make every type of media compatible with whisper.cpp
• The files need to be converted to 16 kHz mono, similar to what you did in the script, it's actually here
• If you removed the conversion to numpy array, how are you going to pass the data to C++?

Other than that, good luck with your exploration, the code is yours.. looking forward to it :smile:

I can merge this for now if you want?

UsernamesLame commented 2 months ago
> • Yes, pre-processing is necessary to make every type of media compatible with whisper.cpp
> • The files need to be converted to 16 kHz mono, similar to what you did in the script, it's actually here
> • If you removed the conversion to numpy array, how are you going to pass the data to C++?
>
> Other than that, good luck with your exploration, the code is yours.. looking forward to it 😄
>
> I can merge this for now if you want?

Yea please do!

As for how we would load them, we could re-initialize the `numpy.ndarray`s, I guess, and pass them along to the C++ interface? I'm struggling to explain it; I need to write some code to show it. Also, I was confused about why the code wasn't updated; I just saw you merge it.

Edit:

My PR isn't merged(?) Says 2 workflows awaiting approval.

Also yea I noticed the channel crushing, sorry. I'm super tired right now so I shouldn't comment too much. I am going to see what else I can work on tomorrow.

abdeladim-s commented 2 months ago

Go get some rest.. Take your time! Looking forward to your future PRs. I'll merge this one for now!

Thanks for the contribution :smile: