EtienneAb3d / WhisperHallu

Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts

GPU out of memory #8

Closed. ILG2021 closed this issue 1 year ago.

ILG2021 commented 1 year ago

I have an 8 GB GPU, and sometimes it crashes with an out-of-memory error. How can I reduce the memory usage?

EtienneAb3d commented 1 year ago

@ILG2021,

First of all, are you using Faster Whisper in place of standard Whisper? This is the main way to get a smaller (and faster) model without a loss in quality.
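To make that concrete, here is a small example of loading a model through faster_whisper with a reduced-precision compute type, which is one way to cut GPU memory (the model size, file name, and compute_type here are only examples, not the project's actual configuration):

    from faster_whisper import WhisperModel

    # int8_float16 quantizes the weights, roughly halving GPU memory versus float16
    model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")
    segments, info = model.transcribe("audio.wav", beam_size=5)
    print(info.language, info.language_probability)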

You may try to use Spleeter in place of Demucs, but it's a bit less efficient.

Both Demucs and Whisper are loaded at the same time. There may be a way to unload one while the other is in use, but I haven't experimented with that yet.

You may also change the code to run one of them on the CPU, but this could take much longer.
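For illustration, a minimal sketch of that CPU option on the Demucs side, written against the public demucs API (the "htdemucs" model name, the function name, and the call shape are assumptions for the example, not the exact demucsWrapper.py code):

    import torch
    from demucs.pretrained import get_model
    from demucs.apply import apply_model

    def separate_vocals_cpu(wav: torch.Tensor) -> torch.Tensor:
        """Run source separation on the CPU; `wav` is a (channels, samples) float tensor."""
        model = get_model(name="htdemucs")  # assumption: same model family as the wrapper
        model.cpu()
        model.eval()
        with torch.no_grad():
            # apply_model expects a batch dimension: (batch, channels, samples)
            sources = apply_model(model, wav[None], device="cpu", split=True, overlap=0.25)[0]
        # one tensor per stem, ordered as in model.sources
        return sources[model.sources.index("vocals")]

This keeps the whole separation step off the GPU, so only Whisper competes for the 8 GB, at the cost of a much slower separation.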

ILG2021 commented 1 year ago

I am using faster_whisper. The official Demucs GitHub says: "If you have a GPU, but you run out of memory, please use --segment SEGMENT to reduce length of each split. SEGMENT should be changed to an integer." How can I set the segment in your code?

EtienneAb3d commented 1 year ago

Here: https://github.com/EtienneAb3d/WhisperHallu/blob/319dba323b30adda3b227da2d122e5263fd19e73/demucsWrapper.py#L35 there is already a split=True parameter. See: https://github.com/facebookresearch/demucs/blob/3b8430c12242bbbba48769eed6da5190c6ff3c2d/demucs/apply.py#L123

        split (bool): if True, the input will be broken down in 8 seconds extracts
            and predictions will be performed individually on each and concatenated.

This seems to indicate that this is already the lowest possible value: https://github.com/facebookresearch/demucs/blob/14f5032db2f1ebc9056df8ef9b25313f30cc1c8c/demucs/separate.py#L125

    if args.segment is not None and args.segment < 8:
        fatal("Segment must greater than 8. ")

ILG2021 commented 1 year ago

ok, thank you.

ILG2021 commented 1 year ago

Could you release your project as a pip package? It would be very valuable for anyone using Whisper, something like whisper-noise-reducer. It could also offer pipelines where VAD, Demucs, Spleeter, ffmpeg, and the marker can each be enabled, so users can combine one or more of them. I have used faster_whisper's built-in VAD and integrated Demucs and ffmpeg from your code, and I think the marker is not ready for multiple languages yet, but now it works amazingly well. Thanks for your project.
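For anyone reading along, this is roughly what the built-in VAD filter mentioned above looks like in faster_whisper (the model size, file name, and parameter values are only examples):

    from faster_whisper import WhisperModel

    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    segments, info = model.transcribe(
        "audio.wav",
        vad_filter=True,                                  # drop non-speech before decoding
        vad_parameters={"min_silence_duration_ms": 500},  # tune the silence threshold as needed
    )
    for segment in segments:
        print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")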

EtienneAb3d commented 1 year ago

@ILG2021, thanks for your positive feedback! As said on the main page, this code started out as a purely experimental project to test some ideas. It wasn't really designed to be released as a well-packaged solution. I will look into the procedure for releasing it as a pip package.

ILG2021 commented 1 year ago

That's OK. You could release it as is, and with contributions from the community it could become a great project.

EtienneAb3d commented 1 year ago

Both Demucs and Whisper are loaded at the same time. There may be a way to unload one while the other is in use, but I haven't experimented with that yet.

Here is a discussion related to that subject: https://github.com/openai/whisper/discussions/1313#discussioncomment-5813140
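A minimal sketch of that unload-between-stages idea, assuming Demucs is released before Whisper is loaded (the model names and the ordering are illustrative, not the current WhisperHallu flow):

    import gc
    import torch
    from demucs.pretrained import get_model
    from faster_whisper import WhisperModel

    demucs_model = get_model(name="htdemucs").cuda()
    # ... run the separation here and keep only the resulting audio ...

    del demucs_model          # drop the last reference to the Demucs network
    gc.collect()              # let Python release the underlying tensors
    torch.cuda.empty_cache()  # return cached CUDA blocks so Whisper can allocate them

    whisper_model = WhisperModel("large-v2", device="cuda", compute_type="float16")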