SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Memory leak with parallel transcribe #1055

Open saddy001 opened 1 week ago

saddy001 commented 1 week ago

Hey there, thank you for the project.

I have spotted a memory leak in the latest release (1.0.3). When transcribing sequentially, memory behaves as expected. However, when transcribe is called in parallel, memory usage keeps increasing until OOM, even when garbage collection is run manually.

I have built a minimal reproducible example. Notice also how "mem after gc" grows further when you raise PARALLEL to 5 or 6.

wget "https://cdn.pixabay.com/download/audio/2024/07/23/audio_9f165cf892.mp3?filename=medieval-gamer-voice-donx27t-forget-to-subscribe-226581.mp3" -O test.mp3

import gc
from threading import Thread

from faster_whisper import WhisperModel
import psutil

PARALLEL = 4
THREADS = []
# one model instance shared by all threads
MODEL = WhisperModel('large-v2', device='auto', compute_type='int8', cpu_threads=4)

def get_rss():
    ''' Get current memory usage (RSS) in MB '''
    return int(psutil.Process().memory_info().rss / 1048576)

def transcribe():
    print(f'mem before {get_rss()}')
    segments, _info = MODEL.transcribe('test.mp3')
    # transcribe() returns a generator; consuming it runs the full transcription
    _ = list(segments)
    print(f'mem after  {get_rss()}')

def sequential():
    print('sequential:')
    for _ in range(PARALLEL):
        transcribe()

def parallel():
    print('\nparallel:')
    for _ in range(PARALLEL):
        # start PARALLEL threads that all call the shared MODEL concurrently
        THREADS.append(Thread(target=transcribe))
        THREADS[-1].start()

def main():
    sequential()
    gc.collect()
    print(f'\nmem after gc {get_rss()}')
    parallel()

    for t in THREADS:
        t.join()

    gc.collect()
    print(f'\nmem after gc {get_rss()}')

if __name__ == '__main__':
    main()

Output:

sequential:
mem before 1761
mem after  2617
mem before 2617
mem after  2617
mem before 2617
mem after  2617
mem before 2617
mem after  2617

mem after gc 2617  # everything fine up until here

parallel:
mem before 2617
mem before 2617
mem before 2617
mem before 2617
mem after  2691
mem after  2691
mem after  2691
mem after  2691

mem after gc 2691  # leak
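
To double-check that the growth is tied to concurrent execution of the model rather than to threading as such, a variant that keeps the thread structure but serializes the actual calls with a lock can be compared against the numbers above. This is a sketch on top of the script (the lock and wrappers are not part of the original example); in this configuration I would expect the result to match the sequential case:

from threading import Lock, Thread

TRANSCRIBE_LOCK = Lock()

def transcribe_locked():
    # Only one thread enters MODEL.transcribe() at a time; the threads
    # themselves still start and run concurrently.
    with TRANSCRIBE_LOCK:
        transcribe()

def parallel_locked():
    threads = [Thread(target=transcribe_locked) for _ in range(PARALLEL)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()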

Edit: There's another catch. When you run the same script multiple times, the final "mem after gc" value sometimes differs considerably between runs.

for run in {1..10}; do python test.py | tail -n1; done

mem after gc 2669
mem after gc 2670
mem after gc 2669
mem after gc 2671
mem after gc 2670
mem after gc 2670
mem after gc 4098  # !
mem after gc 2670
mem after gc 2669
mem after gc 2670
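
For what it's worth, since gc.collect() does not release the memory, the retained allocations are presumably native rather than Python objects. One way to check (a diagnostic sketch, not part of the original script): tracemalloc only tracks Python-level allocations, so if its counters stay flat while RSS keeps growing, the leak most likely lives in native code, i.e. CTranslate2.

import tracemalloc

tracemalloc.start()
parallel()                   # functions and THREADS from the script above
for t in THREADS:
    t.join()
gc.collect()
current, peak = tracemalloc.get_traced_memory()  # both values in bytes
print(f'python-level current {current / 1048576:.1f} MB, peak {peak / 1048576:.1f} MB')
print(f'rss {get_rss()} MB')
tracemalloc.stop()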
80boys commented 1 week ago

High Concurrency Exception
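
If the problem scales with the number of simultaneous transcribe() calls, one stopgap until the root cause is fixed is to cap concurrency with a semaphore. A sketch, assuming the script from the report above; MAX_CONCURRENT is a hypothetical knob to tune for your setup:

from threading import BoundedSemaphore, Thread

MAX_CONCURRENT = 2  # hypothetical cap, not from the original report

SEM = BoundedSemaphore(MAX_CONCURRENT)

def transcribe_capped():
    # At most MAX_CONCURRENT threads run MODEL.transcribe() at once.
    with SEM:
        transcribe()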