SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Takes a long time until "Processing audio with duration ..." log appears #255

Open · tree1891 opened this issue 1 year ago

tree1891 commented 1 year ago

If I call model.transcribe twice simultaneously, it takes about 5 minutes before the 'Processing audio with duration ...' log appears. With a single call, it takes 2~3 seconds.

[2023-05-23 09:00:56,877] INFO in app: 3bFMKBD42YI.webm, device: cuda
[2023-05-23 09:00:59,289] INFO in app: 4UQh0M7jjOo.webm, device: cuda
[2023-05-23 09:06:06,576] INFO in transcribe: Processing audio with duration 11:13.506
[2023-05-23 09:06:08,217] INFO in transcribe: Processing audio with duration 15:06.344
[2023-05-23 04:36:54,805] INFO in app: 9JpiQ-tnSQU.webm, device: cuda
[2023-05-23 04:36:56,817] INFO in transcribe: Processing audio with duration 01:50.283
guillaumekln commented 1 year ago

What transcription options are you using?

tree1891 commented 1 year ago

WhisperModel('medium', device="cuda", compute_type="float16", local_files_only=True)

I tested on an AWS g4dn.xlarge instance. The filenames are YouTube video IDs.

guillaumekln commented 1 year ago

And what options do you set when calling model.transcribe?

tree1891 commented 1 year ago

model.transcribe(media_file, word_timestamps=True, beam_size=5)

guillaumekln commented 1 year ago

Between the call to model.transcribe and the "Processing ..." log, only the function decode_audio is executed. So I tried decoding the files in separate threads, but the decoding time is as expected:

import threading
import time

from faster_whisper import decode_audio

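# First, decode each file sequentially to get a per-file baseline.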
s = time.time()
decode_audio("3bFMKBD42YI.webm")
e = time.time()
print(e-s, "time for 3bFMKBD42YI")

s = time.time()
decode_audio("4UQh0M7jjOo.webm")
e = time.time()
print(e-s, "time for 4UQh0M7jjOo")

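# Then decode both files concurrently in separate threads.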
t1 = threading.Thread(target=decode_audio, args=["3bFMKBD42YI.webm"])
t2 = threading.Thread(target=decode_audio, args=["4UQh0M7jjOo.webm"])

s = time.time()
t1.start()
t2.start()

t1.join()
t2.join()
e = time.time()
print(e-s, "time for both")
Output:

2.271620988845825 time for 3bFMKBD42YI
3.12764573097229 time for 4UQh0M7jjOo
3.257882833480835 time for both
tree1891 commented 1 year ago

I call both through a web service using Flask and Waitress, not threads directly. Waitress's thread count is 2.

...
    logging.info('%s, device: %s', os.path.basename(media_file), device)

    start = time.time()

    # OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
    os.environ['KMP_DUPLICATE_LIB_OK']='True'

    model_size = "medium"

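    # Note: a new WhisperModel is constructed on every request here.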
    if device == "cpu":
      model = WhisperModel(model_size, device="cpu", compute_type="int8", local_files_only=True)
    else:
      model = WhisperModel(model_size, device="cuda", compute_type="float16", local_files_only=True)

    segments, info = model.transcribe(media_file, word_timestamps=True, beam_size=5)
...
guillaumekln commented 1 year ago

So you are not just calling model.transcribe, you are also creating a new model here. Try creating a single model instance and then reusing it for every request.
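
A minimal sketch of that pattern with Flask, matching the options above (the route and the way the file path reaches the handler are illustrative assumptions):

from faster_whisper import WhisperModel
from flask import Flask, request

app = Flask(__name__)

# Load the model once at startup; all Waitress worker threads share this instance.
model = WhisperModel("medium", device="cuda", compute_type="float16", local_files_only=True)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Illustrative: however the media file path reaches the handler in your app.
    media_file = request.form["media_file"]
    # Reuse the shared instance instead of constructing a WhisperModel per request.
    segments, info = model.transcribe(media_file, word_timestamps=True, beam_size=5)
    return {"language": info.language, "text": " ".join(s.text.strip() for s in segments)}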

tree1891 commented 1 year ago

If I share a single model instance, two requests take more time than with separate instances.

guillaumekln commented 1 year ago

Can you try loading a single model instance with num_workers=2?
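
With the settings above, that would look like:

model = WhisperModel("medium", device="cuda", compute_type="float16", local_files_only=True, num_workers=2)

num_workers gives the loaded model multiple CTranslate2 workers, so transcriptions started from different threads can run in parallel instead of queuing on a single worker.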

tree1891 commented 1 year ago

num_workers does not seem to be the cause.

tree1891 commented 1 year ago

I created an AMI and use it for AWS auto scaling. On the original instance it works well, but not on new instances.

tree1891 commented 1 year ago

It seems to take a long time to create the WhisperModel instance. Is there any issue with using ~/.cache/huggingface?

[2023-06-01 17:18:00,382] INFO in app: of3_iV5q9Lo.webm, device: cuda
[2023-06-01 17:18:00,383] DEBUG in connectionpool: Starting new HTTPS connection (1): huggingface.co:443
[2023-06-01 17:18:00,623] DEBUG in connectionpool: https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-medium/revision/main HTTP/1.1" 200 1804
[2023-06-01 17:18:00,638] DEBUG in connectionpool: https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-medium/revision/main HTTP/1.1" 200 1804
[2023-06-01 17:24:06,260] WARNING in task: Task queue depth is 1
[2023-06-01 17:24:06,261] INFO in app: whisper model created!
[2023-06-01 17:24:06,261] INFO in app: whisper model created!
[2023-06-01 17:24:11,395] INFO in transcribe: Processing audio with duration 16:09.828
[2023-06-01 17:24:11,407] INFO in transcribe: Processing audio with duration 16:09.828
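
If the time is spent in the Hugging Face Hub requests visible in the connectionpool logs above, one possible mitigation (a sketch; the path is an assumption) is to bake the downloaded model into the AMI and load it from there with download_root:

model = WhisperModel(
    "medium",
    device="cuda",
    compute_type="float16",
    download_root="/opt/whisper-models",  # assumed location pre-populated in the AMI
    local_files_only=True,
)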
tree1891 commented 1 year ago

After clearing the huggingface cache, the delay instead appears after the 'Processing audio' log.

[2023-06-01 17:49:36,349] INFO in app: whisper model created!
[2023-06-01 17:49:41,269] INFO in transcribe: Processing audio with duration 16:09.828
[2023-06-01 17:49:41,270] INFO in transcribe: Processing audio with duration 16:09.828
[2023-06-01 17:49:57,985] WARNING in task: Task queue depth is 3
[2023-06-01 17:49:58,134] WARNING in task: Task queue depth is 4
[2023-06-01 17:50:27,988] WARNING in task: Task queue depth is 5
[2023-06-01 17:50:28,154] WARNING in task: Task queue depth is 6
[2023-06-01 17:50:57,992] WARNING in task: Task queue depth is 7
[2023-06-01 17:50:58,161] WARNING in task: Task queue depth is 8
[2023-06-01 17:51:27,995] WARNING in task: Task queue depth is 9
[2023-06-01 17:51:28,183] WARNING in task: Task queue depth is 10
[2023-06-01 17:51:57,998] WARNING in task: Task queue depth is 11
[2023-06-01 17:51:58,194] WARNING in task: Task queue depth is 12
[2023-06-01 17:52:28,002] WARNING in task: Task queue depth is 13
[2023-06-01 17:52:28,220] WARNING in task: Task queue depth is 14
[2023-06-01 17:52:58,006] WARNING in task: Task queue depth is 15
[2023-06-01 17:52:58,245] WARNING in task: Task queue depth is 16
[2023-06-01 17:53:09,643] INFO in transcribe: Detected language 'ko' with probability 1.00
[2023-06-01 17:53:09,644] INFO in transcribe: Detected language 'ko' with probability 1.00
guillaumekln commented 1 year ago

On the original instance it works well, but not on new instances.

What's the difference between the 2 instances?

tree1891 commented 1 year ago

They are the same.

lemondy commented 1 year ago

I have hit the same problem: when I use multiprocessing, every process initializes its own model.