Open: tree1891 opened this issue 1 year ago
What transcription options are you using?
WhisperModel('medium', device="cuda", compute_type="float16", local_files_only=True)
I tested on an AWS g4dn.xlarge instance. The filenames are YouTube video IDs.
And what options do you set when calling model.transcribe?
model.transcribe(media_file, word_timestamps=True, beam_size=5)
Between the call to model.transcribe and the log "Processing ...", only the function decode_audio is executed. So I tried decoding the files in separate threads, but the decoding time is as expected:
import threading
import time

from faster_whisper import decode_audio

# Decode each file sequentially and time it.
s = time.time()
decode_audio("3bFMKBD42YI.webm")
e = time.time()
print(e - s, "time for 3bFMKBD42YI")

s = time.time()
decode_audio("4UQh0M7jjOo.webm")
e = time.time()
print(e - s, "time for 4UQh0M7jjOo")

# Decode both files concurrently and time the pair.
t1 = threading.Thread(target=decode_audio, args=("3bFMKBD42YI.webm",))
t2 = threading.Thread(target=decode_audio, args=("4UQh0M7jjOo.webm",))
s = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
e = time.time()
print(e - s, "time for both")
2.271620988845825 time for 3bFMKBD42YI
3.12764573097229 time for 4UQh0M7jjOo
3.257882833480835 time for both
I serve both requests as a web service using Flask and Waitress, not raw threads. Waitress's thread count is 2.
...
logging.info('%s, device: %s', os.path.basename(media_file), device)
start = time.time()

# OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

model_size = "medium"
if device == "cpu":
    model = WhisperModel(model_size, device="cpu", compute_type="int8", local_files_only=True)
else:
    model = WhisperModel(model_size, device="cuda", compute_type="float16", local_files_only=True)
segments, info = model.transcribe(media_file, word_timestamps=True, beam_size=5)
...
So you are not just calling model.transcribe; you are also creating a new model on every request. Try creating a single model instance, then reuse it for every request.
If I share a single model instance, 2 concurrent requests take more time than without sharing.
Can you try loading a single model instance with num_workers=2?
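For reference, num_workers is a constructor argument on WhisperModel; with 2 workers, two concurrent transcribe() calls can run in parallel on the same shared model. A sketch under the thread's own settings (CUDA, model already in the local cache):

```python
def load_shared_model(model_size="medium", device="cuda"):
    """Build one shared WhisperModel with two internal workers.

    Sketch only: assumes faster-whisper is installed and the model files
    are already in the local cache (hence local_files_only=True).
    """
    from faster_whisper import WhisperModel  # imported lazily so the sketch parses anywhere
    return WhisperModel(
        model_size,
        device=device,
        compute_type="float16" if device == "cuda" else "int8",
        num_workers=2,           # lets 2 transcribe() calls share the model
        local_files_only=True,   # never hit huggingface.co in the request path
    )
```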
num_workers does not seem to be the cause.
I built an AMI and use it for AWS auto scaling. On the original instance it works well, but not on new instances. It seems to take a long time to create the WhisperModel instance. Is there any issue with using /.cache/huggingface?
[2023-06-01 17:18:00,382] INFO in app: of3_iV5q9Lo.webm, device: cuda
[2023-06-01 17:18:00,383] DEBUG in connectionpool: Starting new HTTPS connection (1): huggingface.co:443
[2023-06-01 17:18:00,623] DEBUG in connectionpool: https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-medium/revision/main HTTP/1.1" 200 1804
[2023-06-01 17:18:00,638] DEBUG in connectionpool: https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-medium/revision/main HTTP/1.1" 200 1804
[2023-06-01 17:24:06,260] WARNING in task: Task queue depth is 1
[2023-06-01 17:24:06,261] INFO in app: whisper model created!
[2023-06-01 17:24:06,261] INFO in app: whisper model created!
[2023-06-01 17:24:11,395] INFO in transcribe: Processing audio with duration 16:09.828
[2023-06-01 17:24:11,407] INFO in transcribe: Processing audio with duration 16:09.828
After clear huggingface cache, it takes a time after 'Processing audio' log.
[2023-06-01 17:49:36,349] INFO in app: whisper model created!
[2023-06-01 17:49:41,269] INFO in transcribe: Processing audio with duration 16:09.828
[2023-06-01 17:49:41,270] INFO in transcribe: Processing audio with duration 16:09.828
[2023-06-01 17:49:57,985] WARNING in task: Task queue depth is 3
[2023-06-01 17:49:58,134] WARNING in task: Task queue depth is 4
[2023-06-01 17:50:27,988] WARNING in task: Task queue depth is 5
[2023-06-01 17:50:28,154] WARNING in task: Task queue depth is 6
[2023-06-01 17:50:57,992] WARNING in task: Task queue depth is 7
[2023-06-01 17:50:58,161] WARNING in task: Task queue depth is 8
[2023-06-01 17:51:27,995] WARNING in task: Task queue depth is 9
[2023-06-01 17:51:28,183] WARNING in task: Task queue depth is 10
[2023-06-01 17:51:57,998] WARNING in task: Task queue depth is 11
[2023-06-01 17:51:58,194] WARNING in task: Task queue depth is 12
[2023-06-01 17:52:28,002] WARNING in task: Task queue depth is 13
[2023-06-01 17:52:28,220] WARNING in task: Task queue depth is 14
[2023-06-01 17:52:58,006] WARNING in task: Task queue depth is 15
[2023-06-01 17:52:58,245] WARNING in task: Task queue depth is 16
[2023-06-01 17:53:09,643] INFO in transcribe: Detected language 'ko' with probability 1.00
[2023-06-01 17:53:09,644] INFO in transcribe: Detected language 'ko' with probability 1.00
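The debug log above shows live GETs to huggingface.co even though the model should be cached. One way to rule those round-trips out is to force the Hugging Face hub into offline mode before the model is created; a sketch, assuming huggingface_hub honors the HF_HUB_OFFLINE environment variable (it must be set before the library is imported):

```python
import os

# Assumption: HF_HUB_OFFLINE=1 makes huggingface_hub skip the
# "GET /api/models/..." requests seen in the log and resolve everything
# from the local cache, failing fast if a file is missing instead of
# stalling on the network.
os.environ["HF_HUB_OFFLINE"] = "1"

def hub_offline():
    # Hypothetical helper so startup code can assert offline mode is on
    # before the WhisperModel is constructed.
    return os.environ.get("HF_HUB_OFFLINE") == "1"
```

If creation is still slow with the hub offline, the delay is in loading the cached weights themselves (e.g. cold EBS blocks on a freshly launched AMI instance) rather than in the network.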
On the original instance it works well, but not on the new instance.
What's the difference between the 2 instances?
They are the same.
I have met the same problem when I use multiprocessing, where every process inits its own model.
If I call two model.transcribe simultaneously, it takes 5 minutes to see the 'Processing audio with duration ...' log. If I call just one, it takes 2~3 s.
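As a diagnostic for that simultaneous-call stall, the two transcriptions can be serialized behind a lock; if the 5-minute delay disappears, the calls are contending with each other rather than queuing normally. A minimal sketch (the lock and wrapper are illustrative, not part of faster-whisper):

```python
import threading

_transcribe_lock = threading.Lock()

def transcribe_serialized(model, media_file, **kwargs):
    # Allow only one transcription at a time; concurrent callers block
    # here instead of contending inside the model.
    with _transcribe_lock:
        return model.transcribe(media_file, word_timestamps=True, beam_size=5, **kwargs)
```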