Closed michalblaha closed 7 months ago
@aleksandr-smechov Not in my case. I have many audio files longer than 2 hours (usually 5-10 hours). Processing them often ends with an out-of-memory error on a GPU with 24 GB of RAM (large-v2 model).
I tried to reduce the load on GPU RAM by moving diarization to another Docker instance that is CPU-only (24 CPUs, 40 GB RAM) and setting up remote diarization. The diarization of a 2+ hour audio file ended with a timeout.
Part of the log (from the primary Docker instance):
23-11-12 23:00:09,694 - faster_whisper - INFO - Processing audio with duration 02:12:51.248
2023-11-12 23:17:02.252 | DEBUG | wordcab_transcribe.logging:time_and_tell_async:137 - transcription executed in 1012.8322 secs
2023-11-12 23:17:02.262 | ERROR | wordcab_transcribe.router.v1.audio_url_endpoint:inference_with_audio_url:99 - Error in diarization:
Traceback (most recent call last):
  File "/app/src/wordcab_transcribe/services/asr_service.py", line 654, in process_diarization
    out = await time_and_tell_async(
  File "/app/src/wordcab_transcribe/logging.py", line 124, in time_and_tell_async
    result = await func
  File "/app/src/wordcab_transcribe/services/asr_service.py", line 780, in remote_diarization
    async with session.post(
  File "/root/.local/share/hatch/env/virtual/wordcab-transcribe/9TtSrW0h/runtime/lib/python3.10/site-packages/aiohttp/client.py", line 1167, in __aenter__
    self._resp = await self._coro
  File "/root/.local/share/hatch/env/virtual/wordcab-transcribe/9TtSrW0h/runtime/lib/python3.10/site-packages/aiohttp/client.py", line 586, in _request
    await resp.start(conn)
  File "/root/.local/share/hatch/env/virtual/wordcab-transcribe/9TtSrW0h/runtime/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 900, in start
    with self._timer:
  File "/root/.local/share/hatch/env/virtual/wordcab-transcribe/9TtSrW0h/runtime/lib/python3.10/site-packages/aiohttp/helpers.py", line 725, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
2023-11-12 23:17:02.263 | INFO | wordcab_transcribe.logging:dispatch:75 - Task [7849928e-fc18-4cda-af19-62a2915ea1df] | Status: 500, Time: 1048.8975 secs
INFO: 10.10.100.1:55847 - "POST /api/v1/audio-url?url=https%3A%2F%2Fsomedata.hlidacstatu.cz%2Fmp3%2Fvyjadreni-politiku%2Fe0d47d12c2a88dd910d724b79309af50.mp3 HTTP/1.1" 500 Internal Server Error
As for the requested PR change: I agree with you, it would be a better solution. Unfortunately, my knowledge of Python is very limited and I'm not able to make such a big change. Sorry.
to support audio files that are several hours long (and a long-running remote diarization process)
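For anyone hitting the same `asyncio.TimeoutError`: the traceback shows the timeout being raised inside aiohttp while `session.post(` waits for the response. One possible direction is to scale the client timeout with the audio duration. The following is only a minimal sketch and not wordcab-transcribe's actual code: `parse_duration`, `diarization_timeout`, `post_for_diarization`, the `factor` and `floor_s` values, and the JSON payload are all hypothetical. `aiohttp.ClientTimeout` and `ClientSession(timeout=...)` are the standard aiohttp way to override the default total timeout (5 minutes in aiohttp 3.x).

```python
def parse_duration(text: str) -> float:
    """Convert 'HH:MM:SS.mmm' (the format in the faster_whisper log line) to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def diarization_timeout(duration_s: float, factor: float = 1.0, floor_s: float = 300.0) -> float:
    """Return a timeout in seconds that grows with the audio length.

    `factor` is an assumed ratio of diarization time to audio time on the
    remote machine; it is a placeholder and should be measured, not trusted.
    `floor_s` keeps short files from getting an unreasonably small timeout.
    """
    return max(floor_s, duration_s * factor)


async def post_for_diarization(url: str, payload: dict, audio_duration_s: float) -> dict:
    """POST to a (hypothetical) remote diarization endpoint with a scaled timeout."""
    # Imported here so the helpers above stay usable without aiohttp installed.
    import aiohttp

    timeout = aiohttp.ClientTimeout(total=diarization_timeout(audio_duration_s))
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, json=payload) as resp:
            resp.raise_for_status()
            return await resp.json()
```

For the file in the log above, `parse_duration("02:12:51.248")` gives roughly 7971 seconds, so with the assumed `factor=1.0` the request would be allowed to run for a bit over two hours instead of failing at the default limit. Whether that is enough depends on how fast the CPU-only instance actually diarizes.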