SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

long files are leading to memory crashes (on gpu) #550

Status: closed by westhecool 11 months ago

westhecool commented 11 months ago

I am trying to transcribe long files. When a file reaches about an hour, there's roughly a 50% chance that it will crash with the error "CUDA failed with error out of memory". I have an NVIDIA RTX 3070 with 8 GB of VRAM, which should be sufficient, right? While transcribing, VRAM usage stays below 65%, so it is very odd that it reports running out. I have tried both int8 and float16, but neither changes anything. I'm using the large-v2 model. (Sorry if an issue like this already exists; I couldn't find one.)

Update: It seems completely random, and the length of the file does not appear to matter. Sometimes it transcribes a two-hour file completely fine, and other times it fails on a 20-minute file.

Purfview commented 11 months ago

Try best_of=1.
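The suggestion corresponds to the best_of argument of faster-whisper's WhisperModel.transcribe(), which sets how many candidates are sampled when decoding falls back to a non-zero temperature (the library default is 5). The thread does not explain why this might help. One plausible reading is that those fallbacks decode several candidates at once and could cause short VRAM spikes, but that is an assumption. Below is a minimal sketch that assumes a CUDA device and the large-v2 model from the report. The helper name transcribe_file is hypothetical.

```python
"""Sketch: run faster-whisper with best_of=1 (assumes faster-whisper and a CUDA GPU)."""

# Keyword arguments forwarded to WhisperModel.transcribe(). best_of is the number of
# candidates sampled when decoding falls back to a non-zero temperature; the default is 5.
TRANSCRIBE_OPTIONS = {"best_of": 1}


def transcribe_file(path, model_size="large-v2", compute_type="float16"):
    """Hypothetical helper: transcribe one file and return (start, end, text) tuples."""
    # Imported lazily so this module can be loaded even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type=compute_type)
    segments, _info = model.transcribe(path, **TRANSCRIBE_OPTIONS)
    # segments is a lazy generator: decoding actually runs while we iterate over it.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

compute_type accepts the same "int8" and "float16" values the reporter tried.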

blackpolarz commented 11 months ago

The fact that the error "CUDA failed with error out of memory" happens randomly makes me suspect that other programs running in the background may be using the GPU. Many things could cause this. One option is to split the audio manually into smaller files so that each crash loses less work. Alternatively, you could try transcribing with a smaller model. Note that VRAM usage does spike at times.
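The splitting idea can be sketched as chunked transcription with a checkpoint file. This is a minimal sketch under stated assumptions. The caller must supply a function that transcribes an arbitrary time span of the audio, here called transcribe_chunk. That hook, the JSON checkpoint format, and the 10-minute default chunk length are all hypothetical choices and do not come from the thread. Progress is saved after every chunk. An out-of-memory crash therefore costs at most one chunk of work, and rerunning resumes from the first unfinished chunk.

```python
import json
import os


def chunk_spans(total_seconds, chunk_seconds=600.0):
    """Split [0, total_seconds) into consecutive (start, end) spans of at most chunk_seconds."""
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


def transcribe_in_chunks(total_seconds, transcribe_chunk, checkpoint_path, chunk_seconds=600.0):
    """Transcribe span by span, saving progress after each one.

    transcribe_chunk(start, end) -> str is a hypothetical caller-supplied hook,
    e.g. a wrapper that cuts the span out of the audio and runs the model on it.
    """
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            done = json.load(f)  # chunk index (as a string) -> transcript text

    spans = chunk_spans(total_seconds, chunk_seconds)
    for index, (start, end) in enumerate(spans):
        key = str(index)
        if key in done:
            continue  # already transcribed in an earlier run
        done[key] = transcribe_chunk(start, end)
        # Write to a temporary file and then rename it, so a crash mid-write
        # leaves the previous checkpoint intact.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)

    return [done[str(i)] for i in range(len(spans))]
```

For example, transcribe_in_chunks(7200, my_chunk_fn, "progress.json") would process a two-hour file as twelve 10-minute chunks. Here my_chunk_fn is a hypothetical hook.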

westhecool commented 11 months ago

> Try best_of=1.

Thanks for the suggestion, but it didn't seem to help.

westhecool commented 11 months ago


I had thought of that, and I've tried closing all other apps on my PC, to no avail. The spike doesn't show up in Task Manager, so I have no idea what is causing it.

Purfview commented 11 months ago

Where and how do you run it?

westhecool commented 11 months ago

> Where and how do you run it?

Currently I'm running it under WSL, which has GPU pass-through. That probably isn't ideal, but I've used it before without any problems.

Purfview commented 11 months ago

You could have said that from the start... Try updating WSL to the latest version by running this in a console: wsl --update --pre-release

westhecool commented 11 months ago

> You could say that from the start... Try to update wsl to latest, run this in console: wsl --update --pre-release

That actually worked, thanks. Such a silly issue, though; I probably should have checked before I opened a report.