bes-dev / tts-runpod-serverless-worker

Serverless implementation of Text-To-Speech
Apache License 2.0

CUDA memory cache increasing #3

Open nepoyasnit opened 1 month ago

nepoyasnit commented 1 month ago

I've run your pipeline from the repo and saw that CUDA memory grows rapidly, from ~3 GB to ~12 GB. If I submit a short audio clip it also increases, but not as much.

CUDA memory when I submit a short (1 second) audio file: [screenshot]

CUDA memory after submitting a long audio file (2.5 minutes): [screenshot]

Also, I've checked torch.cuda.memory_allocated() and it stayed constant, while torch.cuda.memory_cached() kept increasing. Could you explain why the CUDA cache is growing?
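
For reference, a minimal sketch of how the two counters can be logged around an inference call. The `tts_pipeline` call is a hypothetical stand-in for the worker's actual entry point, and `memory_reserved()` is the current name for the deprecated `memory_cached()`:

```python
import torch

def log_cuda_memory(tag: str) -> None:
    # memory_allocated(): bytes currently held by live tensors.
    # memory_reserved(): bytes held by PyTorch's caching allocator;
    # this is the current name for the deprecated memory_cached().
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

# log_cuda_memory("before")
# audio = tts_pipeline(text)  # hypothetical stand-in for the worker's inference call
# log_cuda_memory("after")
```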

bes-dev commented 1 month ago

We don't restrict audio samples to a fixed length, so the longer the sample, the more memory is allocated for the pre-processing stage.

nepoyasnit commented 1 month ago

> We don't restrict audio samples to a fixed length, so the longer the sample, the more memory is allocated for the pre-processing stage.

Yes, but it seems to me that this memory should be freed. It is strange that so much CUDA cache remains after processing :( Do you have any ideas on how to clear it? Also, when I call torch.cuda.empty_cache(), the inference time increases.

bes-dev commented 1 month ago

Memory allocation is a slow operation. Pre-allocating the memory layout is an optimization trick that makes inference faster. When you call empty_cache(), you destroy this optimization.
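
A possible middle ground, sketched below under assumptions (the 10 GiB budget is arbitrary, and `max_split_size_mb` only takes effect if set before CUDA is first initialized): trim the cache only when it exceeds a budget, rather than after every request, so the pre-allocation optimization is mostly preserved.

```python
import os
import torch

# Must be set before the first CUDA allocation: asks the caching allocator to
# split oversized free blocks, which can limit how much reserved-but-unused
# memory accumulates across differently sized requests (at some speed cost).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")

def maybe_trim_cache(budget_gib: float = 10.0) -> None:
    # Pay the re-allocation cost only when the cache exceeds a budget,
    # instead of destroying the pre-allocated layout after every request.
    if torch.cuda.memory_reserved() > budget_gib * 2**30:
        torch.cuda.empty_cache()
```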

nepoyasnit commented 1 month ago

But it keeps increasing without any limit, and it is also strange to have ~9 GB of CUDA cache when only 3 GB is allocated.

nepoyasnit commented 1 month ago

In my case, as requests come in, the CUDA cache grows to 17 GB while only 3 GB is allocated.
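
One way to bound this growth, sketched here as an assumption rather than anything the repo does: pad inputs to a small set of fixed bucket lengths so the caching allocator reuses the same block sizes instead of reserving a new size per request. The bucket values below are illustrative.

```python
import torch
import torch.nn.functional as F

# Illustrative bucket lengths in samples (e.g. at 16 kHz: 10 s, 30 s, 60 s, 150 s).
BUCKETS = [160_000, 480_000, 960_000, 2_400_000]

def pad_to_bucket(waveform: torch.Tensor) -> torch.Tensor:
    # Pad the 1-D waveform up to the next bucket so every request uses one of a
    # few fixed shapes; the caching allocator can then reuse existing blocks
    # instead of reserving a new block size for each distinct input length.
    n = waveform.shape[-1]
    target = next((b for b in BUCKETS if b >= n), n)
    return F.pad(waveform, (0, target - n))
```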