How much GPU memory is needed?

Elsaam2y / DINet_optimized

An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.

93 stars, 15 forks

How much GPU memory is needed? #2

Closed: foxyear-kyumin closed this issue 8 months ago

foxyear-kyumin commented 10 months ago

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 684.00 MiB (GPU 0; 24.00 GiB total capacity; 20.99 GiB already allocated; 0 bytes free; 21.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Elsaam2y commented 10 months ago

I run my tests on T4 and k8 GPUs (16 GB and 8 GB of memory, respectively).

This error depends on your GPU. It most likely occurs because the audio file you are testing is long, so the torch model runs out of memory (OOM). The longer the audio file, the more GPU memory is required. To fix this, I have changed the pipeline to process the audio file in smaller chunks of 10 seconds each, which should prevent OOM on your GPU. Please pull the latest commits and test again.
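For readers who want to see the idea, here is a minimal sketch of processing audio in fixed 10-second chunks. It is not the repository's actual code: `iter_chunk_bounds`, `process_in_chunks`, and the `run_model` callback are hypothetical names used only for illustration. The sketch assumes the audio is a flat sequence of samples and that the model returns one list of outputs per chunk.

```python
def iter_chunk_bounds(num_samples, sample_rate, chunk_seconds=10.0):
    """Yield (start, end) sample indices covering the audio in fixed-length chunks.

    Hypothetical helper: the last chunk may be shorter than chunk_seconds.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = max(1, int(sample_rate * chunk_seconds))
    for start in range(0, num_samples, chunk_len):
        yield start, min(start + chunk_len, num_samples)


def process_in_chunks(samples, sample_rate, run_model, chunk_seconds=10.0):
    """Run `run_model` on one chunk at a time and concatenate the results.

    `run_model` is an assumed callback that takes a slice of samples and
    returns a list of outputs. Only one chunk is passed to it at a time,
    so peak memory depends on the chunk length, not the full audio length.
    """
    outputs = []
    for start, end in iter_chunk_bounds(len(samples), sample_rate, chunk_seconds):
        outputs.extend(run_model(samples[start:end]))
    return outputs


# Example: 25 seconds of audio at 16 kHz is split into 10 s + 10 s + 5 s.
bounds = list(iter_chunk_bounds(25 * 16000, 16000))
```

With a real model, the per-chunk call would usually run under `torch.no_grad()` so that no autograd state is kept between chunks.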

Let me know if you still have any other problems.

foxyear-kyumin commented 10 months ago

Traceback (most recent call last):
  File "D:\APP\DINet_optimized-main\train_DINet_clip.py", line 198, in
    "===> Epoch{}: Loss_DI: {:.4f} Loss_GI: {:.4f} Loss_DV: {:.4f} Loss_GV: {:.4f} Loss_perception: {:.4f} Loss_sync: {:.4f} lr_g = {:.7f} elapsed_time: {:.4f}".format(
IndexError: Replacement index 10 out of range for positional args tuple

Is the "elapsed_time: {:.4f}" field needed?
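As a general note on this kind of error: `str.format` raises `IndexError` when the format string contains more positional placeholders than the number of arguments passed. The sketch below reproduces the mismatch with a shortened, made-up format string and then shows one way to avoid it by using named fields in an f-string. `format_epoch_log` and its parameters are hypothetical and are not taken from `train_DINet_clip.py`.

```python
# Reproduction with a shortened, hypothetical format string:
# two placeholders but only one argument.
try:
    "Epoch{}: elapsed_time: {:.4f}".format(3)
except IndexError as exc:
    error_message = str(exc)


def format_epoch_log(epoch, losses, lr_g, elapsed_time):
    """Build the log line from named values so placeholders cannot go out of sync.

    `losses` is an assumed dict mapping a loss name to its float value.
    """
    loss_part = " ".join(f"{name}: {value:.4f}" for name, value in losses.items())
    return (
        f"===> Epoch{epoch}: {loss_part} "
        f"lr_g = {lr_g:.7f} elapsed_time: {elapsed_time:.4f}"
    )


line = format_epoch_log(1, {"Loss_DI": 0.5, "Loss_sync": 0.25}, 1e-4, 12.3456)
```

So the `elapsed_time` placeholder is only a problem if no matching value is passed to `.format(...)`. Either supply the value or remove the placeholder.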

foxyear-kyumin commented 10 months ago

Good job! Thanks, Elsaam2y. I would like to know how to train syncnet.pt. Can you give me any suggestions?

Elsaam2y commented 10 months ago

For the syncnet, I tried to take inspiration from and follow the syncnet implementation in wav2lip. However, the final results were not always consistent, which is why I decided to use the original trained DINet model together with a trained, lightweight audio mapping model.