HuiZhang0812 / DiffusionAD


Excessive CUDA Memory Usage in UNet Code of Recon Module During Forward Function Process #16

Closed by johnbuzz98 10 months ago

johnbuzz98 commented 10 months ago

https://github.com/HuiZhang0812/DiffusionAD/blob/80853e65cbe92677839cd093596d287e31f5e723/models/Recon_subnetwork.py#L398C13-L398C38

Dear @HuiZhang0812

I am encountering a significant issue with the UNet code in the Recon module, specifically in the forward function. The problem arises when the hidden embeddings are updated, leading to excessive CUDA memory consumption.

While validating the code with a batch size of 16, I noticed that VRAM usage accumulates continuously in this particular section. It exhausts all 48 GB of VRAM on my RTX A6000 before a single model run completes, resulting in a CUDA out-of-memory (OOM) error.
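One thing that may be worth ruling out: if the validation pass runs with gradient tracking enabled, every intermediate activation in the UNet (including the embeddings collected for skip connections) is kept alive for a potential backward pass, which inflates peak memory far beyond what inference needs. Below is a minimal sketch, not the repository's code; the `validate` helper, the loader contents, and the `model(x, t)` call signature are all illustrative assumptions.

```python
import torch

def validate(model, loader, device="cuda"):
    model.eval()
    with torch.no_grad():  # no autograd graph is retained, so activations are freed after each step
        for batch in loader:
            x = batch.to(device)                                          # hypothetical: loader yields image tensors
            t = torch.zeros(x.size(0), dtype=torch.long, device=device)   # illustrative timestep argument
            _ = model(x, t)
            # watch for monotonic growth across iterations / blocks
            print(f"allocated: {torch.cuda.memory_allocated(device) / 2**20:.0f} MiB")
```

If memory still climbs under `torch.no_grad()`, the growth is coming from tensors that are explicitly kept (e.g. the hidden-embedding list itself) rather than from autograd buffers.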

I would greatly appreciate your assistance in investigating and addressing this matter. Your prompt attention to this issue would be highly valued.

Thank you for your time and support.

Best regards, Woojun Lee

HuiZhang0812 commented 10 months ago

The GPU used in my experiments is an A100 with 80 GB of VRAM. Perhaps you could consider reducing the batch size.
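If a smaller per-step batch still needs to match the original effective batch size, gradient accumulation is one generic option (this is not something args1.json necessarily exposes; the model, optimizer, and data below are toy placeholders for illustration only).

```python
import torch
import torch.nn as nn

# Toy stand-ins; this is a generic sketch, not code from this repository.
model = nn.Linear(128, 128).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
accum_steps = 4                                    # 4 micro-batches of 4 samples = effective batch size 16

optimizer.zero_grad()
for i in range(accum_steps):
    x = torch.randn(4, 128, device="cuda")         # micro-batch of 4 instead of 16
    loss = model(x).pow(2).mean() / accum_steps    # scale so accumulated gradients match a full batch
    loss.backward()                                # gradients sum into .grad across micro-batches
optimizer.step()
optimizer.zero_grad()
```

Only one micro-batch's activations live in VRAM at a time, at the cost of more forward/backward passes per optimizer step.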

johnbuzz98 commented 3 months ago

@HuiZhang0812 Hello,

I’ve been trying to train my model in the same environment (A100 with 80GB VRAM) using the same experimental settings. However, I’m encountering an out-of-memory error again.

Even when running only up to train.py#L117, the process already consumes 80,657 MiB / 81,920 MiB of VRAM.

My Python and environment settings are identical to those in requirements.txt, and my GPU driver settings are as follows:

NVIDIA-SMI 535.54.03, Driver Version: 535.54.03, CUDA Version: 12.2

Could you please check if the experimental settings uploaded in args1.json are suitable for running on an A100 with 80GB VRAM?

Thank you.

The full error message is:

CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 79.15 GiB total capacity; 78.17 GiB already allocated; 149.25 MiB free; 78.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
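For reference, the allocator setting that the error message itself suggests can be applied before the first CUDA allocation, either in the shell (`PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py`) or at the top of the script. The 128 MiB value is only an illustrative choice, and this mitigates fragmentation rather than reducing the model's actual memory footprint.

```python
# Apply the error message's own suggestion: cap the allocator's split size to
# reduce fragmentation. Must be set before PyTorch touches the GPU,
# e.g. at the very top of train.py. The 128 MiB value is illustrative.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the environment variable is set
```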