Hi, I’m encountering a CUDA out-of-memory (OOM) issue similar to the one described in #10, while running the Pascal VOC experiment with 92 labels on TITAN XP GPUs. It occurred during the ASPP stage. To troubleshoot, I reduced the batch size to 1 and scaled down the decoder channels. Specifically, I made the following adjustments to the decoder head (a config sketch follows the list):
channels: 128 -> 16
text_channels: 128 -> 16
up_channels: (64, 32) -> (8, 4)
skip_channels: (32, 16) -> (4, 2)
Hardcoded the number of groups in the decoder’s GroupNorm layers to 1
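For clarity, roughly what my edits look like in config form (the key names are paraphrased from the list above; the actual config file may name them differently):

```python
# Sketch of my decoder-head changes (key names are my paraphrase):
decoder_cfg = dict(
    channels=16,           # was 128
    text_channels=16,      # was 128
    up_channels=(8, 4),    # was (64, 32)
    skip_channels=(4, 2),  # was (32, 16)
)
# I also hardcoded num_groups=1 in the decoder's GroupNorm layers, since
# GroupNorm requires num_channels % num_groups == 0 and the reduced channel
# counts (e.g. 4 and 2) are no longer divisible by the original group size.
```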
These changes allowed me to pass the ASPP stage, but it OOMed again during the upscaling step.
Given my limited VRAM, I would like to try pre-calculating the pseudo-labels for the CLIP guidance loss, as suggested in the other thread. My question is how best to do this. I’ve identified the forward_maskclip function in model/vlm.py as a likely candidate, but it appears to process weakly augmented images, which vary per iteration, and I’m unsure how to handle that variability when pre-calculating the labels. A sketch of what I have in mind follows below.
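To make the question concrete, here is a rough sketch of the pre-calculation I’m imagining (the dataset API, the exact forward_maskclip signature, and the output format are all my guesses; only the function name comes from model/vlm.py):

```python
import os
import torch

@torch.no_grad()
def precompute_pseudo_labels(vlm, dataset, out_dir):
    """Run the frozen VLM once per image and cache its pseudo-labels to disk.

    Assumes `dataset` yields (image_tensor, image_id) pairs for the
    *unaugmented* images -- this is the part I'm unsure about.
    """
    vlm.eval()
    os.makedirs(out_dir, exist_ok=True)
    for idx in range(len(dataset)):
        image, image_id = dataset[idx]
        # Signature assumed: a batched image in, per-pixel class logits out.
        logits = vlm.forward_maskclip(image.unsqueeze(0).cuda())
        # uint8 is enough for 92 classes and keeps the cache small.
        pseudo = logits.argmax(dim=1).squeeze(0).to(torch.uint8).cpu()
        torch.save(pseudo, os.path.join(out_dir, f"{image_id}.pt"))
```

The part I can’t resolve is the weak augmentation: since the crops and flips vary per iteration, would the intended approach be to cache predictions on the full unaugmented images (as sketched above) and then replay the same geometric transforms on the cached labels at load time?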
Thanks for your help!