Visual-Attention-Network / SegNeXt

Official PyTorch implementation of "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation" (NeurIPS 2022)
Apache License 2.0

Training gpu consumption vs inference gpu consumption #57

Open · duda1202 opened this issue 1 year ago

duda1202 commented 1 year ago

Hi,

I have been using your work and I have been getting very impressive results, so first of all thank you for sharing it with the community!

I would like to train this on a lower-grade GPU such as an RTX 3070, which has 8 GB of VRAM, but training currently consumes at least 10 GB, even though the inference model runs on my GPU without any problem. Are there any optimization strategies for training that would help on lower-grade GPUs? For example, have you tested freezing the decoder during training, and if so, how much did the performance drop?
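Something like the generic PyTorch sketch below is what I had in mind: freezing part of the model and using automatic mixed precision to cut activation memory. The `decode_head` attribute name is my assumption about an mmsegmentation-style `EncoderDecoder`, and this is not this repo's actual training loop, just an illustration.

```python
import torch
from torch.cuda.amp import autocast, GradScaler


def freeze_module(module: torch.nn.Module) -> None:
    """Disable gradients for every parameter in `module`,
    so no gradient buffers or optimizer state are kept for it."""
    for p in module.parameters():
        p.requires_grad = False


# Assumed attribute name for an mmsegmentation-style model; adjust as needed.
# freeze_module(model.decode_head)

# Only pass trainable parameters to the optimizer after freezing, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4)

scaler = GradScaler()


def train_step(model, images, labels, criterion, optimizer):
    optimizer.zero_grad(set_to_none=True)
    with autocast():                      # fp16 activations -> lower VRAM use
        logits = model(images)
        loss = criterion(logits, labels)
    scaler.scale(loss).backward()         # scaled backward for fp16 stability
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

Would freezing the decoder (or the backbone) like this, combined with mixed precision and a smaller batch size, be a reasonable way to fit training into 8 GB, or would you expect a large accuracy drop?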