Closed meetviolet closed 4 months ago
Thanks for your great work. In the paper, you say that the "fully-vectorized implementation requires huge GPU memory – at least 45 GB for 320×320 input resolution with the base model of Swin-Transformer". Is the 45 GB memory usage for a single GPU, or the total across multiple GPUs?

Hello, it is for a single GPU.

Thanks for your quick reply. Is there any way to reduce the per-GPU memory usage of the fully-vectorized implementation? Our server has 16 GB per GPU and 8 GPUs per node; can RankED be trained DDP-style on our single node?

You can try a smaller input size (160×160 instead of 320×320) or a smaller model (Swin-Small/Tiny instead of Swin-Base), but this leads to a performance drop.

I see, thanks anyway.
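For readers hitting the same limit: the two suggested changes (smaller input resolution, smaller backbone) could look roughly like the following mmcv-style config fragment. This is a hypothetical sketch, not taken from the RankED repo; the exact field names and defaults in its configs may differ.

```python
# Hypothetical mmcv-style config fragment for reducing per-GPU memory.
# Field names are assumptions; check RankED's own config files.
model = dict(
    backbone=dict(
        type='SwinTransformer',
        embed_dims=96,            # Swin-Tiny width (Swin-Base uses 128)
        depths=[2, 2, 6, 2],      # Swin-Tiny depths (Swin-Base uses [2, 2, 18, 2])
        num_heads=[3, 6, 12, 24],
    ),
)
crop_size = (160, 160)            # instead of (320, 320)
```

As noted above, both changes trade accuracy for memory, so the reported paper numbers would not be reproduced with this setup.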