Bedrettin-Cetinkaya / RankED

MIT License
18 stars 3 forks

About the GPU memory footprint #3

Closed meetviolet closed 4 months ago

meetviolet commented 4 months ago

Thx for your great work.

In the paper, you state that the "fully-vectorized implementation requires huge GPU memory – at least 45 GB for 320×320 input resolution with the base model of Swin-Transformer".

I'd like to know whether the 45 GB memory usage is for a single GPU or for multiple GPUs in total.

Thx anyway.

Bedrettin-Cetinkaya commented 4 months ago

Hello, it is for single GPU.

meetviolet commented 4 months ago

Thx for your quick reply.

Is there any way to reduce the per-GPU memory usage of the fully-vectorized implementation? Our server has 8 GPUs per node with 16 GB each — can RankED be trained DDP-style on a single node?

Bedrettin-Cetinkaya commented 4 months ago

You can try a smaller input size (160×160 instead of 320×320) or a smaller model (Swin-Small/Tiny instead of Swin-Base), but this leads to a performance drop.
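A rough way to see why a smaller input helps so much: if the fully-vectorized ranking loss materializes terms over pixel pairs, its memory would scale roughly with the square of the pixel count, so halving each spatial dimension cuts that term by about 16×. This is a back-of-envelope sketch under that assumption, scaled from the paper's 45 GB figure at 320×320 — not a measured number, and the real footprint also includes backbone activations that scale more gently.

```python
# Illustrative memory estimate (an assumption, not a measurement):
# assume the dominant term scales with (H*W)^2, anchored to the
# paper's reported 45 GB at 320x320 with Swin-Base.

def pairwise_memory_gb(h, w, ref_gb=45.0, ref_h=320, ref_w=320):
    """Scale the reference memory figure by the squared
    pixel-count ratio. Purely illustrative."""
    ratio = (h * w) / (ref_h * ref_w)
    return ref_gb * ratio ** 2

print(f"320x320: {pairwise_memory_gb(320, 320):.2f} GB")  # 45.00 GB
print(f"160x160: {pairwise_memory_gb(160, 160):.2f} GB")  # 2.81 GB
```

Under this (optimistic) quadratic assumption, 160×160 would bring the pairwise term well under a 16 GB budget, which is consistent with the suggestion above; the actual savings depend on how much of the 45 GB is the backbone rather than the loss.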

meetviolet commented 4 months ago

I see, thx.