Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0

Batch size vs learning rate #58

Closed d33dler closed 10 months ago

d33dler commented 10 months ago

Greetings! Based on your experience, would setting the batch size to 5 and the learning rate to 0.005 be appropriate? I use 4 GPUs. Thank you!

Haiyang-W commented 10 months ago

Is that batch_size = 5 per GPU card? I recommend setting the lr to 0.003, because the lr in the config is the global learning rate, tuned for our 8-GPU setup. We use total_batch_size = 24 with lr = 0.003. Notably, please turn on sync_bn.
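The advice above follows the common linear scaling rule: keep the learning rate proportional to the total (global) batch size relative to the reference configuration. A minimal sketch, assuming the reference values from this thread (lr = 0.003 at total batch 24); the helper name `scale_lr` is hypothetical, not part of the DSVT codebase:

```python
def scale_lr(base_lr: float, base_total_batch: int, new_total_batch: int) -> float:
    """Linear scaling rule: scale lr proportionally with total batch size.

    base_lr / base_total_batch are the reference values (0.003 at a
    total batch of 24 in this thread); new_total_batch is your total
    batch across all GPUs.
    """
    return base_lr * new_total_batch / base_total_batch

# 4 GPUs x batch_size 5 per GPU = total batch 20
lr = scale_lr(0.003, 24, 4 * 5)
print(lr)  # 0.0025
```

With a smaller per-GPU batch, statistics in BatchNorm layers become noisier, which is why syncing them across GPUs helps; in PyTorch this is typically done with `torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)` under `DistributedDataParallel`.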