SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
MIT License

Details of Training #8

Closed. achen46 closed this issue 2 years ago.

achen46 commented 2 years ago

Hi @alihassanijr, thanks for the great repository. To reproduce your results, how many nodes were used to train these models? I see that config files are provided for each model, but I wonder whether any changes are needed when training on multiple nodes.

alihassanijr commented 2 years ago

Hi and thank you for your interest. We tried both single-node and multi-node settings and did not notice any significant difference between the two. If you want to train on multiple nodes, you'd have to divide the per-GPU batch size so that the global batch size stays the same. All of the models we've released were trained with a global batch size of 1024, which is 128 samples per GPU on a single 8-GPU node, hence the 128 in the config files. If you increase the number of GPUs, you'd have to decrease the per-GPU batch size accordingly (e.g. 2 nodes with 16 GPUs total -> per-GPU batch size 64).
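To make the arithmetic explicit, here is a minimal sketch (not from the repo's training scripts; the function name and arguments are illustrative) of how the per-GPU batch size follows from a fixed global batch size:

```python
# Illustrative sketch: derive the per-GPU batch size from a fixed global batch size.
# Assumes the global batch size divides evenly across all GPUs.

def per_gpu_batch_size(global_batch_size: int, num_nodes: int, gpus_per_node: int) -> int:
    world_size = num_nodes * gpus_per_node
    assert global_batch_size % world_size == 0, "global batch size must divide evenly across GPUs"
    return global_batch_size // world_size

# Released models use a global batch size of 1024.
print(per_gpu_batch_size(1024, num_nodes=1, gpus_per_node=8))  # 128 (single node, as in the configs)
print(per_gpu_batch_size(1024, num_nodes=2, gpus_per_node=8))  # 64  (two nodes, 16 GPUs)
```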

I hope this clarifies things.

achen46 commented 2 years ago

Thanks for the clarification. I will try to reproduce your numbers. Great work!