facebookresearch / fastMRI

A large-scale dataset of both raw MRI measurements and clinical MRI images.
https://fastmri.org
MIT License

Multi-GPU VarNet training and batch size #55

Closed z-fabian closed 4 years ago

z-fabian commented 4 years ago

My understanding is that the current VarNet training code uses a batch size of 1 per GPU. Therefore, in the multi-GPU training scenario, the effective batch size would be num_gpus * batch_size = num_gpus, since the gradients are averaged between GPUs after the backward pass.

According to the paper (and what I can also see in the code), the learning rate is set to 0.0003, but there is no mention of the (effective) batch size used in the experiments. Since the learning rate typically has to be adjusted to the batch size, it would be good to know what batch size was used, that is, how many GPUs were used. I expect the final SSIM on the validation/test set to change with the number of GPUs if the learning rate is kept at 0.0003.
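As a rough sketch (not part of the repo), this is how the effective batch size comes out under DistributedDataParallel, assuming a per-GPU batch size of 1 as described above:

```python
import torch.distributed as dist

def effective_batch_size(per_gpu_batch_size: int = 1) -> int:
    # With DDP, gradients are averaged across processes, so the effective
    # batch size is the per-GPU batch size times the number of processes (GPUs).
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return per_gpu_batch_size * world_size
```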

mmuckley commented 4 years ago

This is a good point. I believe 32 GPUs were used.

z-fabian commented 4 years ago

Thank you for the info. Have you experimented with how to adjust the learning rate for different batch sizes, or how much impact it has on reconstruction quality? Intuitively, I would use a factor of N lower learning rate for a factor of N fewer GPUs (smaller batch size), but this would require some trial-and-error to figure out. I'm interested because I want to reproduce the results from the paper, but don't have access to 32 GPUs. Thanks again.
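The linear scaling rule proposed here can be written as a one-line helper (a sketch, not an official recommendation; the base values 0.0003 and 32 GPUs come from the discussion above):

```python
def scaled_lr(num_gpus: int, base_lr: float = 0.0003, base_gpus: int = 32) -> float:
    # Linear scaling: a factor of N fewer GPUs -> a factor of N lower learning rate.
    return base_lr * num_gpus / base_gpus

# e.g. scaled_lr(4) == 0.0003 * 4 / 32 == 3.75e-05
```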

mmuckley commented 4 years ago

I don't have much experience myself. I've gotten up to about 0.914 SSIM with two GPUs, the same learning rate, and a smaller model while testing the repository refactor. Your strategy of lowering the learning rate by the GPU factor seems like a reasonable one.

adefazio commented 4 years ago

Keeping the learning rate the same would be my recommendation, but you should consider reducing the number of layers, otherwise it's not going to train in a reasonable amount of time.
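A minimal sketch of a reduced-size model along these lines; the parameter names follow `fastmri.models.VarNet` as I understand it, and both the names and the defaults should be checked against the installed version:

```python
from fastmri.models import VarNet

# Smaller-than-paper configuration for limited compute; adjust to your GPU budget.
model = VarNet(
    num_cascades=8,   # fewer unrolled cascades than the paper's 12
    chans=12,         # fewer U-Net channels per cascade than the default
    pools=4,
    sens_chans=8,
    sens_pools=4,
)
```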

z-fabian commented 4 years ago

Okay, thanks for the input. I am going to experiment with the model size and whether the number of GPUs has an effect on SSIM. I'm closing this issue now.

zhan4817 commented 4 years ago

However, when I tried to train VarNet with batch_size = 2, I got the following error during the validation sanity check:

... 237, in get_low_frequency_lines
    while mask[..., r, :]:
RuntimeError: bool value of Tensor with more than one value is ambiguous

Do you have any idea? Thanks!
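For context, this error appears whenever a tensor with more than one element is evaluated in a boolean context, which is what happens once the mask carries a batch dimension greater than one. A minimal illustration with a hypothetical tensor (not fastMRI code):

```python
import torch

mask_row = torch.tensor([True, False])  # hypothetical batched mask slice with >1 element
if mask_row:  # raises RuntimeError: Boolean value of Tensor with more than one value is ambiguous
    pass
```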

z-fabian commented 4 years ago

VarNet only works with a batch size of 1 per GPU. This is because the slices have varying sizes. If you want to increase the mini-batch size, that is, the number of training examples averaged per gradient update, just increase the number of GPUs used for training.
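A sketch of what that looks like with the PyTorch Lightning trainer the repo builds on; the argument names below follow older Lightning releases and may differ in your version, and the module/datamodule names are placeholders:

```python
import pytorch_lightning as pl

# batch_size stays 1 per GPU; the effective batch size equals the number of GPUs.
trainer = pl.Trainer(
    gpus=4,              # argument name in older Lightning releases
    accelerator="ddp",   # one process per GPU; gradients averaged across them
)
# trainer.fit(model, datamodule=data_module)  # hypothetical module/datamodule names
```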