Closed: wangwang110 closed this issue 4 years ago
Hi, the best model was trained on a single 32GB GPU. I tried developing multi-GPU training, but PyTorch is not well geared towards it for this model.
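For anyone who wants to estimate the memory footprint or experiment with multi-GPU training on their own, here is a minimal sketch. It is not this repo's actual training code: the model is a hypothetical stand-in, and the batch size and dimensions are made up. It only shows how one could measure peak CUDA memory with `torch.cuda.max_memory_allocated` and try the naive `nn.DataParallel` wrapper:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the parser; the real model class in this repo differs.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).cuda()

# Naive multi-GPU attempt: nn.DataParallel replicates the model on every visible
# GPU and splits each batch along dim 0. This only works cleanly when forward()
# takes plain tensors; models whose forward() also takes Python objects such as
# sentences or trees do not split automatically, which is one reason single-GPU
# training can be the practical choice.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

# Measure peak CUDA memory for one forward/backward pass to estimate how much
# memory a given batch size needs.
torch.cuda.reset_peak_memory_stats()
batch = torch.randn(256, 1024, device="cuda")  # made-up batch shape
loss = model(batch).pow(2).mean()
loss.backward()
print(f"peak CUDA memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```

Running a snippet like this with your real model and batch size should tell you whether it fits on your GPU before you launch a full training run.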
The paper "Rethinking Self-Attention: Towards Interpretability in Neural Parsing" (Mrini et al., 2020) mentions:
Each English experiment is performed on a single 32GB GPU, while each Chinese experiment is performed on a single 12GB GPU.
Why does the GPU memory requirement differ between the Chinese and English experiments?
How much CUDA memory does the model need during training? And does it support multi-GPU training?