Hi, I would like to train GENRE from BART on page-level document retrieval using only DPR data (the "GENRE only DPR data" setting). I noticed that in the appendix, training was done on 128 GPUs. However, when I train on eight 32 GB V100 GPUs with `scripts_genre/train.sh`, each GPU only uses about 10.3 GB of its 32 GB of memory. I wonder if I missed something. Also, will the effective batch size change if I reduce the number of GPUs? Thanks :)
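For context on the batch-size question: in fairseq-style data-parallel training (which GENRE's `train.sh` builds on), the tokens contributing to one optimizer step scale with the per-GPU token budget, the number of GPUs, and the gradient-accumulation factor. A minimal sketch of that relationship, with illustrative numbers (not GENRE's actual hyperparameters):

```python
def effective_batch_tokens(max_tokens_per_gpu: int, num_gpus: int, update_freq: int) -> int:
    """Approximate tokens contributing to a single optimizer step
    in data-parallel training with gradient accumulation
    (fairseq's --max-tokens and --update-freq style)."""
    return max_tokens_per_gpu * num_gpus * update_freq

# Hypothetical numbers: a 128-GPU run vs. an 8-GPU run.
# Raising update_freq by 16x on 8 GPUs matches the 128-GPU effective batch.
full_run = effective_batch_tokens(1024, 128, 1)   # 131072 tokens per update
small_run = effective_batch_tokens(1024, 8, 16)   # 131072 tokens per update
assert full_run == small_run
```

So with fewer GPUs, the effective batch shrinks proportionally unless `--update-freq` (gradient accumulation) is increased to compensate; the per-GPU memory footprint stays governed by the per-GPU token budget either way.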