Open AntreasAntoniou opened 3 years ago
Replace nn.DataParallel with nn.parallel.DistributedDataParallel: it offers better performance in single-machine settings and extends easily to multi-machine settings. Training the transformers might require that.
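A rough sketch of what the swap could look like, not the repo's actual training code. The helper names (`setup_distributed`, `wrap_model`, `train_step_demo`) and the `nn.Linear` placeholder model are made up for illustration. It assumes one process per GPU launched with `torchrun`, which sets `RANK`, `WORLD_SIZE` and `LOCAL_RANK`, and it falls back to a single CPU process with the `gloo` backend when those are missing.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_distributed() -> int:
    """Join the process group and return this process's local rank."""
    # Defaults (assumed values) allow a plain single-process run for debugging.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
    return int(os.environ.get("LOCAL_RANK", "0"))


def wrap_model(model: nn.Module, local_rank: int) -> DDP:
    """Wrap a model in DDP where nn.DataParallel(model) was used before."""
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
        return DDP(model, device_ids=[local_rank])
    return DDP(model)  # CPU fallback, used with the gloo backend


def train_step_demo() -> float:
    """Run one optimisation step on random data and return the loss."""
    local_rank = setup_distributed()
    model = wrap_model(nn.Linear(8, 2), local_rank)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    device = next(model.parameters()).device
    x = torch.randn(4, 8, device=device)
    y = torch.randn(4, 2, device=device)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()  # DDP all-reduces gradients across processes here
    optimizer.step()
    dist.destroy_process_group()
    return loss.item()
```

To try it on several GPUs, call `train_step_demo()` at the bottom of a script and launch it with something like `torchrun --nproc_per_node=4 train.py` (the script name is a placeholder). A real training loop would also need a `torch.utils.data.distributed.DistributedSampler` so that each process sees a different shard of the data.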
57