Closed: ngobibibnbe closed this issue 3 years ago
I have seen that your training implementation of FinBERT supports distributed training, but how can we launch it on multiple servers?

Hi, we haven't tried training on multiple servers. 🤗 Transformers has resources about this; I'd suggest checking them.

In the end, I trained it on multiple machines in a Hadoop cluster, using Horovod to distribute the training process with the allreduce strategy and storing the dataset on HDFS. Thank you for your answer.
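For anyone landing here later, a minimal sketch of the Horovod allreduce setup described above, assuming a standard PyTorch training loop. The tiny linear model, the synthetic tensors, and the `train_hvd.py` filename are placeholders for illustration only, not part of FinBERT's actual code; only the Horovod wiring is the point.

```python
# Minimal Horovod allreduce training sketch in PyTorch.
# The linear model and random tensors stand in for FinBERT and its dataset.
import torch
import horovod.torch as hvd

hvd.init()  # one process per GPU, launched on every server
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(128, 2).to(device)  # stand-in for the real model

# Synthetic data standing in for a dataset read from HDFS.
features = torch.randn(1024, 128)
labels = torch.randint(0, 2, (1024,))
dataset = torch.utils.data.TensorDataset(features, labels)

# Shard the data so each worker trains on a distinct slice.
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Wrap the optimizer so gradients are averaged via allreduce at each step.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Ensure every worker starts from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for epoch in range(3):
    sampler.set_epoch(epoch)  # vary the shard shuffling per epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(
            model(x.to(device)), y.to(device))
        loss.backward()
        optimizer.step()
    if hvd.rank() == 0:
        print(f"epoch {epoch} done, last loss {loss.item():.4f}")
```

Such a script would then be launched across servers with Horovod's launcher, e.g. `horovodrun -np 8 -H server1:4,server2:4 python train_hvd.py` for two machines with four GPUs each (hostnames here are placeholders).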