ThilinaRajapakse / pytorch-transformers-classification

Based on the PyTorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0

Model performance degrades when moved to Multi-GPU #29

Open ereday opened 5 years ago

ereday commented 5 years ago

Hi,

When I run your code on multiple GPUs, performance degrades severely compared to the single-GPU version. To make the code multi-GPU compatible, I've only added 2 lines of code:
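The usual two-line change for this codebase is wrapping the model in `torch.nn.DataParallel` and averaging the per-GPU losses. A minimal sketch, assuming a toy model that returns its loss from `forward()` the way the pytorch-transformers models do:

```python
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    """Hypothetical stand-in for a model that, like the pytorch-transformers
    models, computes and returns its own loss in forward()."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(768, 2)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x, labels):
        return self.loss_fn(self.linear(x), labels)

model = ToyClassifier()

# Line 1: replicate the model across all visible GPUs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

# Line 2 (inside the training loop): DataParallel gathers one loss per GPU,
# so reduce to a scalar before calling backward().
loss = model(torch.randn(8, 768), torch.randint(0, 2, (8,)))
if torch.cuda.device_count() > 1:
    loss = loss.mean()
loss.backward()
```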

These are the results I got with the two settings (single-GPU vs. multi-GPU):

Although the average loss values are similar, there are big differences in the other metrics.

ThilinaRajapakse commented 5 years ago

Those changes should be sufficient to enable multi-GPU training, in my experience. Is there any other difference (e.g., batch size) between the two runs?
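One subtle thing to check: `torch.nn.DataParallel` splits each incoming batch across the GPUs, so an unchanged `train_batch_size` means each replica sees a smaller batch than in the single-GPU run. A sketch with hypothetical numbers:

```python
# Hypothetical values to illustrate the effective-batch-size pitfall.
train_batch_size = 8                           # value used in the single-GPU run
n_gpu = 4                                      # GPUs in the multi-GPU run
per_replica_batch = train_batch_size // n_gpu  # each replica now sees only 2

# To keep per-GPU dynamics comparable, scale the global batch instead:
scaled_batch_size = train_batch_size * n_gpu   # 32 in total, 8 per replica
```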

ereday commented 5 years ago

Nope, I did not change any of the variables in the `args` dictionary.

ThilinaRajapakse commented 5 years ago

This is probably a silly question, but did you try this multiple times and receive the same results?

ereday commented 5 years ago

Yes, I ran the code with the same configuration multiple times. There is no difference across runs.

ThilinaRajapakse commented 5 years ago

Sorry, I am not sure why this is happening. I recommend trying the Simple Transformers library instead: it supports multi-GPU training by default, and I have used multi-GPU training with that library without any performance degradation.
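A minimal sketch of that route, assuming the Simple Transformers `ClassificationModel` API and a DataFrame with the library's expected `text` and `labels` columns (the toy data and `n_gpu` value here are just examples):

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Toy data; in practice pass your real DataFrame with "text" and "labels" columns.
train_df = pd.DataFrame(
    [["example sentence one", 1], ["example sentence two", 0]],
    columns=["text", "labels"],
)

model = ClassificationModel(
    "bert",
    "bert-base-uncased",
    args={"n_gpu": 2},  # multi-GPU training is handled internally by the library
)
model.train_model(train_df)
```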