Description of changes: This PR relaxes the `torch` and `transformers` version constraints to allow the older versions that were used during original training. This is needed because recent `torch`/`transformers` versions are slower with DDP.
Relevant issues (but the problem may be deeper than these):
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.