The triton fast_kernels are broken in transformers v4.46 because of upstream changes and the way fms-acceleration applies its patches, as described here. In addition, gradient_accumulation behavior is degraded in v4.46; the fix has been merged in a transformers PR but has not been released yet.
Upgrading transformers also broke unit tests, which is being discussed and resolved in #383.
We will therefore pin transformers below v4.46 until these fixes land.
Related issue number
How to verify the PR
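One way to confirm the pin took effect is to check the installed transformers version after reinstalling dependencies. The snippet below is a minimal sketch, not part of this PR: the helper names `release_tuple` and `is_below` are hypothetical, and the comparison only looks at the leading numeric release components rather than implementing full PEP 440 ordering.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release components, e.g. '4.45.2' -> (4, 45, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below(version: str, upper_bound: str = "4.46") -> bool:
    """True if `version` sorts strictly below `upper_bound`.

    Simplification: suffixes such as '.dev0' or 'rc1' are ignored, so
    '4.46.0.dev0' is treated as not below '4.46'.
    """
    return release_tuple(version) < release_tuple(upper_bound)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        status = "satisfies" if is_below(installed) else "violates"
        print(f"transformers {installed} {status} the <4.46 pin")
```

Running this in the environment built from this branch should report that the installed version satisfies the pin.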
Was the PR tested
[ ] I have added >=1 unit test(s) for every new method I have added.