Closed: thomasw21 closed this 2 years ago
Thank you for detecting that we missed that upstream fix, @thomasw21.
Since I originally applied the first batch of these fixes, I went looking for what else was missing and discovered a lot more issues of this kind, so I'm trying a different approach here: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/260
Thanks @stas00, I think we can merge these fixes into your branch if you prefer, or into master. (I think they are relevant and not covered by your PR.)
Yes, and there are more fixes in this file besides what you added - I'm checking what else might need to go in there and updating my PR.
I'm going backwards: taking the Meg-LM version as master, checking whether we made any changes to it, and syncing those changes if need be.
Closing in favor of https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/260
@hyunwoongko has provided a fix for gpt-neox: https://github.com/EleutherAI/gpt-neox/pull/572. I think the same fix applies to us, and it might affect throughput in some ways.
@hyunwoongko, feel free to correct me if I'm wrong.