bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fix softmax #259

Closed: thomasw21 closed this 2 years ago

thomasw21 commented 2 years ago

@hyunwoongko has provided a fix for gpt-neox: https://github.com/EleutherAI/gpt-neox/pull/572. I think the same fix applies to us, and it might affect throughput in some ways.

@hyunwoongko, feel free to correct me if I'm wrong.
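
For context, the fix concerns the dispatch between the fused CUDA softmax kernel and the plain PyTorch fallback. Below is a minimal sketch of that dispatch pattern, not the actual patch from either PR: the class name, method names, and threshold values are illustrative placeholders, loosely modeled on what lives in `megatron/model/fused_softmax.py`.

```python
import torch


class SoftmaxDispatchSketch(torch.nn.Module):
    """Illustrative stand-in for the fused scale-mask-softmax wrapper.

    The guard values below are placeholders, not the exact conditions
    changed by the referenced PRs.
    """

    def __init__(self, fusion_enabled: bool, input_in_float16: bool):
        super().__init__()
        self.fusion_enabled = fusion_enabled
        self.input_in_float16 = input_in_float16

    def is_kernel_available(self, batch: int, heads: int, sq: int, sk: int) -> bool:
        # The fused CUDA kernel supports only a restricted set of shapes;
        # anything outside it must take the plain PyTorch fallback.
        attn_batches = batch * heads
        return (
            self.fusion_enabled
            and self.input_in_float16      # kernel handles fp16/bf16 inputs only
            and 16 < sk <= 2048            # key-sequence-length bound (placeholder)
            and sq % 4 == 0                # query-length alignment (placeholder)
            and attn_batches % 4 == 0      # batch*heads alignment (placeholder)
        )

    def forward(self, scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        batch, heads, sq, sk = scores.size()
        if self.is_kernel_available(batch, heads, sq, sk):
            # The real module would call the compiled CUDA extension here;
            # this sketch just reuses the fallback so it stays runnable.
            return self._torch_softmax(scores, mask)
        return self._torch_softmax(scores, mask)

    @staticmethod
    def _torch_softmax(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Plain PyTorch path: apply the attention mask, then softmax over keys.
        if mask is not None:
            scores = scores.masked_fill(mask, torch.finfo(scores.dtype).min)
        return torch.nn.functional.softmax(scores, dim=-1)
```

The reason a guard like this matters for throughput is that a wrong or overly strict condition silently routes eligible shapes to the slower PyTorch fallback, while an overly loose one hands unsupported shapes to the fused kernel.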

stas00 commented 2 years ago

Thank you for detecting that we missed that upstream fix, @thomasw21.

Since I originally applied the first batch of these fixes, I checked what else was missing, found quite a few more, and am trying a different approach here: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/260

thomasw21 commented 2 years ago

Thanks @stas00. I think we can merge these fixes into your branch if you prefer, or into master. (I think they are relevant and not covered by your PR.)

stas00 commented 2 years ago

Yes, and there are more fixes in this file besides what you added - I'm checking what else might need to go in there and updating my PR.

I'm going backwards: taking the Megatron-LM version as the reference, checking whether we made any changes to it, and syncing those changes if need be.

thomasw21 commented 2 years ago

Closing in favor of https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/260