InternLM / InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
https://internevo.readthedocs.io/zh-cn/latest/?badge=latest
Apache License 2.0

fix(910B): fix bugs in 910B for varlen and fixlen FA #309

Closed li126com closed 2 months ago

li126com commented 2 months ago


Motivation

Issue #307

Modification

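  1. To ensure correctness, rotary_embedding now uses the GPU torch version.
  2. Changed the seq_len calculation logic for the unpacked-data case.
  3. Identified and corrected the all_reduce op on 910B, switching it from AVG to SUM.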
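
As an illustration of item 3, the sketch below shows one common way to obtain averaging semantics when only a SUM reduction is used: sum across ranks, then divide by the world size. It is a pure-Python simulation, not the code from this PR. The helper names `simulated_all_reduce_sum` and `all_reduce_avg_via_sum` are hypothetical, and whether the PR divides by the world size after the SUM is an assumption. In `torch.distributed`, the equivalent would presumably be an `all_reduce` call with `ReduceOp.SUM` followed by a division.

```python
from typing import List, Sequence


def simulated_all_reduce_sum(per_rank: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulate a SUM all-reduce (hypothetical helper, no real communication).

    `per_rank[i]` holds the values owned by rank i. After the reduction,
    every rank holds the same element-wise sum.
    """
    summed = [sum(column) for column in zip(*per_rank)]
    return [list(summed) for _ in per_rank]


def all_reduce_avg_via_sum(per_rank: Sequence[Sequence[float]]) -> List[List[float]]:
    """Emulate an AVG reduction with SUM followed by division by world size.

    Assumption: the backend cannot be relied on for AVG, so the average is
    computed explicitly from the summed result.
    """
    world_size = len(per_rank)
    reduced = simulated_all_reduce_sum(per_rank)
    return [[value / world_size for value in rank_values] for rank_values in reduced]


if __name__ == "__main__":
    # Two simulated ranks, each holding a two-element "gradient".
    grads = [[1.0, 2.0], [3.0, 6.0]]
    print(all_reduce_avg_via_sum(grads))
```

Dividing after the SUM keeps the result independent of whether the backend supports an AVG op, at the cost of one extra element-wise operation.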


sunpengsdu commented 2 months ago

Please add a description to the PR.

li126com commented 2 months ago

add description

done