InternLM / InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
https://internevo.readthedocs.io/zh-cn/latest/?badge=latest
Apache License 2.0
311 stars, 52 forks

Add z loss to PipelineSchedule #365

Closed zhhsplendid closed 3 weeks ago

zhhsplendid commented 4 weeks ago

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

Add z_loss to PipelineScheduler

Modification

Files under internlm: add z_loss to PipelineScheduler.

Other files: config and MPI scripts that conflict with my local branch.
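For readers unfamiliar with the term, here is a minimal, framework-free sketch of what a z_loss usually is. It assumes the common definition (a small penalty on the squared log-partition function of the logits, averaged over tokens) and a simple mean over micro-batches. The function names, the default coefficient, and the accumulation scheme are illustrative assumptions and are not taken from this PR's diff.

```python
import math
from typing import Sequence


def log_sum_exp(row: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) for one token's logits."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def z_loss(logits: Sequence[Sequence[float]], coeff: float = 1e-4) -> float:
    """Assumed definition: coeff * mean over tokens of logsumexp(logits)^2."""
    squared = [log_sum_exp(row) ** 2 for row in logits]
    return coeff * sum(squared) / len(squared)


def accumulate_z_loss(micro_batches, coeff: float = 1e-4) -> float:
    """Hypothetical helper: average the z_loss over a step's micro-batches,
    the way a pipeline scheduler might report one value per step."""
    return sum(z_loss(mb, coeff) for mb in micro_batches) / len(micro_batches)


if __name__ == "__main__":
    # Two micro-batches, each with two tokens over a three-word vocabulary.
    step = [
        [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]],
        [[1.0, 1.0, 1.0], [3.0, -2.0, 0.0]],
    ]
    print(f"z_loss for this step: {accumulate_z_loss(step):.6e}")
```

In a real training loop the same quantity would be computed on tensors and added to the language-modeling loss before the backward pass; the sketch only illustrates the arithmetic.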

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

Log that has z_loss: img_v3_02g5_adc32da9-0088-414d-b828-023c5cde7e4g

Log that doesn't have z_loss: img_v3_02g5_f0027a7a-4700-45dd-9ed3-bfec8611013g

Checklist

Before PR:

After PR: