feat(usability): Refine model inject helper to support huggingface models

Motivation

Improve usability:

Add pack_sample_into_one=True mode for streaming dataloader. (Completed)
- Used to verify long-sequence training
Refine model_inject_helper to support more general modeling files (such as Hugging Face ones) without any code changes. (Completed)
- Makes third-party models easier to adapt and demonstrates InternEvo's usability
- model_inject_helper now supports pipeline parallel mode
Refine and optimize the implementation of MockedDataset. (Completed)
- Added a sanity check that global_bsz and the data match between the saved and loaded data
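
For reviewers unfamiliar with packing, here is a minimal sketch of the idea behind pack_sample_into_one=True. It assumes the mode concatenates consecutive samples from the stream into one fixed-length sequence that is treated as a single attention span. The function name `pack_stream_into_one` and the output keys are hypothetical and are not taken from this PR's code.

```python
from typing import Dict, Iterable, Iterator, List


def pack_stream_into_one(
    samples: Iterable[List[int]], seq_len: int
) -> Iterator[Dict[str, List[int]]]:
    """Concatenate streamed token lists and emit fixed-length sequences.

    Hypothetical illustration: each emitted sequence carries a single
    segment [0, seq_len], i.e. no per-sample boundaries are kept.
    """
    buffer: List[int] = []
    for sample in samples:
        buffer.extend(sample)
        # Emit as many full sequences as the buffer currently holds.
        while len(buffer) >= seq_len:
            chunk, buffer = buffer[:seq_len], buffer[seq_len:]
            yield {"input_ids": chunk, "cu_seqlens": [0, seq_len]}
    # Leftover tokens shorter than seq_len are dropped in this sketch.


packed = list(pack_stream_into_one([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4))
```

Because every emitted sequence is exactly seq_len tokens long, short samples can still exercise long-sequence training paths, which is the verification use case described above.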

Examples:

[x] Huggingface InternLM1-7B
- [x] pure dp mode without code changes
- [x] pack and isp mode with a few lines of code changed (via patch)
[x] Huggingface InternLM2-7B
- [x] pure dp mode without code changes
- [x] pack and isp mode with a few lines of code changed (via patch)
[x] Huggingface Yi-6B
- [x] pure dp mode without code changes
- [x] pack and isp mode with a few lines of code changed (via patch)
[x] Huggingface LLaMA2-7B
- [x] pure dp mode without code changes
- [x] pack and isp mode with a few lines of code changed (via patch)
[x] Huggingface Baichuan2-7B
- [x] pure dp mode without code changes
- [x] pack and isp mode with a few lines of code changed (via patch)
[x] Huggingface Qwen2-7B
- [x] pure dp mode without code changes
- [x] pack and isp mode with a few lines of code changed (via patch)
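
As background for the "without code changes" cases above, one common way to adapt a third-party model is to walk its module tree and swap selected submodules by type. The sketch below illustrates that general technique in plain Python. The classes `Module`, `Linear`, `ParallelLinear` and the function `inject_modules` are hypothetical stand-ins and do not reflect the actual model_inject_helper implementation.

```python
from typing import Callable, Dict, Type


class Module:
    """Tiny stand-in for a framework module with named children."""

    def __init__(self, **children: "Module") -> None:
        self.children: Dict[str, "Module"] = dict(children)


class Linear(Module):
    """Hypothetical layer as written in a third-party modeling file."""


class ParallelLinear(Module):
    """Hypothetical parallel-aware replacement layer."""


def inject_modules(
    root: Module, replacements: Dict[Type[Module], Callable[[Module], Module]]
) -> int:
    """Recursively replace children whose exact type has a registered factory.

    Returns the number of modules replaced.
    """
    replaced = 0
    for name, child in list(root.children.items()):
        factory = replacements.get(type(child))
        if factory is not None:
            # Swap in place; the modeling file itself is never edited.
            root.children[name] = factory(child)
            replaced += 1
        else:
            replaced += inject_modules(child, replacements)
    return replaced


model = Module(attn=Module(q=Linear(), k=Linear()), mlp=Linear())
count = inject_modules(model, {Linear: lambda old: ParallelLinear()})
```

With a real framework the same walk would typically iterate over the model's named children and reassign attributes on the parent, but the details depend on the framework and on the helper in this PR.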

internlm/train/pipeline.py

None

None

Before PR:

[x] Pre-commit or other linting tools are used to fix the potential lint issues.
[x] Bug fixes are fully covered by unit tests; the case that caused the bug should be added to the unit tests.
[x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
[x] The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

[x] If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
[x] CLA has been signed and all committers have signed the CLA in this PR.