InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
feat(dataloader): Implement megatron dataloader and mocked dataloader #323
Closed
zigzagcai closed 2 weeks ago
The main functionality of this PR works and is runnable, but it still needs some refinement.
Motivation
Support the megatron dataloader type: for users who want to train with the InternEvo framework on Megatron-tokenized datasets.
Support the mocked dataloader type: for users who want to run precision-alignment experiments and need to ensure that the loaded data is completely consistent.
Modification
internlm/data/megatron/*
internlm/data/mocked/*
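For context, datasets tokenized with Megatron-LM's preprocessing tools are usually stored as a pair of files: a .bin file holding the raw token ids and a .idx index file recording where each sequence starts. The sketch below round-trips such a pair in memory using only the standard library. The header layout (magic b"MMIDIDX\x00\x00", version, dtype code, sequence and document counts, then int32 sizes, int64 byte pointers and int64 document indices) and the dtype codes are assumptions based on Megatron-LM's legacy mmap indexed dataset, so verify them against the Megatron-LM version that produced your data. The helpers write_megatron_pair and read_megatron_pair are hypothetical and are not the code in internlm/data/megatron/*.

```python
import struct

# Assumed Megatron-LM ".idx" layout (legacy mmap indexed dataset):
#   9-byte magic | <Q version | <B dtype code | <Q sequence count | <Q document count
#   | int32 sizes[seq] | int64 byte pointers[seq] | int64 doc_idx[doc]
MAGIC = b"MMIDIDX\x00\x00"
HEADER_SIZE = 9 + 8 + 1 + 8 + 8  # 34 bytes before the sizes array
# Assumed dtype codes -> (struct format char, item size in bytes).
DTYPES = {4: ("i", 4), 8: ("H", 2)}  # 4 = int32, 8 = uint16


def write_megatron_pair(sequences, code=8):
    """Hypothetical writer: serialize token sequences into (idx_bytes, bin_bytes)."""
    fmt, itemsize = DTYPES[code]
    chunks, sizes, pointers, offset = [], [], [], 0
    for seq in sequences:
        chunks.append(struct.pack(f"<{len(seq)}{fmt}", *seq))
        sizes.append(len(seq))
        pointers.append(offset)  # byte offset of this sequence in the .bin data
        offset += len(seq) * itemsize
    doc_idx = list(range(len(sequences) + 1))  # one document per sequence
    idx = b"".join([
        MAGIC,
        struct.pack("<QBQQ", 1, code, len(sizes), len(doc_idx)),
        struct.pack(f"<{len(sizes)}i", *sizes),
        struct.pack(f"<{len(pointers)}q", *pointers),
        struct.pack(f"<{len(doc_idx)}q", *doc_idx),
    ])
    return idx, b"".join(chunks)


def read_megatron_pair(idx_bytes, bin_bytes):
    """Hypothetical reader: return every token sequence stored in the .bin data."""
    if idx_bytes[:9] != MAGIC:
        raise ValueError("not a Megatron MMIDIDX index")
    version, code, seq_count, _doc_count = struct.unpack_from("<QBQQ", idx_bytes, 9)
    if version != 1:
        raise ValueError(f"unsupported index version {version}")
    sizes = struct.unpack_from(f"<{seq_count}i", idx_bytes, HEADER_SIZE)
    pointers = struct.unpack_from(f"<{seq_count}q", idx_bytes, HEADER_SIZE + 4 * seq_count)
    fmt, _ = DTYPES[code]
    # Each sequence is `n` tokens starting at byte offset `p` in the .bin data.
    return [list(struct.unpack_from(f"<{n}{fmt}", bin_bytes, p))
            for n, p in zip(sizes, pointers)]


idx, data = write_megatron_pair([[1, 2, 3], [40000, 5]])
print(read_megatron_pair(idx, data))  # [[1, 2, 3], [40000, 5]]
```

A real loader would more likely memory-map the two files (for example with numpy.memmap) than read them into memory, since pre-training corpora are often much larger than RAM.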
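A mocked dataloader for precision alignment can be as simple as a generator driven by a private, seeded RNG, so that every run sees the same token ids, plus a fingerprint that confirms two runs really consumed identical batches. This is a minimal sketch under those assumptions; mocked_batches, fingerprint and the batch keys input_ids and labels are hypothetical names, not the API in internlm/data/mocked/*.

```python
import hashlib
import json
import random


def mocked_batches(num_batches, micro_bsz, seq_len, vocab_size, seed=1024):
    """Hypothetical mocked loader: the same arguments always yield the same batches."""
    rng = random.Random(seed)  # private RNG, unaffected by the global random state
    for _ in range(num_batches):
        rows = [[rng.randrange(vocab_size) for _ in range(seq_len + 1)]
                for _ in range(micro_bsz)]
        yield {
            "input_ids": [row[:-1] for row in rows],  # first seq_len tokens
            "labels": [row[1:] for row in rows],      # next-token targets
        }


def fingerprint(batches):
    """SHA-256 over all batches, for checking that two runs saw identical data."""
    digest = hashlib.sha256()
    for batch in batches:
        digest.update(json.dumps(batch, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


run_a = fingerprint(mocked_batches(4, 2, 16, 32000))
run_b = fingerprint(mocked_batches(4, 2, 16, 32000))
print(run_a == run_b)  # True
```

Using random.Random(seed) rather than the module-level functions keeps the data stream independent of any other code that touches the global RNG. When comparing against a different framework, dumping the batches to disk once and replaying them is a safer way to guarantee identical inputs.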
BC-breaking (Optional)
None
Use cases (Optional)
None
Checklist
Before PR:
After PR: