Open Liangyz2019 opened 2 months ago
Suppose I have three datasets, each converted to binary files: train1.bin, train1.idx, train2.bin, train2.idx, train3.bin, and train3.idx.
During training, I want to merge these three datasets into one and train on them together. How should I do this?
A clear example and some guidance would be appreciated. Thank you very much.
You can use blend_per_split in the BlendedMegatronDatasetConfig.
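To make that a bit more concrete, here is a minimal sketch of how the value for blend_per_split might be assembled for the three datasets in the question. It assumes (please check against the Megatron-LM version you have installed) that blend_per_split is a list with one entry per split (train, valid, test), where each entry is either None or a pair of (dataset path prefixes, optional weights), and that each dataset is referred to by its prefix without the .bin/.idx extension. The helper make_blend and the weights 1:1:2 are hypothetical and only for illustration; they are not part of Megatron-LM.

```python
from typing import List, Optional, Tuple

# Assumed shape of one blend entry: (path prefixes, optional weights).
Blend = Tuple[List[str], Optional[List[float]]]


def make_blend(prefixes: List[str], weights: Optional[List[float]] = None) -> Blend:
    """Hypothetical helper (not a Megatron-LM function).

    Pairs dataset prefixes with weights normalized to sum to 1.
    If no weights are given, the weights slot is left as None.
    """
    if not prefixes:
        raise ValueError("at least one dataset prefix is required")
    if weights is None:
        return (list(prefixes), None)
    if len(weights) != len(prefixes):
        raise ValueError("need exactly one weight per dataset prefix")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    return (list(prefixes), [w / total for w in weights])


# Prefixes for train1.bin/.idx, train2.bin/.idx, train3.bin/.idx.
# The 1:1:2 ratio is an arbitrary example, not a recommendation.
train_blend = make_blend(["train1", "train2", "train3"], [1.0, 1.0, 2.0])

# One entry per split in the assumed order (train, valid, test).
# Only the training split is blended in this sketch.
blend_per_split = [train_blend, None, None]
```

The resulting list would then be passed as the blend_per_split field when constructing BlendedMegatronDatasetConfig, along with the other fields that config requires (sequence length, tokenizer, and so on), which are omitted here.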
Marking as stale. No activity in 60 days.