Describe the bug
When I configure the datasets for a training task with train-data-path, valid-data-path, and test-data-path, the training run fails with the following error:
File "/home/kas/kas_workspace/dataset/zrh/pai-megatron-patch/Pai-Megatron-Patch/Megatron-LM-240405/megatron/core/datasets/blended_megatron_dataset_config.py", line 72, in __post_init__
assert self.split is None, "split and blend_per_split are incompatible"
AssertionError assert self.split is None, "split and blend_per_split are incompatible": split and blend_per_split are incompatible
To Reproduce
Configure training datasets using train-data-path, valid-data-path, and test-data-path.
Expected behavior
Training should start without errors when the datasets are configured with train-data-path, valid-data-path, and test-data-path.
Stack trace/logs
File "/home/kas/kas_workspace/dataset/zrh/pai-megatron-patch/Pai-Megatron-Patch/Megatron-LM-240405/megatron/core/datasets/blended_megatron_dataset_config.py", line 72, in __post_init__
assert self.split is None, "split and blend_per_split are incompatible"
AssertionError assert self.split is None, "split and blend_per_split are incompatible": split and blend_per_split are incompatible
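To make the failure easier to reason about, here is a minimal sketch of the check that fires in the traceback. It is a simplified stand-in, not the actual Megatron-LM class: `SketchDatasetConfig`, `build_config_naive`, and `build_config_guarded` are hypothetical names, and the idea that a `split` value is still set (for example from a default) when per-split paths are supplied is an assumption that this report does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchDatasetConfig:
    """Hypothetical stand-in for the config class named in the traceback."""

    blend: Optional[List[str]] = None
    blend_per_split: Optional[List[Optional[List[str]]]] = None
    split: Optional[str] = None

    def __post_init__(self) -> None:
        if self.blend_per_split is not None:
            # Same assertion and message as reported in the traceback.
            assert self.split is None, "split and blend_per_split are incompatible"


def build_config_naive(train, valid, test, split):
    # Assumed failure mode: split is forwarded even though
    # per-split paths are in use.
    return SketchDatasetConfig(blend_per_split=[train, valid, test], split=split)


def build_config_guarded(data_path, train, valid, test, split):
    # Possible workaround (an assumption, not the fix from the linked PR):
    # drop split whenever any per-split path is given.
    per_split = [train, valid, test]
    if any(paths is not None for paths in per_split):
        return SketchDatasetConfig(blend_per_split=per_split, split=None)
    return SketchDatasetConfig(blend=data_path, split=split)


if __name__ == "__main__":
    try:
        build_config_naive(["train"], ["valid"], ["test"], split="969,30,1")
    except AssertionError as err:
        print("naive:", err)

    cfg = build_config_guarded(None, ["train"], ["valid"], ["test"], split="969,30,1")
    print("guarded split:", cfg.split)
```

If this assumption holds, clearing `split` before the config object is constructed would avoid the assertion. Whether that matches the change in the proposed fix is not something this sketch can confirm.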
Environment (please complete the following information):
Proposed fix
https://github.com/NVIDIA/Megatron-LM/pull/840