Open seanliu96 opened 2 weeks ago
Describe the bug
An exception is raised when trying to use model.data.train_ds.file_names to provide multiple data files in SFT and DPO.
Steps/Code to reproduce bug
When model.data.train_ds.file_names is set to multiple training files instead of file_path, an exception is raised because https://github.com/NVIDIA/NeMo-Aligner/blob/main/nemo_aligner/data/nlp/builders.py#L267 only considers file_path and assumes it is not None.
Expected behavior
build_sft_dataset and similar functions should check whether cfg.file_names is specified and, if so, build the dataset from those files instead of assuming file_path is set.
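A minimal sketch of the kind of check being requested. This is not the NeMo-Aligner code: `resolve_data_files` is a hypothetical helper, the config is modelled as a plain mapping rather than the real config object, and raising an error when both keys are set is an assumption about the desired behavior.

```python
from typing import Any, List, Mapping


def resolve_data_files(cfg: Mapping[str, Any]) -> List[str]:
    """Hypothetical helper: return the data files named by a dataset config.

    Looks at `file_names` first and falls back to `file_path`, so neither
    key is assumed to be non-None.
    """
    file_names = cfg.get("file_names")
    file_path = cfg.get("file_path")

    # Assumption: setting both keys is ambiguous, so reject it explicitly.
    if file_names is not None and file_path is not None:
        raise ValueError("Specify either file_names or file_path, not both.")

    if file_names is not None:
        # Accept a single string as well as a list of strings.
        if isinstance(file_names, str):
            return [file_names]
        return [str(name) for name in file_names]

    if file_path is not None:
        return [str(file_path)]

    raise ValueError("Neither file_names nor file_path is set.")


if __name__ == "__main__":
    print(resolve_data_files({"file_names": ["a.jsonl", "b.jsonl"]}))
    print(resolve_data_files({"file_path": "train.jsonl"}))
```

A builder could call such a helper once and then construct one dataset per returned file, which keeps the single-file and multi-file cases on the same code path.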
Sorry for the wrong label. It should be a feature.