Closed HarikrishnanBalagopal closed 3 months ago
https://github.com/foundation-model-stack/fms-hf-tuning/blob/34362ae61f0a03b3505f0a357aceae7a92ff5304/tuning/config/configs.py#L83
https://github.com/foundation-model-stack/fms-hf-tuning/blob/34362ae61f0a03b3505f0a357aceae7a92ff5304/tuning/config/configs.py#L56-L62
ValueError: BuilderConfig JsonConfig(name='default', version=0.0.0, data_dir=None, data_files={'train': ['/data/mydataset/train/train.jsonl']}, description=None, features=None, encoding='utf-8', encoding_errors=None, field=None, use_threads=True, block_size=None, chunksize=10485760, newlines_in_values=None) doesn't have a 'columns' key.
https://huggingface.co/docs/datasets/package_reference/loading_methods#datasets.load_dataset
>>> d1=datasets.load_dataset(path=s1, data_files=s, columns=['input'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u/haribala/code/my-conda-envs/foo/lib/python3.11/site-packages/datasets/load.py", line 2594, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/u/haribala/code/my-conda-envs/foo/lib/python3.11/site-packages/datasets/load.py", line 2303, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/u/haribala/code/my-conda-envs/foo/lib/python3.11/site-packages/datasets/builder.py", line 374, in __init__
    self.config, self.config_id = self._create_builder_config(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/haribala/code/my-conda-envs/foo/lib/python3.11/site-packages/datasets/builder.py", line 622, in _create_builder_config
    raise ValueError(f"BuilderConfig {builder_config} doesn't have a '{key}' key.")
ValueError: BuilderConfig JsonConfig(name='default', version=0.0.0, data_dir=None, data_files={'train': ['/data/mydataset/train/train.jsonl']}, description=None, features=None, encoding='utf-8', encoding_errors=None, field=None, use_threads=True, block_size=None, chunksize=10485760, newlines_in_values=None) doesn't have a 'columns' key.
We handle this gracefully: when loading the dataset with config_kwargs fails, we fall back to loading it without config_kwargs. This way we do not need any type-specific handling; we simply fall back whenever this happens.
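The fallback described above could look roughly like the sketch below. This is a minimal illustration, not the project's actual code: the helper name `load_with_fallback` and its `loader` parameter are hypothetical, and catching `ValueError` is an assumption based on the exception type shown in the traceback.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def load_with_fallback(
    loader: Callable[..., Any],
    path: str,
    data_files: Any = None,
    config_kwargs: Optional[dict] = None,
) -> Any:
    """Try loading with config_kwargs; on ValueError, retry without them.

    `loader` is expected to behave like `datasets.load_dataset`
    (hypothetical wrapper, passed in so it can be swapped out in tests).
    """
    config_kwargs = config_kwargs or {}
    try:
        return loader(path=path, data_files=data_files, **config_kwargs)
    except ValueError as err:
        if not config_kwargs:
            # Nothing to drop, so the failure has another cause: re-raise.
            raise
        logger.warning(
            "Loading with config_kwargs %s failed (%s); retrying without them.",
            sorted(config_kwargs),
            err,
        )
        return loader(path=path, data_files=data_files)
```

With the real library this would be called as `load_with_fallback(datasets.load_dataset, s1, data_files=s, config_kwargs={"columns": ["input"]})`, reusing the `s1` and `s` values from the reproduction above. One trade-off of this design is that any `ValueError` raised with config_kwargs present triggers the retry, not only the "doesn't have a 'columns' key" case.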
Testing manually