InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

What should I do when `xtuner check-custom-dataset /home/internlm2.py` fails? #469

Open SaltedFishBot opened 5 months ago

SaltedFishBot commented 5 months ago

`Generating train split: 0 examples [00:00, ? examples/s]Failed to read file '/home/ssc/other/train_data3.json' with error <class 'pyarrow.lib.ArrowInvalid'>: Could not convert '有效进行现金管理可以反映公司治理具备风险管理的能力,同时保证公司运营的流动性。另外,投资安全性高、流动性好、满足保本要求的投资产品,目的是对资本进行利润最大化,同时保证资产的保值和增值。' with type str: tried to convert to double
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/datasets/builder.py", line 1973, in _prepare_split_single
    for _, table in generator:
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 164, in _generate_tables
    raise ValueError(f"Not able to read records in the JSON file at {file}.") from None
ValueError: Not able to read records in the JSON file at /home/ssc/other/train_data3.json.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 307, in <module>
    main()
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 303, in main
    runner.train()
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1728, in train
    self._train_loop = self.build_train_loop(
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1520, in build_train_loop
    loop = LOOPS.build(
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/engine/runner/loops.py", line 32, in __init__
    dataloader = runner.build_dataloader(
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1370, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/dataset/huggingface.py", line 225, in process_hf_dataset
    return process(*args, **kwargs)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/dataset/huggingface.py", line 167, in process
    dataset = build_origin_dataset(dataset, split)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/dataset/huggingface.py", line 30, in build_origin_dataset
    dataset = BUILDER.build(dataset)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/datasets/load.py", line 2582, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/datasets/builder.py", line 1100, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/datasets/builder.py", line 1860, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/home/xxx/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/datasets/builder.py", line 2016, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset`

What is causing this?

HIT-cwh commented 5 months ago

Hi, could you share your config file and a sample of your data format?

2001926342 commented 1 month ago

`datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset` I'm hitting the same problem. How did you solve it?
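
Going by the first line of the log in the original report (`Could not convert '...' with type str: tried to convert to double`), one plausible cause is that some field in `train_data3.json` holds a number in some records and a string in others, so the column type inferred for it does not fit every record. That is only a guess until a config and data sample are posted. The sketch below is one way to look for such fields before running training. The helper names `find_mixed_type_fields` and `load_records` and the sample records are hypothetical and are not part of xtuner or `datasets`; the sketch uses only the Python standard library.

```python
import json
from collections import defaultdict


def find_mixed_type_fields(records):
    """Return {field_path: sorted type names} for fields whose type varies.

    Nested dicts are walked with dotted paths and list items share a "[]" suffix.
    None values are ignored, since a missing value is not a type conflict.
    Note: int vs. float is also reported, which may be harmless in practice.
    """
    seen = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for child in value:
                walk(child, f"{path}[]")
        elif value is not None:
            seen[path].add(type(value).__name__)

    for record in records:
        walk(record, "")
    return {path: sorted(types) for path, types in seen.items() if len(types) > 1}


def load_records(json_path):
    """Load a file holding either one JSON array or one JSON object per line."""
    with open(json_path, encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical records for illustration, not taken from the reporter's file.
sample = [
    {"conversation": [{"input": "q1", "output": 1.5}]},
    {"conversation": [{"input": "q2", "output": "some text"}]},
]
print(find_mixed_type_fields(sample))
```

To check a real file, something like `find_mixed_type_fields(load_records("/home/ssc/other/train_data3.json"))` should list any field paths with more than one value type. If a field shows up there, converting every value of that field to a string before training is one possible fix.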