Open SicariusSicariiStuff opened 3 months ago
I am also facing the same issue with the following trace:
Traceback (most recent call last):
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/rishabh/axolotl/src/axolotl/cli/preprocess.py", line 103, in <module>
    fire.Fire(do_cli)
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/rishabh/axolotl/src/axolotl/cli/preprocess.py", line 74, in do_cli
    load_rl_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
  File "/mnt/rishabh/axolotl/src/axolotl/cli/__init__.py", line 445, in load_rl_datasets
    train_dataset, eval_dataset = load_prepare_dpo_datasets(cfg)
  File "/mnt/rishabh/axolotl/src/axolotl/utils/data/rl.py", line 131, in load_prepare_dpo_datasets
    train_dataset = load_split(cfg.datasets, cfg)
  File "/mnt/rishabh/axolotl/src/axolotl/utils/data/rl.py", line 110, in load_split
    split_datasets[i] = map_dataset(
  File "/mnt/rishabh/axolotl/src/axolotl/utils/data/rl.py", line 67, in map_dataset
    data_set = data_set.map(
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3156, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3517, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/mnt/rishabh/anaconda3/envs/axolotl1/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/mnt/rishabh/axolotl/src/axolotl/prompt_strategies/orpo/chat_template.py", line 272, in transform_fn
    [msg.model_dump() for msg in dataset_parser.get_prompt(sample).messages],
  File "/mnt/rishabh/axolotl/src/axolotl/prompt_strategies/orpo/chat_template.py", line 111, in get_prompt
    content=prompt["chosen"][i * 2 + 1]["content"],
Any update?
Hi @SicariusSicariiStuff, thank you for following up! Could you share the dataset you are using so we can test it more thoroughly?
Expected Behavior
A local dataset should load the same way as a dataset loaded from the HF Hub.
Current behaviour
FileNotFoundError: Couldn't find a dataset script at
Steps to reproduce
If you have:
Replace it with the same file stored locally (Parquet or JSON, it doesn't matter) and you'll get:
FileNotFoundError: Couldn't find a dataset script at
Config yaml
Possible solution
Use the same processing logic for local files as for a dataset loaded from the Hub.
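As a rough illustration of that idea, here is a minimal sketch of loading a local file through the same `datasets.load_dataset` entry point by naming the file-format builder explicitly and passing the file via `data_files`. The helper names `local_load_kwargs` and `load_local` are hypothetical and not part of axolotl, and the extension-to-builder mapping and the `train` split are assumptions; this is not necessarily how axolotl resolves local paths.

```python
from pathlib import Path

# Builder names that `datasets.load_dataset` accepts for plain data files.
# The mapping itself is an assumption made for this sketch.
_BUILDERS = {
    ".json": "json",
    ".jsonl": "json",
    ".parquet": "parquet",
    ".csv": "csv",
}


def local_load_kwargs(path: str) -> dict:
    """Build keyword arguments for `datasets.load_dataset` for a local file.

    Hypothetical helper: picks the builder from the file extension instead of
    passing the file path as the dataset name.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in _BUILDERS:
        raise ValueError(f"Unsupported local dataset file type: {suffix!r}")
    return {"path": _BUILDERS[suffix], "data_files": path, "split": "train"}


def load_local(path: str):
    """Load a local file as a `Dataset` (requires the `datasets` package)."""
    # Imported lazily so `local_load_kwargs` works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(**local_load_kwargs(path))
```

For example, `load_local("data/train.parquet")` would call `load_dataset("parquet", data_files="data/train.parquet", split="train")`, so the local file goes through the same loader as a Hub dataset and is not treated as a dataset script.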
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
latest release