Closed amitagh closed 6 months ago
+1
Update: downgrading datasets to 2.15.0 seems to work for me.
+1
See my change in #1548: in src/axolotl/utils/data/sft.py, you can wrap prepared_ds_path in str() in the call dataset.save_to_disk(prepared_ds_path). Then you don't need to downgrade any packages.
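A minimal sketch of the str() workaround described above. The function save_to_disk_stub is a hypothetical stand-in, not the real datasets API; it only mimics the substring check that appears at the bottom of the traceback, so the snippet runs without datasets or fsspec installed. The path value is taken from the log in the issue.

```python
from pathlib import Path


def save_to_disk_stub(dataset_path):
    """Hypothetical stand-in for Dataset.save_to_disk.

    It performs the same kind of '"::" in path' check that fails in the
    traceback, so it only accepts a plain string.
    """
    if "::" in dataset_path:
        return dataset_path.split("::")
    return [dataset_path]


# Path object, as built inside the data-loading code (assumed for illustration).
prepared_ds_path = Path("/content") / "d538aae6e42c7df428d20d3ff2685ad0"

# The workaround: convert the Path to str before handing it over.
chain = save_to_disk_stub(str(prepared_ds_path))
```

In the real code the equivalent change would be passing str(prepared_ds_path) to dataset.save_to_disk, as suggested in the comment above.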
Downgrading to 2.15.0 worked for me as well!
Works for me. :)
Expected Behavior
Preprocessing with the --debug flag should work:
python -m axolotl.cli.preprocess /content/test_axolotl.yaml --debug
Current behaviour
Preprocessing fails with the error below. The dataset is a JSON file in which each example has the form {"text":}.
I am doing pretraining with LoRA for a non-English language.
[2024-04-19 09:05:02,918] [DEBUG] [axolotl.log:61] [PID:2346] [RANK:0] max_input_len: 600
Dropping Long Sequences (num_proc=2): 100% 17/17 [00:00<00:00, 99.19 examples/s]
Add position_id column (Sample Packing) (num_proc=2): 100% 17/17 [00:00<00:00, 70.88 examples/s]
[2024-04-19 09:05:03,502] [INFO] [axolotl.load_tokenized_prepared_datasets:423] [PID:2346] [RANK:0] Saving merged prepared dataset to disk... /content/d538aae6e42c7df428d20d3ff2685ad0
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/src/axolotl/src/axolotl/cli/preprocess.py", line 70, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/src/axolotl/src/axolotl/cli/preprocess.py", line 60, in do_cli
    load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
  File "/content/src/axolotl/src/axolotl/cli/__init__.py", line 397, in load_datasets
    train_dataset, eval_dataset, total_num_steps, prompters = prepare_dataset(
  File "/content/src/axolotl/src/axolotl/utils/data/sft.py", line 66, in prepare_dataset
    train_dataset, eval_dataset, prompters = load_prepare_datasets(
  File "/content/src/axolotl/src/axolotl/utils/data/sft.py", line 460, in load_prepare_datasets
    dataset, prompters = load_tokenized_prepared_datasets(
  File "/content/src/axolotl/src/axolotl/utils/data/sft.py", line 424, in load_tokenized_prepared_datasets
    dataset.save_to_disk(prepared_ds_path)
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 1515, in save_to_disk
    fs, _ = url_to_fs(dataset_path, **(storage_options or {}))
  File "/usr/local/lib/python3.10/dist-packages/fsspec/core.py", line 363, in url_to_fs
    chain = _un_chain(url, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fsspec/core.py", line 316, in _un_chain
    if "::" in path
TypeError: argument of type 'PosixPath' is not iterable
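The last frame of the traceback can be reproduced in isolation with the standard library alone. This is only a sketch of the failing check, not fsspec's actual code: contains_chain_separator is a hypothetical helper that applies the same '"::" in path' test to a pathlib.Path and to its string form.

```python
from pathlib import Path


def contains_chain_separator(path):
    # Same membership test as the last frame of the traceback.
    # It works on str, but Path objects do not support the "in" operator.
    return "::" in path


prepared_ds_path = Path("/content/d538aae6e42c7df428d20d3ff2685ad0")

try:
    contains_chain_separator(prepared_ds_path)
    raised_type_error = False
except TypeError:
    # A Path is neither a container nor iterable, so "in" raises TypeError.
    raised_type_error = True

# The same check succeeds once the Path is converted to a string.
result_for_str = contains_chain_separator(str(prepared_ds_path))
```

This suggests the error comes from a Path object reaching code that expects a string, rather than from the dataset contents or the --debug flag.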
Steps to reproduce
Use a JSON file in which each example has the form {"text":}.
Run preprocessing with the debug flag: python -m axolotl.cli.preprocess /content/test_axolotl.yaml --debug
The TypeError shown above is raised.
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
Latest