THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

AssertionError: `validation_file` should be a csv or a json file. #1438

Open AkKari808 opened 9 months ago

AkKari808 commented 9 months ago

Is there an existing issue for this?

Current Behavior

When I was testing ptuning, I typed the following at the command line:

python main.py --do_predict=yes --validation_file AdvertiseGen/dev.json --test_file AdvertiseGen/dev.json --overwrite_cache=yes --prompt_column content --response_column summary --model_name_or_path ..\chatglm6b --ptuning_checkpoint ./output/adgen-chatglm-6b-pt-128/checkpoint-3000 --output_dir ./output/adgen-chatglm-6b-pt-128 --overwrite_output_dir=yes --max_source_length 64 --max_target_length 64 --per_device_eval_batch_size 2 --predict_with_generate=yes --pre_seq_len 128 --quantization_bit 4

The error is as follows:

Traceback (most recent call last):
  File "D:\Fall 2023 Semester\Frontiers in Natural Language Processing\ChatGLM-6B-main\ChatGLM-6B-main\ptuning\main.py", line 430, in <module>
    main()
  File "D:\Fall 2023 Semester\Frontiers in Natural Language Processing\ChatGLM-6B-main\ChatGLM-6B-main\ptuning\main.py", line 57, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "E:\Anaconda\envs\torch_v1\lib\site-packages\transformers\hf_argparser.py", line 332, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 25, in __init__
  File "D:\Fall 2023 Semester\Frontiers in Natural Language Processing\ChatGLM-6B-main\ChatGLM-6B-main\ptuning\arguments.py", line 221, in __post_init__
    assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
AssertionError: `validation_file` should be a csv or a json file.
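
For anyone trying to reproduce this, the failing check can be exercised on its own. The sketch below assumes the extension is taken as everything after the last dot (`path.split(".")[-1]`). Only the `assert` line is visible in the traceback, so that derivation is an assumption, and `check_extension` is a hypothetical helper that is not part of `arguments.py`. It shows which argument values would pass and which would trip the assertion.

```python
def check_extension(path: str) -> bool:
    """Return True if `path` would pass the csv/json check.

    Assumption: the extension is everything after the last dot,
    which is how the assert in the traceback is presumed to get its input.
    """
    extension = path.split(".")[-1]
    return extension in ["csv", "json"]


if __name__ == "__main__":
    candidates = [
        "AdvertiseGen/dev.json",   # the value from the command above
        "AdvertiseGen/dev.json ",  # trailing space inside a quoted argument
        "AdvertiseGen/dev.jsonl",  # a different suffix
        "AdvertiseGen/dev",        # no suffix at all
    ]
    for value in candidates:
        verdict = "ok" if check_extension(value) else "would fail"
        print(repr(value), "->", verdict)
```

Under this assumption, the path as typed passes the check, so it may be worth printing `sys.argv` at the top of `main.py` to see what the shell actually handed to the parser.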


Expected Behavior

No response

Steps To Reproduce

Run the command shown under Current Behavior; it fails with the same traceback and AssertionError.

Environment

- OS: Windows
- Python: 3.9.16
- Transformers: 4.27.1
- PyTorch: 2.02+cuda118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
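
The version fields above can be collected in one step rather than typed by hand. This is a minimal sketch using only the standard library; `package_version` and `environment_report` are hypothetical helper names, and it reads installed package metadata instead of importing the packages, so it does not check CUDA availability.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of `name`, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> dict:
    """Gather the fields requested by the issue template's Environment section."""
    return {
        "OS": platform.system(),
        "Python": platform.python_version(),
        "Transformers": package_version("transformers"),
        "PyTorch": package_version("torch"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"- {key}: {value}")
```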

Anything else?

No response