Social-AI-Studio / ContrastiveAA

Official repository for SocialNLP'24 paper "Contrastive Disentanglement for Authorship Attribution"

Issues with run_glue.py script: Unrecognized arguments (do_contrastive_cls) and unable to parse test dataset #1

Open ahmedsohair opened 3 months ago

ahmedsohair commented 3 months ago
  1. When running the script with the --do_contrastive_cls argument, I receive the following error:
Traceback (most recent call last):
  File "ContrastiveAA-main\examples\pytorch\text-classification\run_glue.py", line 626, in <module>
    main()
  File "ContrastiveAA-main\examples\pytorch\text-classification\run_glue.py", line 218, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ahmad\anaconda3\envs\My_envmt\Lib\site-packages\transformers\hf_argparser.py", line 348, in parse_args_into_dataclasses
    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--do_contrastive_cls']
  2. When running the script without the --do_contrastive_cls argument, I receive the following error:
C:\Users\ahmad\anaconda3\envs\My_envmt\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
[WARNING|modeling_utils.py:4282] 2024-07-31 14:53:17,081 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "ContrastiveAA-main\examples\pytorch\text-classification\run_glue.py", line 626, in <module>
    main()
  File "ContrastiveAA-main\examples\pytorch\text-classification\run_glue.py", line 473, in main
    raise ValueError("--do_predict requires a test dataset")
ValueError: --do_predict requires a test dataset
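The first error comes from how HfArgumentParser works. It builds one command-line option per field of the script's argument dataclasses, and parse_args_into_dataclasses raises a ValueError for any arguments left over. The stdlib-only sketch below mimics that behavior. DataArgs and parse_into_dataclass are hypothetical names, and the sketch assumes the fork's argument dataclasses declare no do_contrastive_cls field, which is what the error message implies.

```python
import argparse
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataArgs:
    # Hypothetical subset of run_glue.py's data arguments, for illustration only.
    train_file: Optional[str] = None
    validation_file: Optional[str] = None
    test_file: Optional[str] = None


def parse_into_dataclass(argv: List[str]) -> DataArgs:
    """Mimic HfArgumentParser: one CLI option per dataclass field, reject leftovers."""
    parser = argparse.ArgumentParser()
    for f in fields(DataArgs):
        parser.add_argument(f"--{f.name}", default=f.default)
    # parse_known_args returns unrecognized arguments instead of exiting.
    namespace, remaining_args = parser.parse_known_args(argv)
    if remaining_args:
        raise ValueError(
            f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}"
        )
    return DataArgs(**vars(namespace))
```

For example, parse_into_dataclass(["--do_contrastive_cls"]) raises the same ValueError seen in the traceback. A flag like this only parses if some dataclass passed to the parser defines a field with that name.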

The command I used is as follows:

python examples/pytorch/text-classification/run_glue.py \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_eval \
    --num_train_epochs 5 \
    --gradient_accumulation_steps 4 \
    --test_file data/AA_data/AA_cls_test.json \
    --validation_file data/AA_data/AA_cls_validation.json \
    --train_file data/AA_data/AA_cls_train.json \
    --output_dir AA_region_cls/ \
    --overwrite_output_dir \
    --per_device_train_batch_size=128 \
    --per_device_eval_batch_size=32 \
    --save_strategy no \
    --evaluation_strategy epoch

I have verified that the dataset paths are correct; the train, test, and validation files are all in the same folder. Also, I noticed that the command you posted uses the same file for both validation and test. Is that intended?
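A quick pre-flight check can confirm that each split file exists and that no two splits point to the same file. This is a minimal sketch; check_split_files is a hypothetical helper, and the paths in the usage section are the ones from the command above.

```python
from pathlib import Path
from typing import Dict, List


def check_split_files(paths: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the given split files (empty list if none)."""
    problems: List[str] = []
    seen: Dict[Path, str] = {}  # resolved path -> first split that used it
    for split, p in paths.items():
        path = Path(p)
        if not path.is_file():
            problems.append(f"{split}: {p} does not exist")
        resolved = path.resolve()
        if resolved in seen:
            problems.append(f"{split} and {seen[resolved]} point to the same file: {p}")
        else:
            seen[resolved] = split
    return problems


if __name__ == "__main__":
    # Paths taken from the command in this issue; adjust to your checkout.
    for problem in check_split_files({
        "train": "data/AA_data/AA_cls_train.json",
        "validation": "data/AA_data/AA_cls_validation.json",
        "test": "data/AA_data/AA_cls_test.json",
    }):
        print(problem)
```

An empty result means every file exists and the three splits are distinct files.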

YujiaHu0819 commented 3 months ago

Hi,

We have checked and updated the command; please try the following:

python examples/pytorch/text-classification/run_glue.py \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_predict \
    --num_train_epochs 5 \
    --gradient_accumulation_steps 4 \
    --test_file data/AA_data/AA_cls_test.json \
    --validation_file data/AA_data/AA_cls_val.json \
    --train_file data/AA_data/AA_cls_train.json \
    --output_dir AA_region_cls/ \
    --overwrite_output_dir \
    --per_device_train_batch_size=128 \
    --per_device_eval_batch_size=32 \
    --save_strategy no \
    --evaluation_strategy epoch

If you still get "ValueError: --do_predict requires a test dataset", please add print(raw_datasets) just before the line that raises the error and check whether it includes the train, validation, and test splits.
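Assuming the fork keeps the upstream Hugging Face run_glue.py logic for local files, this would also explain why the original command failed. --test_file was passed without --do_predict, so the test file was never added to the files being loaded, yet a non-empty --test_file alone still triggers the test-split check. The sketch below reproduces that logic with hypothetical function names (build_data_files, check_test_split); in the real script the split names are the keys of raw_datasets, which is what print(raw_datasets) reveals.

```python
from typing import Dict, Iterable, Optional


def build_data_files(
    train_file: str,
    validation_file: str,
    test_file: Optional[str],
    do_predict: bool,
) -> Dict[str, str]:
    """Mirror of the assumed upstream logic deciding which local files get loaded."""
    data_files = {"train": train_file, "validation": validation_file}
    # The test file is only registered when --do_predict is passed.
    if do_predict:
        if test_file is None:
            raise ValueError("Need either a GLUE task or a test file for `do_predict`.")
        data_files["test"] = test_file
    return data_files


def check_test_split(
    splits: Iterable[str],
    do_predict: bool,
    test_file: Optional[str],
    task_name: Optional[str] = None,
) -> None:
    """Mirror of the assumed upstream check that raises the second error."""
    split_names = set(splits)
    # Passing --test_file on its own is enough to trigger this check.
    if do_predict or task_name is not None or test_file is not None:
        if "test" not in split_names and "test_matched" not in split_names:
            raise ValueError("--do_predict requires a test dataset")
```

With do_predict=False and a test file set, build_data_files returns only the train and validation entries and check_test_split raises; with do_predict=True the test split is present and the check passes, which matches the updated command.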