XiangLi1999 / PrefixTuning

Prefix-Tuning: Optimizing Continuous Prompts for Generation

FileNotFoundError for e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold #21

Closed: 14H034160212 closed this issue 2 years ago

14H034160212 commented 2 years ago

Hi,

I get the following error when I run this command:

CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00008 --mode data2text --bsz 10 --seed 101 --tuning_mode prefixtune --cache_dir ./cache

FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold'

Has anyone else run into this issue, or does anyone know how to deal with it? Thank you so much.
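For reference, the error comes from open() being asked to write a file inside a directory that does not exist yet. A minimal standalone sketch (hypothetical paths, not from the repo) that reproduces the same FileNotFoundError and shows the usual remedy:

```python
import os

out_path = "e2e_results_conv2/example_test_gold"  # hypothetical path for illustration

# open(..., 'w') does not create missing parent directories, so writing
# into a directory that does not exist raises FileNotFoundError.
try:
    with open(out_path, "w") as f:
        f.write("gold references\n")
except FileNotFoundError as err:
    print(err)  # [Errno 2] No such file or directory: 'e2e_results_conv2/example_test_gold'

# Creating the directory first lets the same write succeed.
os.makedirs(os.path.dirname(out_path), exist_ok=True)
with open(out_path, "w") as f:
    f.write("gold references\n")
```

The full log from training and generation follows: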

Training completed. Do not forget to share your model on huggingface.co/models =)

10/15/2021 20:14:10 - INFO - trainer_prefix -   Saving model checkpoint to save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
10/15/2021 20:14:11 - INFO - __main__ -   *** Evaluate ***
10/15/2021 20:14:11 - INFO - trainer_prefix -   ***** Running Evaluation *****
10/15/2021 20:14:11 - INFO - trainer_prefix -     Num examples = 42061
10/15/2021 20:14:11 - INFO - trainer_prefix -     Batch size = 10
False
False
{'eval_loss': 25.165123616772462, 'epoch': 5.0, 'total_flos': 2514722051589120, 'step': 21035}
10/15/2021 20:18:41 - INFO - __main__ -   ***** Eval results *****
10/15/2021 20:18:41 - INFO - __main__ -     perplexity = 25.165123616772462
running evaluation on  /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
suggested code:
python gen.py data2text yes valid /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no
python gen.py data2text yes test /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no
python run_generation.py         --model_type=gpt2         --length 100         --model_name_or_path=gpt2-medium         --num_return_sequences 5         --stop_token [EOS]         --tokenizer_name=/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --task_mode=data2text         --control_mode=yes --tuning_mode prefixtune --gen_dir e2e_results_conv2 --eval_dataset valid     --optim_prefix no --preseqlen 20 --prefix_mode activation  --format_mode cat  --prefixModel_name_or_path /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3
10/15/2021 20:18:42 - WARNING - __main__ -   device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "_objective_mode": 2,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "predict_special_tokens": true,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "vocab_size": 50257
}

GPT2Config {
  "_my_arg_control": true,
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "activation_function": "gelu_new",
  "architectures": [
    "PrefixTuning"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "format_mode": "cat",
  "init_random": "no",
  "init_shallow": "no",
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "lowdata": false,
  "mid_dim": 512,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "optim_prefix": true,
  "predict_special_tokens": true,
  "prefix_dropout": 0.0,
  "preseqlen": 5,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "train_weights": "no",
  "use_infix": false,
  "vocab_size": 50258
}

under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
10/15/2021 20:19:00 - INFO - __main__ -   Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='valid', format_mode='cat', fp16=False, gen_dir='e2e_results_conv2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='no', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='data2text', temperature=1.0, tokenizer_name='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
using the test path  /data/qbao775/PrefixTuning/data/e2e_data/src1_valid.txt
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold
547
Traceback (most recent call last):
  File "run_generation.py", line 1356, in <module>
    main()
  File "run_generation.py", line 825, in main
    write_e2e_corr(prompt_text_lst, prompt_text_dict, gold_dir)
  File "run_generation.py", line 360, in write_e2e_corr
    with open(corr_path, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold'
python run_generation.py         --model_type=gpt2         --length 100         --model_name_or_path=gpt2-medium         --num_return_sequences 5         --stop_token [EOS]         --tokenizer_name=/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --task_mode=data2text         --control_mode=yes --tuning_mode prefixtune --gen_dir e2e_results_conv2 --eval_dataset test     --optim_prefix no --preseqlen 20 --prefix_mode activation  --format_mode cat  --prefixModel_name_or_path /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3
10/15/2021 20:19:02 - WARNING - __main__ -   device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "_objective_mode": 2,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "predict_special_tokens": true,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "vocab_size": 50257
}

GPT2Config {
  "_my_arg_control": true,
  "_my_arg_task_mode": "data2text",
  "_my_arg_tune_mode": "prefixtune",
  "activation_function": "gelu_new",
  "architectures": [
    "PrefixTuning"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "format_mode": "cat",
  "init_random": "no",
  "init_shallow": "no",
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "lowdata": false,
  "mid_dim": 512,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "optim_prefix": true,
  "predict_special_tokens": true,
  "prefix_dropout": 0.0,
  "preseqlen": 5,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "train_weights": "no",
  "use_infix": false,
  "vocab_size": 50258
}

under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
10/15/2021 20:19:20 - INFO - __main__ -   Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='test', format_mode='cat', fp16=False, gen_dir='e2e_results_conv2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='no', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='data2text', temperature=1.0, tokenizer_name='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
using the test path  /data/qbao775/PrefixTuning/data/e2e_data/src1_test.txt
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold
630
Traceback (most recent call last):
  File "run_generation.py", line 1356, in <module>
    main()
  File "run_generation.py", line 825, in main
    write_e2e_corr(prompt_text_lst, prompt_text_dict, gold_dir)
  File "run_generation.py", line 360, in write_e2e_corr
    with open(corr_path, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold'

Here is my environment configuration:

Package                Version     Location
---------------------- ----------- -------------------------------------------
absl-py                0.14.1
cachetools             4.2.4
certifi                2021.10.8
charset-normalizer     2.0.7
click                  8.0.3
filelock               3.3.0
future                 0.18.2
google-auth            1.35.0
google-auth-oauthlib   0.4.6
grpcio                 1.41.0
idna                   3.3
joblib                 1.1.0
Markdown               3.3.4
nltk                   3.6.5
numpy                  1.21.2
oauthlib               3.1.1
packaging              21.0
Pillow                 8.3.2
pip                    20.0.2
pkg-resources          0.0.0
protobuf               3.18.1
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pyparsing              2.4.7
pytorch-lightning      0.9.0
PyYAML                 6.0
regex                  2021.10.8
requests               2.26.0
requests-oauthlib      1.3.0
rsa                    4.7.2
sacremoses             0.0.46
sentencepiece          0.1.96
setuptools             44.0.0
six                    1.16.0
tensorboard            2.2.0
tensorboard-plugin-wit 1.8.0
tokenizers             0.8.1rc2
torch                  1.8.0+cu111
torchvision            0.9.0+cu111
tqdm                   4.62.3
transformers           3.2.0       /data/qbao775/PrefixTuning/transformers/src
typing-extensions      3.10.0.2
urllib3                1.26.7
Werkzeug               2.0.2
wheel                  0.37.0
14H034160212 commented 2 years ago

Fixed: I changed the open call in the function write_e2e_corr to with open(corr_path, 'w+', encoding="utf-8") as f: and created the args.gen_dir directory (e2e_results_conv2) before running generation; see the sketch below.
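A minimal sketch of that workaround, assuming the write_e2e_corr signature shown in the traceback (prompt_text_lst, prompt_text_dict, corr_path); the write loop below is only illustrative, the actual change is the os.makedirs call and the open arguments:

```python
import os

def write_e2e_corr(prompt_lst, prompt_text_dict, corr_path):
    # open(..., 'w') does not create missing parent directories, so make
    # sure the output directory (args.gen_dir, e.g. e2e_results_conv2) exists.
    os.makedirs(os.path.dirname(corr_path), exist_ok=True)
    with open(corr_path, 'w+', encoding="utf-8") as f:
        # illustrative write loop: one gold reference per line,
        # blank line between source inputs
        for prompt in prompt_lst:
            for ref in prompt_text_dict[prompt]:
                f.write(ref.strip() + '\n')
            f.write('\n')
```

Equivalently, creating the directory once from the shell before running train_e2e.py (e.g. mkdir -p /data/qbao775/PrefixTuning/gpt2/e2e_results_conv2) avoids touching the code at all.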