HaoUNSW / PISA


Request for Assistance with BERT2BERT Model Error in HuggingFace_EncDec Experiment #10

Open cccccrj opened 2 weeks ago

cccccrj commented 2 weeks ago

Hello, I am very interested in your research and am currently trying to run some experiments based on it. However, I ran into an error while running the program in the HuggingFace_EncDec directory after loading the BERT model, and I would greatly appreciate your assistance. Below is the script I defined according to your specifications:

#!/usr/bin/env bash

model_path=/root/autodl-tmp/models/models--google-bert--bert-base-uncased/snapshots/86b5e0934494bd15c9632b12f734a8a67f723594
save_model_path=/root/PISA-main/PISA-main/SG_Pretrained_BERT/
predict_results_path=G_Pretrained_BERT_pred/

python3 run_hf_enc_dec_train.py \
    --model_name_or_path $model_path \
    --do_train \
    --seed=88 \
    --save_total_limit=1 \
    --train_file /root/PISA-main/PISA-main/Dataset/PISA-prompt/SG/train.json \
    --validation_file /root/PISA-main/PISA-main/Dataset/PISA-prompt/SG/val.json \
    --output_dir $save_model_path \
    --rouge_path dummy_path \
    --per_device_train_batch_size=4 \
    --overwrite_output_dir \
    --predict_with_generate \
    --num_train_epochs 30 \
    --max_source_length 1024 \
    --max_target_length 128 \
    --learning_rate 3e-5
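
For reference, the traceback below goes through transformers' EncoderDecoderModel, i.e. a BERT2BERT setup. A minimal sketch of how such a model is typically assembled with the standard transformers API follows; this is generic, hypothetical code, and the repository's run_hf_enc_dec_train.py may wire it up differently:

# Minimal BERT2BERT construction sketch (hypothetical; not taken from
# run_hf_enc_dec_train.py, which may differ in detail).
from transformers import BertTokenizer, EncoderDecoderModel

model_path = "bert-base-uncased"  # or the local snapshot path from the script
tokenizer = BertTokenizer.from_pretrained(model_path)

# Tie a BERT encoder to a BERT decoder; the decoder's cross-attention
# weights are freshly initialised, which is expected for BERT2BERT.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(model_path, model_path)

# These config fields are required for seq2seq training and generation;
# leaving them unset is a common source of runtime errors.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id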

Unfortunately, running the model gives me the following error:

[INFO|trainer.py:1279] 2024-11-03 23:25:02,211 >> ***** Running training *****
[INFO|trainer.py:1280] 2024-11-03 23:25:02,211 >>   Num examples = 96552
[INFO|trainer.py:1281] 2024-11-03 23:25:02,211 >>   Num Epochs = 30
[INFO|trainer.py:1282] 2024-11-03 23:25:02,211 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1283] 2024-11-03 23:25:02,211 >>   Total train batch size (w. parallel, distributed & accumulation) = 4
[INFO|trainer.py:1284] 2024-11-03 23:25:02,211 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1285] 2024-11-03 23:25:02,211 >>   Total optimization steps = 724140
0%| | 0/724140 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_hf_enc_dec_train.py", line 632, in <module>
    main()
  File "run_hf_enc_dec_train.py", line 551, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1400, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1984, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 2016, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 489, in forward
    encoder_outputs = self.encoder(
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 996, in forward
    encoder_outputs = self.encoder(
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 585, in forward
    layer_outputs = layer_module(
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 472, in forward
    self_attention_outputs = self.attention(
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 402, in forward
    self_outputs = self.self(
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 268, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
0%| | 0/724140 [00:00<?, ?it/s]

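A note on this failure mode: CUBLAS_STATUS_NOT_INITIALIZED raised from an ordinary nn.Linear call usually masks an earlier device-side fault, because CUDA kernels launch asynchronously. The two usual culprits are GPU memory exhaustion and an out-of-range index into an embedding table. Notably, the script above passes --max_source_length 1024 while bert-base-uncased has only 512 position embeddings, which would overrun the position-embedding table. A hedged debugging sketch follows; this is generic transformers/PyTorch code, not code from the PISA repository:

# Debugging sketch (hypothetical, not from the PISA code base): surface the
# real error behind CUBLAS_STATUS_NOT_INITIALIZED by making CUDA synchronous
# and checking the two indexing limits that BERT enforces on GPU.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA is first initialised

import torch
from transformers import BertModel, BertTokenizer

model_path = "bert-base-uncased"  # substitute the local snapshot path
tokenizer = BertTokenizer.from_pretrained(model_path)
encoder = BertModel.from_pretrained(model_path)

batch = tokenizer(["a short probe sentence"], return_tensors="pt")

# Check 1: every token ID must index into the word-embedding table.
vocab_size = encoder.get_input_embeddings().num_embeddings
assert int(batch["input_ids"].max()) < vocab_size, "token ID out of vocabulary range"

# Check 2: the sequence must fit the position-embedding table
# (512 for bert-base-uncased, so --max_source_length 1024 can overrun it).
assert batch["input_ids"].shape[1] <= encoder.config.max_position_embeddings

# A CPU forward pass reports indexing problems with a readable traceback
# instead of a deferred CUDA/cuBLAS error.
with torch.no_grad():
    encoder(**batch)

If both checks pass on real training batches, the next things to try are truncating sources to 512 tokens (or whatever the checkpoint's max_position_embeddings is) and reducing the batch size to rule out GPU memory exhaustion.
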
asdm123asd commented 1 week ago

How should I run the code for this paper?