Open lcqlalala opened 1 year ago
Hi, I think this issue is caused by an incorrect dependency installation. Can you follow the instructions here to install the customized transformers? It may address your issue.
Hi, I had the same problem and followed the steps exactly.
Hi, can you install the transformers package in the NLU_QA folder again to see if the issue can be addressed?
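A quick way to confirm which transformers installation Python is actually picking up (just a diagnostic sketch, not part of the repo; the expected path assumes you installed the customized copy shipped inside the AdaLoRA repo rather than the one from PyPI):

```python
# Print where `transformers` is imported from. It should point into the
# customized copy inside the AdaLoRA repo (e.g. .../AdaLoRA/NLU/src/transformers/...),
# not into site-packages.
import transformers

print(transformers.__version__)
print(transformers.__file__)
```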
I tried this method, and it doesn't seem to work. I encountered the same issue when replicating LoRA, but if I remove resume_from_checkpoint=checkpoint, it seems to work. I'm not sure whether this will affect the results.
Hi, can you confirm that the correct run_glue.py script is running? If the Traceback message is the same as in the first comment, I think the wrong script is running. Line 694 of the provided run_glue.py should not be train_result = trainer.train(resume_from_checkpoint=checkpoint).
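One quick way to check what is actually on that line of the script being launched (a small sketch; it assumes you run it from the same working directory as the launch scripts, so the relative path below resolves):

```python
from pathlib import Path

# Show line 694 of the run_glue.py the launch scripts point at. Per the comment
# above, it should not be the trainer.train(resume_from_checkpoint=checkpoint) call.
script = Path("examples/text-classification/run_glue.py")
lines = script.read_text().splitlines()
print(lines[693])  # line 694, 1-indexed
```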
When I follow the instruction, I encounter this error:

Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 754, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 674, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/math/acam/lora/AdaLoRA/NLU/src/transformers/trainer.py", line 885, in train
    self._load_state_dict_in_model(state_dict)
  File "/home/math/acam/lora/AdaLoRA/NLU/src/transformers/trainer.py", line 2050, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable

When I follow your solution, I encounter this problem:

  File "examples/text-classification/run_glue.py", line 674, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/math/acam/lora/AdaLoRA/NLG_QA/src/transformers/trainer.py", line 1423, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/home/math/acam/lora/AdaLoRA/NLG_QA/src/transformers/trainer.py", line 1509, in _inner_training_loop
    self.rankallocator.set_total_step(max_steps)
  File "/home/math/acam/lora/AdaLoRA/loralib/loralib/adalora.py", line 162, in set_total_step
    assert self.total_step>self.initial_warmup+self.final_warmup
AssertionError
I reinstalled it according to the instructions in NLU/README and still have this problem. But when I remove resume_from_checkpoint=checkpoint from run_glue.py line 674 (train_result = trainer.train(resume_from_checkpoint=checkpoint)), this error does not appear.
Among the scripts I ran, only run_debertav3_stsb.sh did not have this problem; it's not clear where the issue occurs.
You should modify the hyperparameters to ensure self.total_step > self.initial_warmup + self.final_warmup. Specifically, self.total_step = (number of examples in train_dataset / per_device_train_batch_size) * num_train_epochs.
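As a rough check, you can compute the schedule yourself before launching; a minimal sketch, where all the numbers are illustrative placeholders rather than the repo's defaults:

```python
# Sanity-check that AdaLoRA's rank-allocation warmup fits inside the training schedule.
# All values below are placeholders; substitute your own dataset size and hyperparameters.
num_train_examples = 5749          # size of the training split
per_device_train_batch_size = 32
num_train_epochs = 25

steps_per_epoch = num_train_examples // per_device_train_batch_size
total_step = steps_per_epoch * num_train_epochs

initial_warmup = 600               # RankAllocator warmup hyperparameters (placeholders)
final_warmup = 1800

# Mirrors the assert in loralib/adalora.py's set_total_step()
assert total_step > initial_warmup + final_warmup, (
    f"total_step={total_step} must exceed initial_warmup+final_warmup="
    f"{initial_warmup + final_warmup}; increase num_train_epochs or shrink the warmups"
)
print(f"OK: total_step={total_step}")
```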
Is the final output you get the entire model? I think saving only the LoRA weights would be more reasonable.
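For reference, one way to keep only the adapter weights is to filter the state dict by parameter name; a sketch, assuming the AdaLoRA/LoRA parameters all contain "lora_" in their names (the loralib naming convention):

```python
import torch

def lora_only_state_dict(model: torch.nn.Module) -> dict:
    """Keep only parameters whose names contain 'lora_' (loralib naming convention)."""
    return {k: v for k, v in model.state_dict().items() if "lora_" in k}

# Usage sketch: save just the adapter weights instead of the full model.
# torch.save(lora_only_state_dict(model), "adalora_weights.bin")
```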
Hi, thanks for this awesome work!
When I ran this run_debertav3_qqp.sh script, I encountered the following error. However,
Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 774, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 694, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/mnt/ssd/temp_123/AdaLoRA-main/NLU/src/transformers/trainer.py", line 885, in train
    self._load_state_dict_in_model(state_dict)
  File "/mnt/ssd/temp_123/AdaLoRA-main/NLU/src/transformers/trainer.py", line 2050, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable
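For context, the crash comes from calling set() on model._keys_to_ignore_on_save when that attribute is None. A minimal standalone illustration (not the repo's code) and the kind of defensive guard that would avoid it:

```python
# Standalone reproduction of the TypeError above; no transformers code involved.
missing_keys = []                 # stand-in for load_result.missing_keys
keys_to_ignore_on_save = None     # some models leave this attribute as None

try:
    set(missing_keys) == set(keys_to_ignore_on_save)  # set(None) raises TypeError
except TypeError as e:
    print(e)  # 'NoneType' object is not iterable

# Treating None as an empty list avoids the crash:
if set(missing_keys) == set(keys_to_ignore_on_save or []):
    print("missing keys match the (empty) ignore list")
```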