QingruZhang / AdaLoRA

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).
MIT License

TypeError: 'NoneType' object is not iterable #13

Open lcqlalala opened 1 year ago

lcqlalala commented 1 year ago

Hi, thanks for this awesome work!

When I ran the run_debertav3_qqp.sh script, I encountered the following error:

```
Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 774, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 694, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/mnt/ssd/temp_123/AdaLoRA-main/NLU/src/transformers/trainer.py", line 885, in train
    self._load_state_dict_in_model(state_dict)
  File "/mnt/ssd/temp_123/AdaLoRA-main/NLU/src/transformers/trainer.py", line 2050, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable
```
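For what it's worth, the exception itself is just Python complaining that set() was called on None: presumably self.model._keys_to_ignore_on_save is None here, while the customized trainer expects a list. A minimal sketch of that failure mode (the key names below are hypothetical):

```python
# Sketch only: reproduces the failure mode seen at trainer.py line 2050,
# where set() is applied to _keys_to_ignore_on_save even when it is None.
missing_keys = ["classifier.weight", "classifier.bias"]  # hypothetical keys
keys_to_ignore_on_save = None                            # value implied by the traceback

try:
    set(missing_keys) == set(keys_to_ignore_on_save)
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable

# A defensive version of the same comparison falls back to an empty set:
print(set(missing_keys) == set(keys_to_ignore_on_save or []))
```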

QingruZhang commented 1 year ago

Hi, I think this issue is caused by an incorrect dependency installation. Can you follow the instructions here to install the customized transformers? It may address your issue.
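One quick sanity check is whether Python is actually importing the customized transformers that ships under NLU/src rather than a pip-installed release; the expected path substring below is an assumption based on the tracebacks in this thread:

```python
# Sketch: verify which transformers installation is being imported.
import transformers

print(transformers.__version__)
print(transformers.__file__)  # expected to point into .../AdaLoRA/NLU/src/transformers/

if "NLU/src/transformers" not in transformers.__file__:
    print("Warning: another transformers installation is shadowing the customized one.")
```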

licy02 commented 1 year ago

Hi, I had the same problem and followed the steps exactly.

QingruZhang commented 1 year ago

Hi, can you install the transformers package in the NLU_QA folder again to see if the issue is addressed?

licy02 commented 1 year ago

I tried this method and it doesn't seem to work. I encountered the same issue when replicating LoRA, but if I remove resume_from_checkpoint=checkpoint, it seems to work; I'm not sure whether this will affect the results.

QingruZhang commented 1 year ago

Hi, can you confirm that the correct run_glue.py script is running? If the traceback message is the same as in the first comment, I think the wrong script is being run. Line 694 of the provided run_glue.py should not be train_result = trainer.train(resume_from_checkpoint=checkpoint).

licy02 commented 1 year ago

When I follow the instruction, I encounter this error:

```
Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 754, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 674, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/math/acam/lora/AdaLoRA/NLU/src/transformers/trainer.py", line 885, in train
    self._load_state_dict_in_model(state_dict)
  File "/home/math/acam/lora/AdaLoRA/NLU/src/transformers/trainer.py", line 2050, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable
```

When I follow your solution, I encounter this problem:

```
  File "examples/text-classification/run_glue.py", line 674, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/math/acam/lora/AdaLoRA/NLG_QA/src/transformers/trainer.py", line 1423, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/home/math/acam/lora/AdaLoRA/NLG_QA/src/transformers/trainer.py", line 1509, in _inner_training_loop
    self.rankallocator.set_total_step(max_steps)
  File "/home/math/acam/lora/AdaLoRA/loralib/loralib/adalora.py", line 162, in set_total_step
    assert self.total_step > self.initial_warmup + self.final_warmup
AssertionError
```

lcqlalala commented 1 year ago

I reinstalled it according to the instructions in NLU/README and still have this problem. But when I remove resume_from_checkpoint=checkpoint from run_glue.py line 674 (train_result = trainer.train(resume_from_checkpoint=checkpoint)), this error does not appear.
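For reference, the workaround described above amounts to the following one-line change in run_glue.py (a sketch; whether skipping checkpoint resumption changes the reported results is exactly the open question):

```python
# Around line 674 of run_glue.py:
# train_result = trainer.train(resume_from_checkpoint=checkpoint)  # fails as above
train_result = trainer.train()  # workaround: do not resume from a checkpoint
```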

lcqlalala commented 1 year ago

Among the scripts I tried, only run_debertav3_stsb.sh ran without this problem; it is not clear where the issue comes from.

lcqlalala commented 1 year ago

> When I follow your solution, I encounter this problem: [...]
> File "/home/math/acam/lora/AdaLoRA/loralib/loralib/adalora.py", line 162, in set_total_step
> assert self.total_step > self.initial_warmup + self.final_warmup
> AssertionError

You should modify the hyperparameters to ensure self.total_step > self.initial_warmup + self.final_warmup. Specifically, self.total_step = (number of examples in train_dataset / per_device_train_batch_size) * num_train_epochs.
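A small sketch of that arithmetic, with hypothetical numbers, to check whether a given configuration satisfies the assertion in set_total_step:

```python
import math

# Hypothetical values for illustration only; substitute your own script settings.
num_train_examples = 360_000          # size of the training split
per_device_train_batch_size = 32
num_train_epochs = 5
initial_warmup = 8_000                # budget-schedule warmups passed to the rank allocator
final_warmup = 25_000

steps_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
total_step = steps_per_epoch * num_train_epochs

# Condition asserted in loralib/loralib/adalora.py (set_total_step):
print(total_step, total_step > initial_warmup + final_warmup)  # 56250 True
```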

licy02 commented 1 year ago

Is the final output you get the entire model? I think getting just the LoRA weights would be more reasonable.
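In case it helps, here is a sketch of saving only the adapter weights by filtering parameter names, assuming the injected modules follow the usual loralib lora_ naming; the filename is hypothetical:

```python
import torch

def save_lora_only(model: torch.nn.Module, path: str) -> None:
    """Save only the parameters whose names contain 'lora_' (loralib-style naming)."""
    lora_state = {
        name: tensor.cpu()
        for name, tensor in model.state_dict().items()
        if "lora_" in name
    }
    torch.save(lora_state, path)

# Usage sketch: reload the filtered weights on top of a freshly built model.
# strict=False leaves the frozen backbone weights untouched.
# save_lora_only(model, "adalora_weights.pt")
# model.load_state_dict(torch.load("adalora_weights.pt"), strict=False)
```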