QingruZhang / AdaLoRA

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).
MIT License

TypeError: 'NoneType' object is not iterable #13

Open lcqlalala opened 11 months ago

lcqlalala commented 11 months ago

Hi, thanks for this awesome work!

When I ran the run_debertav3_qqp.sh script, I encountered the following error:

Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 774, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 694, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/mnt/ssd/temp_123/AdaLoRA-main/NLU/src/transformers/trainer.py", line 885, in train
    self._load_state_dict_in_model(state_dict)
  File "/mnt/ssd/temp_123/AdaLoRA-main/NLU/src/transformers/trainer.py", line 2050, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):
TypeError: 'NoneType' object is not iterable

QingruZhang commented 11 months ago

Hi, I think this issue is caused by an incorrect dependency installation. Can you follow the instructions here to install the customized transformers? It may address your issue.

licy02 commented 11 months ago

Hi, I had the same problem and followed the steps exactly.

QingruZhang commented 11 months ago

Hi, can you install the transformers package in the NLU_QA folder again to see if that addresses the issue?

licy02 commented 11 months ago

I tried this method, and it doesn't seem to work. I ran into the same issue when replicating LoRA. If I remove resume_from_checkpoint=checkpoint, training runs, but I'm not sure whether this affects the results.
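For reference, the change amounts to this one line in run_glue.py (just the workaround I tried; I'm not sure it is the intended fix):

```python
# examples/text-classification/run_glue.py, at the call shown in the traceback.
# Original call, which goes through the checkpoint-reload path that crashes:
#   train_result = trainer.train(resume_from_checkpoint=checkpoint)

# Workaround: start training without resuming from a checkpoint.
train_result = trainer.train()
```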

QingruZhang commented 11 months ago

Hi, can you confirm that the correct run_glue.py script is running? If the Traceback message is the same as in the first comment, I think the wrong script is being run. Line 694 of the provided run_glue.py should not be train_result = trainer.train(resume_from_checkpoint=checkpoint).

licy02 commented 11 months ago

When I follow the instruction, I encounter this error:

Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 754, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 674, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/math/acam/lora/AdaLoRA/NLU/src/transformers/trainer.py", line 885, in train
    self._load_state_dict_in_model(state_dict)
  File "/home/math/acam/lora/AdaLoRA/NLU/src/transformers/trainer.py", line 2050, in _load_state_dict_in_model
    if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save):

When I follow your solution, I encounter this problem:

  File "examples/text-classification/run_glue.py", line 674, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/math/acam/lora/AdaLoRA/NLG_QA/src/transformers/trainer.py", line 1423, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/home/math/acam/lora/AdaLoRA/NLG_QA/src/transformers/trainer.py", line 1509, in _inner_training_loop
    self.rankallocator.set_total_step(max_steps)
  File "/home/math/acam/lora/AdaLoRA/loralib/loralib/adalora.py", line 162, in set_total_step
    assert self.total_step>self.initial_warmup+self.final_warmup
AssertionError

lcqlalala commented 11 months ago

I reinstalled it according to the instructions in NLU/README and still have this problem. But when I remove resume_from_checkpoint=checkpoint from line 674 of run_glue.py [train_result = trainer.train(resume_from_checkpoint=checkpoint)], the error does not appear.
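Another option might be to guard the None case directly in the customized trainer. This is only an untested sketch of the check at trainer.py line 2050, not an official patch:

```python
# NLU/src/transformers/trainer.py, in _load_state_dict_in_model (around line 2050).
# self.model._keys_to_ignore_on_save can be None, and set(None) raises the
# TypeError above. Treating None as an empty iterable avoids the crash:
if set(load_result.missing_keys) == set(self.model._keys_to_ignore_on_save or []):
    ...  # keep the original body of the if-statement unchanged
```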

lcqlalala commented 11 months ago

Among the multiple scripts, only run_debertav3_stsb.sh ran without this problem; it's not clear where the issue comes from.

lcqlalala commented 11 months ago

Regarding the AssertionError above: you should modify the hyperparameters to ensure self.total_step > self.initial_warmup + self.final_warmup. Specifically, self.total_step = (number of examples in train_dataset / per_device_train_batch_size) * num_train_epochs.
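For example, with illustrative numbers only (substitute your own dataset size, batch size, epoch count, and the warmup steps from your script):

```python
import math

# Illustrative values; take the real ones from your dataset and training script.
num_train_examples = 363846            # roughly the size of the QQP training set
per_device_train_batch_size = 32
num_train_epochs = 5
initial_warmup = 8000                  # RankAllocator initial warmup steps
final_warmup = 25000                   # RankAllocator final warmup steps

# Roughly what the Trainer passes to rankallocator.set_total_step(max_steps),
# assuming a single GPU and no gradient accumulation:
total_step = math.ceil(num_train_examples / per_device_train_batch_size) * num_train_epochs

# Mirrors the assertion in loralib/loralib/adalora.py (set_total_step):
assert total_step > initial_warmup + final_warmup, (
    f"total_step={total_step} must exceed initial_warmup + final_warmup = "
    f"{initial_warmup + final_warmup}; train longer or reduce the warmup steps"
)
```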

licy02 commented 11 months ago

Is the final output you get the entire model? I think saving only the LoRA weights would be more reasonable.