ShihaoZhaoZSH / LaVi-Bridge

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
MIT License
287 stars 20 forks

lora weight #10

Open tanshuai0219 opened 2 months ago

tanshuai0219 commented 2 months ago

When I run the code:

```python
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}
tokenizer = LlamaTokenizer.from_pretrained(llama2_dir)
text_encoder = LlamaForCausalLM.from_pretrained(llama2_dir, torch_dtype=torch.float16).to(device)
tokenizer.pad_token = '[PAD]'
text_encoder.eval()

tokenizer.model_max_length = 256

text_encoder_lora_params, _ = inject_trainable_lora_extended(
    text_encoder,
    r=32,
    target_replace_module=TEXT_ENCODER_REPLACE_MODULES,
    loras=None,  # path to lora .pt
)
```

Then, when I print text_encoder_lora_params, I get "[]", an empty list.

ShihaoZhaoZSH commented 2 months ago

Sorry, I cannot reproduce your issue, but I suggest checking the versions of the Python packages in your environment. More importantly, you can check whether the function _find_modules in inject_trainable_lora_extended has successfully found the linear or conv layers, and debug from there.
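For example, a rough check along these lines (a sketch, not the actual _find_modules; it assumes text_encoder is the LlamaForCausalLM loaded in your snippet) can show whether any candidate layers are found inside the target blocks:

```python
import torch.nn as nn

# Hypothetical helper: list the Linear/Conv layers inside blocks whose class name
# appears in target_replace_module; these are the layers LoRA would be injected into.
def list_lora_candidates(model, target_replace_module):
    candidates = []
    for name, module in model.named_modules():
        if module.__class__.__name__ in target_replace_module:
            for child_name, child in module.named_modules():
                if isinstance(child, (nn.Linear, nn.Conv2d)):
                    candidates.append(f"{name}.{child_name}")
    return candidates

# An empty list means the class names in the set do not match any module class
# in your installed transformers version.
print(list_lora_candidates(text_encoder, {"LlamaAttention"}))
```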

tanshuai0219 commented 2 months ago

> Sorry, I cannot reproduce your issue, but I suggest checking the versions of the Python packages in your environment. More importantly, you can check whether the function _find_modules in inject_trainable_lora_extended has successfully found the linear or conv layers, and debug from there.

I aligned my environment with your provided environment.yaml but still get the same issue. Then I changed TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"} to TEXT_ENCODER_REPLACE_MODULES = {"LlamaSdpaAttention", "LlamaDecoderLayer"}, and I get valid text_encoder_lora_params. I checked the params via:

```python
text_encoder_lora_params2 = itertools.chain(*text_encoder_lora_params)
total_parameters2 = sum(p.numel() for p in text_encoder_lora_params2)
print("Total parameters:", total_parameters2)  # 79953920
```

and it prints "79953920".

Does this also work with the rest of your training code?

ShihaoZhaoZSH commented 2 months ago

If using "LlamaSdpaAttention" works, it indicates that the issue is related to the version of the transformers library. To work with "LlamaAttention", you can try downgrading transformers to a lower version, such as 4.34. However, using "LlamaSdpaAttention" is also fine, since the essence of inserting LoRA is to find the appropriate layers within the provided block where it can be inserted.
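If you would rather not downgrade, one workaround (my assumption, not something we have verified) is to list both class names, since matching is done by class name and an entry that does not exist in your transformers version simply never matches:

```python
# Untested sketch: include both attention class names; only the one that exists
# in your installed transformers version will match.
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention", "LlamaSdpaAttention"}
```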

tanshuai0219 commented 2 months ago

> If using "LlamaSdpaAttention" works, it indicates that the issue is related to the version of the transformers library. To work with "LlamaAttention", you can try downgrading transformers to a lower version, such as 4.34. However, using "LlamaSdpaAttention" is also fine, since the essence of inserting LoRA is to find the appropriate layers within the provided block where it can be inserted.

Thanks for your reply~ Following your suggestions, I downgraded transformers from 4.38.0 (as indicated in your environment.yaml) to 4.34, and it finally works with TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}!! I think it would be better to update your environment.yaml.

Moreover, some code in my project only works with transformers>=4.38.0, so I wonder whether "LlamaSdpaAttention" alone is "the appropriate layer" you mentioned, or whether it is {"LlamaSdpaAttention", "LlamaDecoderLayer"} together. I assume "LlamaDecoderLayer" is also an important layer.

tanshuai0219 commented 2 months ago

Besides, transformers 4.34 cannot work with huggingface_hub.utils, which is necessary to run the training code...

ShihaoZhaoZSH commented 2 months ago

Thank you for the reminder! We have updated environment.yaml, including the transformers and huggingface-hub versions. Additionally, you can try adding LoRA to different layers, such as both LlamaAttention and LlamaDecoderLayer. However, note that the weights we released only include LoRA added to the LlamaAttention layers. Investigating the effect of adding LoRA to different layers is indeed a worthwhile exploration.
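As a rough sketch (exploratory only; the released weights were not trained this way), the injection call from your snippet would simply take the larger set:

```python
# Exploratory setting: inject LoRA into both the attention blocks and the full
# decoder layers (which also covers the MLP linears inside each decoder layer).
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention", "LlamaDecoderLayer"}
text_encoder_lora_params, _ = inject_trainable_lora_extended(
    text_encoder,
    r=32,
    target_replace_module=TEXT_ENCODER_REPLACE_MODULES,
    loras=None,
)
```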

tanshuai0219 commented 2 months ago

> Thank you for the reminder! We have updated environment.yaml, including the transformers and huggingface-hub versions. Additionally, you can try adding LoRA to different layers, such as both LlamaAttention and LlamaDecoderLayer. However, note that the weights we released only include LoRA added to the LlamaAttention layers. Investigating the effect of adding LoRA to different layers is indeed a worthwhile exploration.

I tried something else: replacing "LlamaAttention" with "LlamaSdpaAttention" in transformers 4.38.2 works, and I checked the trainable parameters; the count equals that of "LlamaAttention" in transformers 4.34~

ShihaoZhaoZSH commented 2 months ago

That is reasonable, as different versions of transformers may have variations in the way attention classes are defined and named. You can refer to the source code of transformers to discover these differences.
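In case it is useful, a quick way to inspect this from the loaded model instead of the source (assuming text_encoder is the LlamaForCausalLM from the snippet above):

```python
import transformers

print(transformers.__version__)

# Collect the attention class names the installed version actually instantiates,
# e.g. LlamaAttention on 4.34 vs LlamaSdpaAttention on 4.38 (per the thread above).
attn_class_names = {m.__class__.__name__ for m in text_encoder.modules()
                    if "Attention" in m.__class__.__name__}
print(attn_class_names)
```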

tanshuai0219 commented 2 months ago

> That is reasonable, as different versions of transformers may have variations in the way attention classes are defined and named. You can refer to the source code of transformers to discover these differences.

Thanks for your reply and the awesome work. I have benefited a lot from it~