Nero10578 opened 5 days ago
ValueError: While loading /home/owen/loras/Qwen2.5-Coder-7B-Instruct-lora, expected target modules in ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'] but received ['lm_head', 'lm_head', 'model.embed_tokens', 'model.embed_tokens']. Please verify that the loaded LoRA module is correct
It doesn't like the input and output embeddings in the LoRA adapter.
They are valid to have in a LoRA, but it is a bit weird that each of them is listed twice?!
Can you try commenting out these two module_details.append lines and replacing them with a pass, like so:
if module == pretrained_model.get_input_embeddings():
    # if isinstance(module, torch.nn.Embedding):
    pass  # module_details.append(("embedding", name, module.weight.size()))
elif module == pretrained_model.get_output_embeddings():
    # if isinstance(module, torch.nn.Embedding):
    pass  # module_details.append(("output", name, module.weight.size()))
and see if the LoRA it creates works OK?
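In case it is useful, here is a rough way to check whether the rebuilt adapter is accepted (a minimal sketch using vLLM's offline API; the base model name and paths are placeholders for your local setup):

# Minimal sketch: check that vLLM accepts the re-extracted adapter.
# The base model name and adapter path are placeholders for your local setup.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen2.5-Coder-7B-Instruct", enable_lora=True)
lora = LoRARequest("extracted-lora", 1, "/home/owen/loras/Qwen2.5-Coder-7B-Instruct-lora")

# If the adapter still contains lm_head/embed_tokens modules, loading it here
# should raise the same ValueError as above.
out = llm.generate(["def hello():"], SamplingParams(max_tokens=32), lora_request=lora)
print(out[0].outputs[0].text)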
Also, can you tell me what the peak VRAM use is with these commented out, to help with your other problem of high VRAM use? If it is just these causing the problem then I can easily add a command line option to skip the input/output embeddings, but if it still uses a lot of VRAM then it must be something in the SVD function that upcasts some stuff to float32.
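For reference, one way to record the peak usage (a rough sketch; it assumes the extraction runs on a single CUDA device and that the allocations go through PyTorch):

# Rough sketch: measure peak VRAM around the extraction step.
# Assumes a single CUDA device and that allocations go through PyTorch.
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the LoRA extraction here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gib:.2f} GiB")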
The "doubling listing" in the exception, makes me think it could also be something to do with having tied input/output tensors, but I think only the very tiny qwen
models use this.
You can tell if you look in the config.json file:
"tie_word_embeddings": false
or in the model.safetensors.index.json file to see if both of these are listed:
"lm_head.weight": "model-00037-of-00037.safetensors"
"model.embed_tokens.weight": "model-00001-of-00037.safetensors",
Will try this and get back to you. Thanks!
Usually you can use LoRA extraction in mergekit and then run the extracted LoRAs in vLLM or Aphrodite Engine just fine. This has worked for Llama and Mistral models so far, but it seems like it isn't working for Qwen2.5 models?
If I use my LoRA created from LoRA training with Axolotl, vLLM and Aphrodite Engine run Qwen LoRAs just fine.
The extraction itself also seems to complete without issues; the resulting adapter just cannot be used.
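One thing worth checking on the extracted adapter is which modules it actually targets, since that is what vLLM/Aphrodite complain about (a small sketch; it just reads the PEFT adapter_config.json next to the adapter weights):

# Small sketch: inspect which modules the extracted adapter targets.
# vLLM/Aphrodite only accept the attention/MLP projections listed in the error.
import json, os

adapter_dir = "/home/owen/loras/Qwen2.5-Coder-7B-Instruct-lora"
with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
    adapter_config = json.load(f)

print("target_modules:", adapter_config.get("target_modules"))
print("modules_to_save:", adapter_config.get("modules_to_save"))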
Error traceback from Aphrodite Engine trying to run the Qwen2.5-7B LoRA:
Full traceback: