Open WeiminLee opened 3 weeks ago
I encountered a similar issue with incomplete output. Notably, shorter queries tend to yield better responses, while queries in the same format as the training data often result in poor responses. I'm unsure if this is due to using long prompts to fine-tune the model. Initially, I thought my prompt was too long, but based on your response, it seems that might not be the issue.
I think you need to add <ImageHere>
into your annotations
I think you need to add
<ImageHere>
into your annotations
but I already added
{
"id": "5596",
"image": [
"/data/lwm-data/resized_without_padding_images/c0241758ce85f63082b3a8aa8c877a4b.png"
],
"conversations": [
{
"from": "user",
"value": "
Actually, the reason for these unusual outputs is that I used the wrong code to load the checkpoint after fine-tuning. Take a
model = AutoPeftModelForCausalLM.from_pretrained( checkpoint_path, device_map='auto', trust_remote_code=True, resume_download=True, )
model_path = '/data/model_hub/internlm-xcomposer2-4khd-7b' model = AutoModelForCausalLM.from_pretrained( model_path, device_map='auto', trust_remote_code=True, resume_download=True, )
adapter_name_or_path = '/data/lwm-data/interml-finetune'
model = PeftModel.from_pretrained(model, adapter_name_or_path, device_map='auto')
model.eval()
I think you need to add
<ImageHere>
into your annotationsbut I already added in query, here is the example:
{ "id": "5596", "image": [ "/data/lwm-data/resized_without_padding_images/c0241758ce85f63082b3a8aa8c877a4b.png" ], "conversations": [ { "from": "user", "value": " Would you kindly provide the bounding boxes of "Google map" located in the picture?" }, { "from": "assistant", "value": "The bounding box is [620, 800, 771, 984]" } ] }
Actually, the reason for these unusual outputs is that I used the wrong code to load the checkpoint after fine-tuning. Take a
look at this.
model = AutoPeftModelForCausalLM.from_pretrained( checkpoint_path, device_map='auto', trust_remote_code=True, resume_download=True, )
The right code should be like:
model_path = '/data/model_hub/internlm-xcomposer2-4khd-7b' model = AutoModelForCausalLM.from_pretrained( model_path, device_map='auto', trust_remote_code=True, resume_download=True, )
adapter_name_or_path = '/data/lwm-data/interml-finetune' model = PeftModel.from_pretrained(model, adapter_name_or_path, device_map='auto') model.eval()
Thank you for your method, it indeed solved the problem. Would you mind explaining how you discovered this solution?
I think you need to add
<ImageHere>
into your annotationsbut I already added in query, here is the example: { "id": "5596", "image": [ "/data/lwm-data/resized_without_padding_images/c0241758ce85f63082b3a8aa8c877a4b.png" ], "conversations": [ { "from": "user", "value": " Would you kindly provide the bounding boxes of "Google map" located in the picture?" }, { "from": "assistant", "value": "The bounding box is [620, 800, 771, 984]" } ] } Actually, the reason for these unusual outputs is that I used the wrong code to load the checkpoint after fine-tuning. Take a
look at this.
model = AutoPeftModelForCausalLM.from_pretrained( checkpoint_path, device_map='auto', trust_remote_code=True, resume_download=True, )
The right code should be like:
model_path = '/data/model_hub/internlm-xcomposer2-4khd-7b' model = AutoModelForCausalLM.from_pretrained( model_path, device_map='auto', trust_remote_code=True, resume_download=True, )
adapter_name_or_path = '/data/lwm-data/interml-finetune' model = PeftModel.from_pretrained(model, adapter_name_or_path, device_map='auto') model.eval()
Thank you for your method, it indeed solved the problem. Would you mind explaining how you discovered this solution?
I discussed this issue with my coworker and attempted a different model-loading method, which appears to have resolved the problem. However, the underlying cause remains unclear to me.
Still delving deeper into it.
thx, i have fixed the question following your steps. hope to help others. the xcomposer2 4khd lora infer code can be showed as followings:
model_path = '/xxxx/path2model/internlm-xcomposer2-4khd-7b'
adapter_name_or_path = "/xxxx/path2_trained_lora_weight/20240605_mix_4.3k_xcom_4khd_loara_ft"
model = AutoModelForCausalLM.from_pretrained(model_path,
# device_map='auto',
device_map='cuda:0',
trust_remote_code=True,
resume_download=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model,
adapter_name_or_path,
device_map='cuda:0')
model.half().cuda()
query = '<ImageHere>这张图都主要讲了什么?'
image = '/xxx/examples/4khd_example.webp'
with torch.cuda.amp.autocast():
response, _ = model.chat(tokenizer, query=query, image=image, history=[], hd_num=16, do_sample=True)
print("------>response: ", response)
results as followings:
by the way, is there any method you know, we can merge lora weights and xcomposer weights together in a smooth way ?@WeiminLee
Train过程
{ "id": "46173", "image": [ "/data/lwm-data/resized_without_padding_images/973036f0db9ace60a844d27897652658.png" ], "conversations": [ { "from": "user", "value": " In the photograph, could you pinpoint the location of \"07 4035 5459\" and tell me its bounding boxes?"
},
{
"from": "assistant",
"value": "The bounding box is [559, 44, 674, 80]"
}
]
},
infer过程
with torch.cuda.amp.autocast(): with torch.nograd(): response, = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
输出
{ "label": "The bounding box is [354, 269, 402, 287]", "response": "The bounding box is [375, 373, 434, 390]" } { "label": "The bounding box is [666, 607, 812, 628]", "response": "The bounding box is [675, 619, 716\n" } { "label": "The bounding box is [297, 171, 366, 190]", "response": "The bounding box is [300\n" } { "label": "The bounding box is [475, 12, 578, 28]", "response": "The bounding box is [525\n" } { "query": " In, can you guide me to the location of \"THE BIG WAVES JOURNAL\" by providing bounding boxes?",
"label": "The bounding box is [522, 0, 628, 82]",
"response": "The bounding box is [566\n"
} {
"query": " Help me to locate \"Vinyl Fencing\" in and give me its bounding boxes, please.",
"label": "The bounding box is [329, 803, 375, 819]",
"response": "The bounding box is [350, 822,\n"
}
大多数情况无法完成的输出句子,生成一个\n后就终止,不知道这里面哪里出了问题。。。