InternLM / InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

After LoRA fine-tuning, inference always stops early after outputting \n #328

Open WeiminLee opened 3 weeks ago

WeiminLee commented 3 weeks ago

Training data

{ "id": "46173", "image": [ "/data/lwm-data/resized_without_padding_images/973036f0db9ace60a844d27897652658.png" ], "conversations": [ { "from": "user", "value": " In the photograph, could you pinpoint the location of \"07 4035 5459\" and tell me its bounding boxes?" }, { "from": "assistant", "value": "The bounding box is [559, 44, 674, 80]" } ] },

Inference

with torch.cuda.amp.autocast():
    with torch.no_grad():
        response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)

Output

{ "label": "The bounding box is [354, 269, 402, 287]", "response": "The bounding box is [375, 373, 434, 390]" } { "label": "The bounding box is [666, 607, 812, 628]", "response": "The bounding box is [675, 619, 716\n" } { "label": "The bounding box is [297, 171, 366, 190]", "response": "The bounding box is [300\n" } { "label": "The bounding box is [475, 12, 578, 28]", "response": "The bounding box is [525\n" } { "query": " In, can you guide me to the location of \"THE BIG WAVES JOURNAL\" by providing bounding boxes?", "label": "The bounding box is [522, 0, 628, 82]", "response": "The bounding box is [566\n" } { "query": " Help me to locate \"Vinyl Fencing\" in and give me its bounding boxes, please.", "label": "The bounding box is [329, 803, 375, 819]", "response": "The bounding box is [350, 822,\n" }

In most cases the output sentence is never completed; generation terminates right after producing a \n, and I can't tell where the problem is...

nzomi commented 3 weeks ago

I encountered a similar issue with incomplete output. Notably, shorter queries tend to yield better responses, while queries in the same format as the training data often result in poor responses. I'm unsure if this is due to using long prompts to fine-tune the model. Initially, I thought my prompt was too long, but based on your response, it seems that might not be the issue.

thonglv21 commented 3 weeks ago

I think you need to add <ImageHere> into your annotations
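As an aside, a quick way to check a whole training file for the placeholder is sketched below. This is a hedged, minimal sketch: the file path is a placeholder, and it assumes the JSON-list record format shown in this thread (first conversation turn is the user query).

import json

# Placeholder path to the fine-tuning data file; adjust to the real one.
with open('/path/to/train.json') as f:
    records = json.load(f)

# Report records whose first user turn lacks the <ImageHere> placeholder.
missing = [r['id'] for r in records
           if '<ImageHere>' not in r['conversations'][0]['value']]
print(f'{len(missing)} of {len(records)} records are missing <ImageHere>')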

WeiminLee commented 3 weeks ago

I think you need to add <ImageHere> into your annotations

but I already added it in the query; here is an example:

{ "id": "5596", "image": [ "/data/lwm-data/resized_without_padding_images/c0241758ce85f63082b3a8aa8c877a4b.png" ], "conversations": [ { "from": "user", "value": " Would you kindly provide the bounding boxes of \"Google map\" located in the picture?" }, { "from": "assistant", "value": "The bounding box is [620, 800, 771, 984]" } ] }

Actually, the reason for these unusual outputs is that I used the wrong code to load the checkpoint after fine-tuning. Take a look at this:

model = AutoPeftModelForCausalLM.from_pretrained(
    checkpoint_path,
    device_map='auto',
    trust_remote_code=True,
    resume_download=True,
)

The right code should look like this:

model_path = '/data/model_hub/internlm-xcomposer2-4khd-7b'
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map='auto',
    trust_remote_code=True,
    resume_download=True,
)

adapter_name_or_path = '/data/lwm-data/interml-finetune'
model = PeftModel.from_pretrained(model, adapter_name_or_path, device_map='auto')
model.eval()
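As a quick sanity check after attaching the adapter, the snippet below continues from the code above and runs one chat call. It is a minimal sketch: the query text is modeled on this thread and the image path is a placeholder.

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Placeholder query and image; replace with a real sample from the eval set.
query = '<ImageHere> In the photograph, could you pinpoint the location of "07 4035 5459" and tell me its bounding boxes?'
image = '/path/to/sample.png'

with torch.cuda.amp.autocast():
    with torch.no_grad():
        response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)  # expected to be a full "The bounding box is [...]" sentence if the adapter loaded correctly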
nzomi commented 3 weeks ago

Thank you for your method, it indeed solved the problem. Would you mind explaining how you discovered this solution?

WeiminLee commented 3 weeks ago

I discussed this issue with my coworker and attempted a different model-loading method, which appears to have resolved the problem. However, the underlying cause remains unclear to me.

Still delving deeper into it.

ztfmars commented 3 weeks ago

Thanks, I fixed the issue by following your steps; hope this helps others. The XComposer2-4KHD LoRA inference code can be written as follows:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_path = '/xxxx/path2model/internlm-xcomposer2-4khd-7b'
adapter_name_or_path = "/xxxx/path2_trained_lora_weight/20240605_mix_4.3k_xcom_4khd_loara_ft"

# Load the base model and tokenizer first.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                            #  device_map='auto',
                                            device_map='cuda:0',
                                            trust_remote_code=True,
                                            resume_download=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Then attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model,
                                  adapter_name_or_path,
                                  device_map='cuda:0')
model.half().cuda()

query = '<ImageHere>这张图都主要讲了什么?'  # "What is this image mainly about?"
image = '/xxx/examples/4khd_example.webp'

with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query, image=image, history=[],  hd_num=16, do_sample=True)

print("------>response: ", response)

The results are as follows: [screenshot omitted]

By the way, do you know of any way to smoothly merge the LoRA weights and the XComposer weights together? @WeiminLee
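For reference, one possible approach is PEFT's merge_and_unload(), which folds the LoRA deltas into the base weights. This is a minimal sketch, not a confirmed recipe for this model: it continues from the PeftModel built above, assumes the installed peft version supports merge_and_unload() for this custom architecture, and the output path is a placeholder.

# Merge the LoRA deltas into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()                 # model is the PeftModel from the snippet above
merged.save_pretrained('/path/to/merged-xcomposer2-4khd-7b')      # placeholder output path
tokenizer.save_pretrained('/path/to/merged-xcomposer2-4khd-7b')

# The merged folder can then be loaded directly, without PeftModel:
# model = AutoModelForCausalLM.from_pretrained('/path/to/merged-xcomposer2-4khd-7b',
#                                              trust_remote_code=True, device_map='cuda:0')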