QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Apache License 2.0

qwen2-vl inference: cannot identify image file #228

Open illyafan opened 1 month ago

illyafan commented 1 month ago

```python
from qwen_vl_utils import process_vision_info

# Preparation for inference
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_url},
            {"type": "text", "text": prompt},
        ],
    }
]
text = self.processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
try:
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = self.processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )
    inputs = inputs.to("cuda")

    # Inference: generation of the output
    generated_ids = self.model.generate(**inputs, max_new_tokens=64)
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = self.processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
except Exception as e:
    # process_vision_info raises here when PIL cannot identify the
    # downloaded image ("cannot identify image file")
    print(f"Inference failed: {e}")
```
*(screenshot of the traceback showing the "cannot identify image file" error)*

It seems that `process_vision_info` fetches the requested image and opens it with PIL, but some images cannot be identified (PIL raises "cannot identify image file"). How can I solve this issue?
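One way to narrow this down is to validate the image before it ever reaches `process_vision_info`: download the bytes, let PIL try to decode them, re-encode to PNG, and pass a local `file://` path (a documented input form for the `"image"` field) instead of the remote URL. The helper below is only a hypothetical sketch of that idea; `fetch_and_validate_image` and its temp-file handling are illustrative and not part of `qwen-vl-utils`.

```python
import io
import tempfile

import requests
from PIL import Image, UnidentifiedImageError


def fetch_and_validate_image(image_url: str) -> str:
    """Download image_url, verify PIL can decode it, re-encode to PNG,
    and return a local file:// URI for the "image" field.
    (Hypothetical helper, not part of qwen-vl-utils.)"""
    resp = requests.get(image_url, timeout=10)
    resp.raise_for_status()
    try:
        img = Image.open(io.BytesIO(resp.content))
        img = img.convert("RGB")  # forces a full decode and normalizes mode
    except (UnidentifiedImageError, OSError) as e:
        # The server returned something PIL cannot parse, e.g. an HTML
        # error page, a truncated download, or an unsupported format.
        raise ValueError(
            f"URL did not return a decodable image "
            f"(Content-Type: {resp.headers.get('Content-Type')})"
        ) from e
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    img.save(tmp.name, format="PNG")
    return f"file://{tmp.name}"
```

The returned `file://` path can then be used as the `"image"` value in `messages`, so `process_vision_info` only sees images that PIL has already decoded successfully, and the Content-Type reported in the error message makes it obvious when a URL is serving HTML or some other non-image payload.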

kq-chen commented 1 month ago

Could you please share the image or URL that can reproduce the issue?