illyafan opened 1 month ago
```python
# Preparation for inference
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_url},
            {"type": "text", "text": prompt},
        ],
    }
]
text = self.processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
try:
    # Resolve the image/video entries in `messages` into model-ready inputs.
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = self.processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )
    inputs = inputs.to("cuda")

    # Inference: generation of the output
    generated_ids = self.model.generate(**inputs, max_new_tokens=64)
    # Drop the prompt tokens so only newly generated tokens are decoded.
    generated_ids_trimmed = [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = self.processor.batch_decode(
        generated_ids_trimmed,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
except Exception:  # the except clause was cut off in the original post
    raise
```
It seems the `process_vision_info` function handles both fetching the requested image and opening it with PIL, but some images are not identified. How can I solve this issue?
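For reference, one way to isolate the failure is to fetch and decode the image yourself before building the message, then pass the resulting `PIL.Image.Image` object in the `"image"` field instead of the URL (`process_vision_info` also accepts PIL images). This is a minimal sketch, not part of the original code; `load_image` is a hypothetical helper, and it assumes the errors come from PIL failing to decode the downloaded bytes:

```python
# Hypothetical helper, not from the original post: download and decode the
# image up front so a bad URL fails with a clear error instead of inside
# process_vision_info.
import io

import requests
from PIL import Image, UnidentifiedImageError


def load_image(image_url: str) -> Image.Image:
    # Raise on HTTP errors (403, 404, ...) rather than decoding an error page.
    resp = requests.get(image_url, timeout=10)
    resp.raise_for_status()
    try:
        img = Image.open(io.BytesIO(resp.content))
        img.load()  # force decoding now instead of lazily
    except UnidentifiedImageError as e:
        # The bytes are not a format PIL recognizes (e.g. an HTML error
        # page, an unsupported codec, or a truncated download).
        raise ValueError(f"not a decodable image: {image_url}") from e
    return img.convert("RGB")


# Usage: pass the decoded image object instead of the URL in the message,
# e.g. {"type": "image", "image": load_image(image_url)}
```

With this, an unreadable URL fails immediately with a specific exception, which makes it clear whether the problem is the download itself or the image format.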
Could you please share the image or URL that can reproduce the issue?