Hi, you've done some great work!
Although I've run the Colab notebook here and the image-conditioned detection results are great, I can't reproduce the same results in my local environment.
I'm using the Hugging Face implementation.
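For reference, my local setup looks roughly like this (the checkpoint name is a guess on my part; the notebook may use a different one):

import torch
from transformers import OwlViTProcessor, OwlViTForObjectDetection

# Checkpoint is an assumption; substitute whichever OWL-ViT checkpoint the notebook uses.
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch16")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch16")
model.eval()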
Results with text prompts:
Image used for prompting:
Results of prompting with the above image:
The Colab notebook takes the image prompt through a UI, where the user draws a bounding box on the query image to select the object of interest. Since I have no such UI locally, I provided a cropped image of the object of interest (a resistor, in this case) as the query instead.
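I produce the crop with a plain PIL crop, roughly like this (the filename and coordinates are placeholders, not the actual box I drew):

from PIL import Image

# Placeholder filename and coordinates for the box around the resistor.
source = Image.open("query.jpg")
query_image = source.crop((100, 200, 260, 320))  # (left, top, right, bottom)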
Below is the Hugging Face code I'm using (here query_image is the cropped image of the resistor):
# Query image is the cropped resistor; run one-shot image-guided detection.
inputs = processor(images=image, query_images=query_image, return_tensors="pt")
with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)
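And this is roughly how I turn the raw outputs into boxes. The threshold and nms_threshold values are my own guesses, so if the notebook applies different ones, that might account for part of the discrepancy:

# Thresholds are assumptions, not necessarily what the Colab UI applies.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width) for a PIL image
results = processor.post_process_image_guided_detection(
    outputs=outputs, threshold=0.6, nms_threshold=0.3, target_sizes=target_sizes
)
for score, box in zip(results[0]["scores"], results[0]["boxes"]):
    print(f"score {score:.3f}, box {box.tolist()}")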