google-research / scenic

Scenic: A Jax Library for Computer Vision Research and Beyond
Apache License 2.0

[OWLv2] One-shot results (using an image as a query) do NOT work as they do in the Google Colab notebook #1096

Open sheikh1000 opened 3 months ago

sheikh1000 commented 3 months ago

Hi, you've done some great work!

I ran the Colab notebook here, and the image-conditioned detection results are great, but I don't get the same results when I try it in my local environment.

I'm using the Hugging Face implementation.

Results with text prompts:

results_of_text_prompt

Image used for prompting:

image_prompt

Results of prompting with the above image:

results_of_image_prompt

The Colab notebook takes the image prompt through a UI in which the user draws a bounding box on the query image to select the object of interest. To match that, I provided a cropped image of the object of interest (a resistor, in this case) as the query.
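For reference, the cropping step could look roughly like the sketch below. It is only an illustration: the helper name `box_to_crop`, the normalized `(x0, y0, x1, y1)` box format, and the 640x480 image size are all assumptions, not something taken from the notebook or from the Hugging Face API.

```python
import math


def box_to_crop(box, width, height):
    """Convert a normalized (x0, y0, x1, y1) box to integer pixel bounds.

    Assumption: the box is given as fractions of the image size in [0, 1].
    This is only one possible convention for a drawn bounding box.
    """
    x0, y0, x1, y1 = box
    # Order the corners so the box is valid even if it was drawn right-to-left.
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))
    # Scale to pixels, rounding outward, then clamp to the image bounds.
    left = max(0, math.floor(x0 * width))
    upper = max(0, math.floor(y0 * height))
    right = min(width, math.ceil(x1 * width))
    lower = min(height, math.ceil(y1 * height))
    return left, upper, right, lower


# Hypothetical box drawn around the resistor on a 640x480 query image.
crop = box_to_crop((0.25, 0.5, 0.75, 1.0), width=640, height=480)
print(crop)
```

The returned tuple is in `(left, upper, right, lower)` order, which is the order Pillow's `Image.crop` expects, so it can be passed to that method to produce the cropped query image.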

Below is the Hugging Face implementation that I'm using (the query image is the cropped image of the resistor in this case):

inputs = processor(images=image, query_images=query_image, return_tensors="pt")
with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)