allenai / sherlock

Code, data, models for the Sherlock corpus
Apache License 2.0

Difficulty displaying bounding boxes #1

Closed adrielkuek closed 2 years ago

adrielkuek commented 2 years ago

Hi, I'm having a little difficulty in the demo notebook trying to obtain and display the corresponding bounding box annotations. When I try to display image_with_highlight, where:

image_with_highlight = ImageHighlightBboxDataset.highlight_region(
    image,
    [{'left': targets[0, 0], 'top': targets[0, 1], 'width': targets[0, 2], 'height': targets[0, 3]}])

the image does not contain the bbox highlights. Is there currently a way to obtain the top bboxes for the image?

Thanks!

jmhessel commented 2 years ago

Hi @adrielkuek ! Thanks for your interest in our work!

That part of the demo is meant to take input from the user via jupyter_innotater. When you run the demo, are you able to see the bounding box input interface spawned by:

targets = np.zeros((1, 4)) # Initialise bounding boxes as x,y = 0,0, width,height = 0,0
Innotater( ImageInnotation([image_path]), BoundingBoxInnotation(targets) )

?

Also, can you print what your targets array contains? It defaults to all zeros and is meant to be overwritten by jupyter_innotater.
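
After you draw a box in the widget, that array should no longer be all zeros; roughly like this (the non-zero values here are just hypothetical):

print(targets)
# before annotating (widget never recorded a box): [[0. 0. 0. 0.]]
# after drawing a box, something like:             [[120. 80. 240. 180.]]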

https://github.com/ideonate/jupyter-innotater

adrielkuek commented 2 years ago

Thanks @jmhessel for the response! I seem to be having difficulty getting the jupyter_innotater widget to work. I've done the pip install correctly and run the relevant code. The output of targets after running the line Innotater( ImageInnotation([image_path]), BoundingBoxInnotation(targets) ) is:

targets = [[0. 0. 0. 0.]]

The innotater does not seem to be working properly. Am I missing something that is required to run it? I've tried running the Jupyter notebook both in VS Code and in the Jupyter Notebook web interface, and both behave the same way: the widget just doesn't seem to display.

Alternatively, is there a way to extract the images with bboxes from somewhere? I can use matplotlib.pyplot or Pillow to display them instead; that shouldn't be an issue.

Thanks!

jmhessel commented 2 years ago

Interesting!

A quick answer is that you can just modify targets with the x, y, width, height values you want, but that's cumbersome (which is why I added the interface). Could you possibly send me a screenshot of the annotator not working, e.g., with an error message? It's hard to fix because it's working as-is on my end (though, if I recall correctly, I did have to do something extra when installing JupyterLab to enable extensions)...
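
For example, a minimal sketch of filling in the box by hand, reusing the highlight call from the demo (the pixel values here are made up):

# set a box manually instead of drawing it in the widget
# (left, top, width, height in pixels; these numbers are hypothetical)
targets = np.zeros((1, 4))
targets[0] = [120, 80, 240, 180]

image_with_highlight = ImageHighlightBboxDataset.highlight_region(
    image,
    [{'left': targets[0, 0], 'top': targets[0, 1],
      'width': targets[0, 2], 'height': targets[0, 3]}])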

If you're operating at a very small scale, we are hosting versions of the files with the bounding boxes pre-rendered, keyed by instance id, but, as the README specifies, if you're planning to use the whole image corpus, it's much better to download from the original sources.

https://storage.googleapis.com/ai2-jack-public/sherlock_mturk/images_with_bboxes/55cd11d9d977d83744ecbdbaabb770ee.jpg
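
If it helps, a minimal sketch (assuming requests and Pillow are available) for fetching and displaying one of these pre-rendered files:

import io
import requests
from PIL import Image

# fetch one pre-rendered example by instance id and open it with Pillow
url = ('https://storage.googleapis.com/ai2-jack-public/sherlock_mturk/'
       'images_with_bboxes/55cd11d9d977d83744ecbdbaabb770ee.jpg')
resp = requests.get(url)
resp.raise_for_status()
Image.open(io.BytesIO(resp.content)).show()  # or display inline in a notebook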

Jack

jmhessel commented 2 years ago

When you run jupyterlab, is there an error message related to nodejs that appears in the console?

adrielkuek commented 2 years ago

When you run jupyterlab, is there an error message related to nodejs that appears in the console?

No, there is no error message relating to nodejs in the console when running the Jupyter notebook. The widget simply doesn't appear. I suspect some other extension is required for it to work. I've tested in a separate workspace using purely jupyter_innotater with the sample code given in the repo:

from jupyter_innotater import *
import numpy as np, os

images = os.listdir('./foods/')
targets = np.zeros((len(images), 4)) # Initialise bounding boxes as x,y = 0,0, width,height = 0,0

Innotater( ImageInnotation(images, path='./foods'), BoundingBoxInnotation(targets) )

This doesn't fire up the widget either. At this juncture, I'm not sure what other dependencies I may be missing.

adrielkuek commented 2 years ago

A quick answer is that you can just modify targets with the x, y, width, height values you want, but that's cumbersome (which is why I added the interface).

If you're operating at a very small scale, we are hosting versions of the files with the bounding boxes pre-rendered, keyed by instance id, but, as the README specifies, if you're planning to use the whole image corpus, it's much better to download from the original sources.

Ah, I understand. So with my current code run, since the innotater is not working, the targets I'm sending into the model effectively cover the whole image (no bounding box). But interestingly, it is still able to perform abductive reasoning on the image with some reference to some of the objects present within it. I suspect this is probably due to CLIP's inherent visual-semantic understanding of the image.

One other approach I'm considering, as mentioned in the README, is to use an object detector (Faster R-CNN or YOLO) to automatically extract the bboxes before converting them into the targets format. Would this work as well?

Adriel

adrielkuek commented 2 years ago

Just a separate comment, @jmhessel. The current CLIP pretrained model is based on the Sherlock corpus, which I believe is drawn from the VCR and VG datasets. I see that in the demo notebook we are using the val_split source for the clue-inference pairs to reason over. Just curious: if I have an out-of-domain dataset that I would like to test on, would it be possible to simply generate a set of clue-inference pairs that are semantically closer to my target dataset, while still retaining the original CLIP pretrained model? This assumes my OOD dataset does not differ greatly from the VG+VCR domains it was originally trained on. Or would I be required to re-train CLIP from scratch using similarly annotated examples on my target OOD dataset, following the training code README?

jmhessel commented 2 years ago

Ah, I understand. So with my current code run, since the innotater is not working, the targets I'm sending into the model effectively cover the whole image (no bounding box). But interestingly, it is still able to perform abductive reasoning on the image with some reference to some of the objects present within it. I suspect this is probably due to CLIP's inherent visual-semantic understanding of the image.

That's correct! Just following up: is there an error message about nodejs in your console? You had mentioned nothing comes up in the notebook, but I vaguely recall seeing a warning in the console that I needed to resolve before the annotator showed up.

The current CLIP pretrained model is based on the Sherlock corpus, which I believe is drawn from the VCR and VG datasets. I see that in the demo notebook we are using the val_split source for the clue-inference pairs to reason over. Just curious: if I have an out-of-domain dataset that I would like to test on, would it be possible to simply generate a set of clue-inference pairs that are semantically closer to my target dataset, while still retaining the original CLIP pretrained model?

This is a good question! Perhaps we should break this off into a separate issue, but: I take it your new dataset is potentially over a different set of images? You could try it zero-shot (CLIP is surprisingly resilient in my experience); you could fine-tune the pre-trained Sherlock-CLIP models on your new corpus if it's big enough to support it; or you could start from the original CLIP weights and fit both your data and the Sherlock data from scratch. Check out https://github.com/allenai/sherlock/tree/main/training_code for the code we used to fine-tune from CLIP --> Sherlock-CLIP!
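
As a rough illustration of the zero-shot option, here is a minimal sketch using the stock openai CLIP package rather than the Sherlock fine-tuned checkpoints; the image path and candidate inferences are hypothetical:

import torch
import clip
from PIL import Image

# generic zero-shot CLIP similarity check; this is not the Sherlock
# evaluation pipeline, just a quick sanity check on out-of-domain images
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('ViT-B/32', device=device)

image = preprocess(Image.open('my_ood_image.jpg')).unsqueeze(0).to(device)  # hypothetical path
candidates = ['this person is waiting for a bus', 'this person is cooking dinner']  # hypothetical inferences
text = clip.tokenize(candidates).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(list(zip(candidates, probs[0].tolist())))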

jmhessel commented 2 years ago

One other approach I'm considering, as mentioned in the README, is to use an object detector (Faster R-CNN or YOLO) to automatically extract the bboxes before converting them into the targets format. Would this work as well?

You can do this, yep! We released bounding boxes for all of the images from a Faster R-CNN with a ResNeXt-101 backbone: see the note in the README.
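
For reference, a rough sketch of converting detector output into the targets format; this uses torchvision's off-the-shelf Faster R-CNN with a ResNet-50 FPN backbone as a stand-in (not the released ResNeXt-101 boxes), and the score threshold is arbitrary:

import numpy as np
import torch
import torchvision
from PIL import Image

# off-the-shelf detector as a stand-in for the released ResNeXt-101 boxes
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

image = Image.open(image_path).convert('RGB')  # image_path as in the demo notebook
tensor = torchvision.transforms.functional.to_tensor(image)

with torch.no_grad():
    preds = model([tensor])[0]

keep = preds['scores'] > 0.7           # arbitrary confidence threshold
boxes = preds['boxes'][keep].numpy()   # (x1, y1, x2, y2)

# convert to the (left, top, width, height) layout the demo's highlight call expects
targets = np.zeros((len(boxes), 4))
targets[:, 0] = boxes[:, 0]                # left
targets[:, 1] = boxes[:, 1]                # top
targets[:, 2] = boxes[:, 2] - boxes[:, 0]  # width
targets[:, 3] = boxes[:, 3] - boxes[:, 1]  # height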

jmhessel commented 2 years ago

I am going to close this for now, but feel free to re-open. I think this is an install issue related to the Jupyter notebook setup. I believe there's an error message displayed in the terminal that provides a hint (I remember having to install nodejs, and I believe the error relates to that).

adrielkuek commented 2 years ago

Thanks @jmhessel for the follow-up and assistance. I must apologise for my delayed response due to heavy work commitments. Regarding the nodejs issue, unfortunately I couldn't get that error message as a hint for debugging jupyter_innotater. I've switched to using a generic object detector to extract bboxes to feed into the model, and this works for me! Thank you so much for your help!