gereleth / jupyter-bbox-widget

A Jupyter widget for annotating images with bounding boxes
BSD 3-Clause "New" or "Revised" License

Adding point labels. #33

Closed: ovalerio closed this issue 2 days ago

ovalerio commented 3 days ago

Hello @gereleth,

Thanks for putting together this tool. I started using it to label a few selected frames before running SAM2 video segmentation. Bounding boxes are already a good start, but I noticed that I get an even better segmentation from SAM2 when I refine the bounding box label with a few positive points (as in the image below).

image

I ran a few tests using an image of a worm head.

I had previously annotated the frames using the bbox widget and then just hard-coded the points. Of course, this only works when the worm head is parallel to the horizontal axis. I would prefer to add the point labels using a modified version of your bbox widget.

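import os

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# `annotations`, `predictor`, `inference_state`, `video_dir`, `show_box` and
# `show_points` are assumed to be defined earlier in the notebook
for k, v in annotations.items():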
    ann_frame_index = int(k.split(".")[0])

    ann_obj_id = 0
    # NOTE: sam2 uses the following format for the bounding boxes: [x0, y0, x1, y1]
    box = [[ann["x"], ann["y"], ann["x"] + ann["width"], ann["y"] + ann["height"]] for ann in v]
    # assuming only one object (worm) in the frame
    box = box[0]

    # ***  ADDING A POINT TO REFINE THE BOX LABEL ***
    # for this example the point is on the right side of the bounding box
    # I pass the point together with the bounding box to the inference state
    # NOTE: for point masks a '1' is a positive label and '0' is background
    point = [box[2] + 10, (box[1] + box[3]) // 2] 
    points = np.array([point], dtype=np.float32)
    labels = np.array([1], dtype=np.int64)

    # predicting the object mask using the bounding box AND point labels
    _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
        inference_state=inference_state,
        frame_idx=ann_frame_index,
        obj_id=ann_obj_id,
        box=box,
        points=points,
        labels=labels,
    )

    # Display image, with labels and SAM2 mask
    img = Image.open(os.path.join(video_dir, k))
    fig, ax = plt.subplots(1, 1, figsize=(6, 6))
    plt.imshow(img, cmap="viridis")
    show_box(box, ax)
    show_points(points, labels, ax)

Can you guide me on how to extend the bbox widget for this purpose?

Thanks!

gereleth commented 3 days ago

Hi! It's really interesting to see what people are using this widget for =).

Could you maybe just use tiny boxes as points? If you click and don't drag then that basically creates a zero-sized bbox - very much like a point. So if you use labels=['box', 'point'] you could create both types of annotations. And then you distinguish them by label like this:

box = [
    [ann["x"], ann["y"], ann["x"] + ann["width"], ann["y"] + ann["height"]] 
    for ann in v if ann["label"] == "box"
]
points = [[ann["x"], ann["y"]] for ann in v if ann["label"] == "point"]

image

The actual point you click is 3 pixels below the lower right corner of the x symbol if you need that kind of accuracy =).
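A minimal sketch of how the two label types could be split into the box and point prompts used in the first comment. It assumes the annotation dicts have the x, y, width, height and label keys shown above; the function name `split_annotations` and its parameters are hypothetical, not part of the widget. It takes the centre of each "point" box rather than its corner, so a slight accidental drag still gives a sensible location.

```python
def split_annotations(anns, box_label="box", point_label="point"):
    """Split bbox-widget annotations into one box prompt and positive points.

    Hypothetical helper: returns (box, points, labels) as plain Python lists.
    """
    # [x0, y0, x1, y1] for every annotation labelled as a box
    boxes = [
        [a["x"], a["y"], a["x"] + a["width"], a["y"] + a["height"]]
        for a in anns
        if a["label"] == box_label
    ]
    # Centre of every (ideally zero-sized) box labelled as a point
    points = [
        [a["x"] + a["width"] / 2, a["y"] + a["height"] / 2]
        for a in anns
        if a["label"] == point_label
    ]
    # Assuming a single object per frame, keep only the first box (if any)
    box = boxes[0] if boxes else None
    # 1 marks a positive point; every clicked point is treated as positive here
    labels = [1] * len(points)
    return box, points, labels
```

Before passing the result to the predictor, the lists can be wrapped the same way as in the first comment, e.g. np.array(points, dtype=np.float32) for the points and an integer array for the labels.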

ovalerio commented 3 days ago

Thanks!! That's very clever. =)

image

I added a Prev button to the widget and I'm flying through the frames :racing_car:

Is there some kind of onLoad event that I can use to load the box annotations for the very first frame? So far I am using the skip function for that, but it only runs when I press the Skip or Submit buttons. To see the existing annotations for the first image, I have to press Skip followed by Prev.

@w_bbox.on_skip
def skip():
    w_progress.value += 1
    # open new image in the widget
    image_file = selected_frames[w_progress.value]
    w_bbox.image = os.path.join(video_dir, image_file)
    # read the existing bbox annotations for the current image in annotations
    if image_file in annotations:
        w_bbox.bboxes = annotations[image_file]
    else: # if no annotations are found we assign an empty list
        w_bbox.bboxes = []

gereleth commented 3 days ago

If you're initializing the widget with the first image then you can supply the first set of bboxes at the same time:

# assuming `video_dir` and `annotations` are defined before
image_file = '001.png'
widget = BBoxWidget(
    image = os.path.join(video_dir, image_file),
    bboxes = annotations.get(image_file, []),
    labels = ['worm', 'point'],
)

You could also simplify your code by observing the image trait of the widget. The function below will run every time widget.image changes and load the corresponding annotations (if there are any). Then you won't have to repeat the annotation-loading code in skip/previous, and you should see the correct annotations even if you manually load an image out of sequence.

def on_image_change(change):
    image_file = os.path.basename(widget.image)
    widget.bboxes = annotations.get(image_file, [])
widget.observe(on_image_change, names=['image'])

You could also lean even more into "observe"-based reactivity and observe the value of w_progress, using that value to load the current image. Then your skip function can just be w_progress.value += 1 and everything else will happen in reaction =). And the previous function is the same, just with -= 1. (ok, don't let me get carried away with this :joy: )
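A rough sketch of that progress-driven setup, under a few assumptions: w_progress is an ipywidgets-style widget with an integer value trait, w_bbox is the bbox widget, and selected_frames, video_dir and annotations are the variables from the earlier snippets. The helper name `wire_navigation` is made up for illustration. It relies only on the standard observe(handler, names=...) mechanism, where the handler receives a change dictionary containing the new value.

```python
import os


def wire_navigation(w_bbox, w_progress, frames, video_dir, annotations):
    """Make w_bbox follow w_progress.value; return (go_next, go_prev) callbacks.

    Hypothetical helper, not part of jupyter-bbox-widget.
    """

    def on_progress_change(change):
        # change["new"] holds the new frame index
        image_file = frames[change["new"]]
        w_bbox.image = os.path.join(video_dir, image_file)
        # load existing annotations for this frame, or start with none
        w_bbox.bboxes = annotations.get(image_file, [])

    w_progress.observe(on_progress_change, names="value")

    def step(delta):
        # clamp so the index never leaves the list of frames
        new_index = w_progress.value + delta
        w_progress.value = max(0, min(len(frames) - 1, new_index))

    def go_next():
        step(1)

    def go_prev():
        step(-1)

    return go_next, go_prev
```

With this in place, go_next could be registered through w_bbox.on_skip and go_prev attached to the Prev button, so neither handler needs to touch the image or the bboxes directly. The clamping is a design choice added here; without it, skipping past the last frame would raise an IndexError.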

ovalerio commented 2 days ago

Thanks!! I am starting to like this "observe" thing. Your last comment reminds me of the movie Inception.. leap of faith.. trains that take you far away.. reactivity .. :ninja: :test_tube: