NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Can VILA do grounding jobs? #128

Open PredyDaddy opened 2 weeks ago

PredyDaddy commented 2 weeks ago

Hello, I asked VILA for the bounding boxes of the objects in a photo, and VILA did reply with a bbox. I then used the following code to check whether it is correct.

from PIL import Image, ImageDraw

# open image
image_path = 'cup.jpg'
image = Image.open(image_path)
draw = ImageDraw.Draw(image)

# bbox returned by VILA, treated here as [x1, y1, x2, y2] normalized to [0, 1]
normalized_bbox = [0.34, 0.7, 0.46, 0.78]

# denormalize to pixel coordinates
image_width, image_height = image.size
bbox = [
    normalized_bbox[0] * image_width,
    normalized_bbox[1] * image_height,
    normalized_bbox[2] * image_width,
    normalized_bbox[3] * image_height
]

# draw bounding box
draw.rectangle(bbox, outline="red", width=3)

# save figure
output_path = './cup_with_bbox.jpg'
image.save(output_path)

The drawn box has the correct size but the wrong center point. How should I map normalized_bbox back onto the photo?

Lyken17 commented 2 weeks ago

@Seerkfang may know more of the details.
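
One thing that may be worth checking while waiting for an answer (this is an assumption, not confirmed VILA behavior): some LLaVA-style pipelines pad the image to a square before encoding it, in which case the returned coordinates would be normalized to the padded square rather than to the original width and height. The sketch below compares the two conventions. The function names are hypothetical, and the centered-padding layout is itself an assumption.

```python
def denormalize_direct(bbox, width, height):
    """Scale a normalized [x1, y1, x2, y2] box by the original image size.

    This is the convention used in the snippet from the question.
    """
    x1, y1, x2, y2 = bbox
    return [x1 * width, y1 * height, x2 * width, y2 * height]


def denormalize_padded_square(bbox, width, height):
    """Map a normalized [x1, y1, x2, y2] box back to the original image,
    ASSUMING it was normalized to a square of side max(width, height)
    with the original image centered inside it (hypothetical convention).
    """
    side = max(width, height)
    pad_x = (side - width) / 2   # padding added on the left
    pad_y = (side - height) / 2  # padding added on the top
    x1, y1, x2, y2 = bbox
    return [
        x1 * side - pad_x,
        y1 * side - pad_y,
        x2 * side - pad_x,
        y2 * side - pad_y,
    ]
```

To try it, pass the result of either function to draw.rectangle(...) with image.size as the width and height, and see which box lands on the cup. For a square image the two functions agree, so a difference only shows up on non-square photos. If neither convention fits, the coordinate order (for example [x, y, w, h] instead of corner points) would be the next thing to check.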