boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD,
)
My assumption is that boxes contains [nq, 4] entries in (center_x, center_y, w, h) format, normalized to values between 0 and 1.
import fiftyone as fo

detections = []
# predict() returns logits (there is no separate scores variable)
for box, score, label in zip(boxes, logits, phrases):
    box = [round(i, 2) for i in box.tolist()]
    detections.append(
        fo.Detection(
            label=label,
            bounding_box=box,
            confidence=float(score),
        )
    )

# Save predictions to dataset
sample["gdino_pred"] = fo.Detections(detections=detections)
sample.save()
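For what it's worth, here is what I suspect (please correct me if I'm wrong): FiftyOne's Detection.bounding_box expects [top-left-x, top-left-y, width, height] in relative coordinates, while Grounding DINO's predict() returns center-format boxes, so each box would appear shifted by half its width and height. A minimal conversion sketch under that assumption (the helper name is mine):

```python
def cxcywh_to_fiftyone(box):
    """Convert a normalized (cx, cy, w, h) box (Grounding DINO output,
    assumed format) to FiftyOne's (top-left-x, top-left-y, w, h),
    also normalized to [0, 1]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, w, h]

# Example: a box centered at (0.5, 0.5) covering half the image in each dimension
print(cxcywh_to_fiftyone([0.5, 0.5, 0.5, 0.5]))  # [0.25, 0.25, 0.5, 0.5]
```

If this is right, applying the conversion to each box before building fo.Detection should remove the shift — but I would appreciate confirmation.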
When I view the output in the Voxel51 tool, the box positions are shifted. I have attached a sample image as well. What is the reason, and how can I fix this problem? The OWL-ViT model output visualizes correctly; I only have this problem with the Grounding DINO model output. I used BDD100K images as input (1280x720 RGB).
Hi Team,
Thanks for your help.
Waiting for your response.