NVIDIA-AI-IOT / nanoowl

A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.
Apache License 2.0

Bug: Single threshold results in single label #19

Open aaronrmm opened 8 months ago

aaronrmm commented 8 months ago

The first example in your README (https://github.com/NVIDIA-AI-IOT/nanoowl/tree/main#-usage) implies that calling predict with a single threshold applies that threshold to every class. However, passing a single threshold appears to make the predictor ignore every class except the first.
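To make the expected behavior concrete, here is a minimal pure-Python sketch of per-class thresholding. It assumes each detection has one predicted label index (an index into the `text` prompts) and one score, as the `labels` tensor in `OwlDecodeOutput` suggests. The helpers `expand_threshold` and `filter_detections` are hypothetical, not part of nanoowl. The `buggy` flag illustrates one way a scalar could end up applying only to the first class: wrapping it as a one-element list instead of repeating it per class. That is an assumed cause, not a reading of nanoowl's source.

```python
from typing import List, Sequence, Union

Threshold = Union[float, Sequence[float]]


def expand_threshold(threshold: Threshold, num_classes: int, buggy: bool = False) -> List[float]:
    """Turn a scalar or per-class threshold into one value per class (hypothetical helper)."""
    if isinstance(threshold, (int, float)):
        if buggy:
            # Hypothetical failure mode: the scalar becomes a single-entry list,
            # so only class 0 ever receives a threshold.
            return [float(threshold)]
        # Expected behavior: reuse the same threshold for every class.
        return [float(threshold)] * num_classes
    thresholds = [float(t) for t in threshold]
    if len(thresholds) != num_classes:
        raise ValueError(f"expected {num_classes} thresholds, got {len(thresholds)}")
    return thresholds


def filter_detections(labels: Sequence[int], scores: Sequence[float],
                      thresholds: Sequence[float]) -> List[int]:
    """Return indices of detections whose score exceeds the threshold for their label."""
    keep = []
    for i, (label, score) in enumerate(zip(labels, scores)):
        # Assumption: a label with no threshold entry is dropped, as a mask loop
        # built only from the thresholds it was given would do.
        if label < len(thresholds) and score > thresholds[label]:
            keep.append(i)
    return keep


labels = [0, 1, 0, 1]
scores = [0.30, 0.25, 0.005, 0.02]
print(filter_detections(labels, scores, expand_threshold(0.01, 2)))              # [0, 1, 3]
print(filter_detections(labels, scores, expand_threshold(0.01, 2, buggy=True)))  # [0]
```

With the scalar broadcast, detections of both classes survive; with the single-entry list, only class 0 does, which matches the symptom reported here.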

Here is the README example, with the threshold lowered for demonstration purposes:

```python
import PIL.Image
from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

# A single scalar threshold, intended to apply to both prompts
output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.01, text_encodings=None)

print(output)
```

results in:

OwlDecodeOutput(labels=tensor([0, 0, 0, 0, 0, 0, 0, 0]...

so every detection is assigned label 0 ("an owl").

whereas passing the same threshold once per class:

```python
import PIL.Image
from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

# One threshold per prompt
output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=[0.01, 0.01], text_encodings=None)

print(output)
```

results in:

OwlDecodeOutput(labels=tensor([0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0]...

which includes detections for both prompts (labels 0 and 1).
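Until a scalar threshold is broadcast across classes, a workaround consistent with the second example is to always pass one threshold per prompt. A minimal sketch, assuming only the `predict(image=..., text=..., threshold=..., text_encodings=...)` call used above; the wrapper name `predict_uniform` is hypothetical and not part of nanoowl.

```python
from typing import Any, List


def predict_uniform(predictor: Any, image: Any, text: List[str], threshold: float,
                    text_encodings: Any = None) -> Any:
    """Call predictor.predict with the same threshold repeated once per text prompt.

    Hypothetical workaround: passing an explicit list avoids relying on how a
    scalar threshold is handled internally.
    """
    thresholds = [float(threshold)] * len(text)
    return predictor.predict(
        image=image,
        text=text,
        threshold=thresholds,
        text_encodings=text_encodings,
    )


# Usage with the predictor and image from the examples above:
# output = predict_uniform(predictor, image, ["an owl", "a glove"], threshold=0.01)
```

Because the wrapper only builds the list and forwards keyword arguments, it works with any object exposing a compatible `predict` method.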