Bug: Single threshold results in single label

The first example in your README (https://github.com/NVIDIA-AI-IOT/nanoowl/tree/main#-usage) implies that calling predict with a single threshold applies that threshold to every class. In practice, passing a single threshold seems to make the model ignore all but the first class.

The example from the README, with a lower threshold for demonstration purposes:

import PIL.Image  # needed for PIL.Image.open below

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.01, text_encodings=None)

print(output)

results in OwlDecodeOutput(labels=tensor([0, 0, 0, 0, 0, 0, 0, 0]...

whereas

import PIL.Image  # needed for PIL.Image.open below

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=[0.01, 0.01], text_encodings=None)

print(output)

results in OwlDecodeOutput(labels=tensor([0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0]...
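
Until this is fixed, a possible workaround is to expand the scalar threshold into one value per text prompt before calling predict. The sketch below is only an illustration: expand_threshold is a hypothetical helper name, not part of nanoowl, and it assumes predict accepts a list with one threshold per class, as in the second example above.

```python
from typing import List, Sequence, Union


def expand_threshold(
    threshold: Union[int, float, Sequence[float]],
    num_classes: int,
) -> List[float]:
    """Return one threshold per class (hypothetical helper, not part of nanoowl)."""
    if isinstance(threshold, (int, float)):
        # Scalar input: repeat the same value for every text prompt.
        return [float(threshold)] * num_classes
    thresholds = [float(t) for t in threshold]
    if len(thresholds) != num_classes:
        # A per-class list must match the number of prompts exactly.
        raise ValueError(
            f"expected {num_classes} thresholds, got {len(thresholds)}"
        )
    return thresholds


text = ["an owl", "a glove"]
thresholds = expand_threshold(0.01, len(text))
print(thresholds)  # [0.01, 0.01]
```

The resulting list can then be passed as threshold=thresholds in the predict call, which matches the call that returns both labels above.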
