NVIDIA-AI-IOT / nanoowl

A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.
Apache License 2.0

Fewshot / Image-Conditioned Detection #28

Open aaronrmm opened 7 months ago

aaronrmm commented 7 months ago

This adds a version of the image-conditioned detection feature from the original OWL-ViT repo, in which you supply example images of the objects you want to detect.

The difference from the original OWL-ViT feature is that you also include one or more text prompts with each query image, which the model uses to find the correct embedding for that query image. The original OWL-ViT had utility functions with heuristics to find the best embedding automatically. I tried incorporating those but found this method much more reliable.
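To illustrate the idea, here is a minimal sketch of text-guided query selection. It is not the code in this PR. It assumes the query image has already been encoded into per-patch embeddings and the prompts into text embeddings; the function names `select_query_embedding` and `score_target`, the use of cosine similarity, and the array shapes are all hypothetical choices made for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def select_query_embedding(patch_embeds, text_embeds):
    """Pick the query-image patch embedding that best matches any text prompt.

    patch_embeds: (N, D) array, one embedding per query-image patch (assumed).
    text_embeds:  (T, D) array, one embedding per text prompt (assumed).
    """
    sims = l2_normalize(patch_embeds) @ l2_normalize(text_embeds).T  # (N, T)
    best_patch = int(np.argmax(sims.max(axis=1)))  # best score over prompts
    return best_patch, patch_embeds[best_patch]

def score_target(target_patch_embeds, query_embed):
    """Cosine similarity of every target-image patch to the selected query."""
    return l2_normalize(target_patch_embeds) @ l2_normalize(query_embed)

# Toy data: three query patches, one prompt that is closest to patch 1.
query_patches = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
prompts = np.array([[0.0, 0.9, 0.1]])
idx, query = select_query_embedding(query_patches, prompts)

target_patches = np.array([[0.0, 2.0, 0.0], [1.0, 0.0, 0.0]])
scores = score_target(target_patches, query)
```

In this toy case the prompt picks out patch 1 of the query image, and the first target patch scores highest because it points in the same direction as the selected embedding. A real pipeline would threshold these scores and decode a box for each matching patch.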