OpenAdaptAI / OpenAdapt

AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License

Implement OWL-ViT: Open-World Object Detection with Vision Transformers #692

Open abrichr opened 1 month ago

abrichr commented 1 month ago

Feature request

We wish to implement https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit

Motivation

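OWL-ViT is Google's counterpart to SAM (Segment Anything Model): an open-vocabulary detector that localizes objects from text queries, returning bounding boxes rather than segmentation masks.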

abrichr commented 1 month ago

Colab notebook: https://colab.research.google.com/drive/1NVcfvtESDfHC-nXi7L-gTAM-MCpixrvs#scrollTo=99ilV_T2RyNT (copy of https://colab.research.google.com/github/google-research/scenic/blob/main/scenic/projects/owl_vit/notebooks/OWL_ViT_minimal_example.ipynb#scrollTo=99ilV_T2RyNT)
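
For reference when wiring this in, here is a rough sketch of the post-processing step we would probably need on our side. It assumes the model returns one score per box and boxes as normalized (cx, cy, w, h) values relative to the image that was passed in; that format, the `to_pixel_boxes` helper name, and the default threshold are all assumptions for illustration, not OpenAdapt or scenic API. If the image is padded to a square before inference, the padded size would have to be used instead.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def to_pixel_boxes(
    boxes: Sequence[Box],
    scores: Sequence[float],
    image_width: int,
    image_height: int,
    score_threshold: float = 0.1,  # hypothetical default, tune per use case
) -> List[Tuple[float, Box]]:
    """Filter detections by score and convert boxes to pixel corners.

    Assumes input boxes are normalized (cx, cy, w, h) in [0, 1].
    Returns (score, (left, top, right, bottom)) tuples in pixels.
    """
    results = []
    for (cx, cy, w, h), score in zip(boxes, scores):
        if score < score_threshold:
            continue  # drop low-confidence detections
        left = (cx - w / 2) * image_width
        top = (cy - h / 2) * image_height
        right = (cx + w / 2) * image_width
        bottom = (cy + h / 2) * image_height
        results.append((score, (left, top, right, bottom)))
    return results


detections = to_pixel_boxes(
    boxes=[(0.5, 0.5, 0.5, 0.5), (0.25, 0.25, 0.1, 0.1)],
    scores=[0.9, 0.05],
    image_width=200,
    image_height=100,
)
```

Pixel-space corner boxes should be easier to match against screenshot coordinates than the normalized center format.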