AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
Implement OWL-ViT: Open-World Object Detection with Vision Transformers #692
Open
abrichr opened 1 month ago
Feature request
We wish to implement OWL-ViT (https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit).
Motivation
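OWL-ViT is roughly Google's counterpart to SAM: an open-vocabulary detector that returns bounding boxes for free-text queries, whereas SAM produces segmentation masks.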