dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0
9.57k stars 492 forks source link

Add transformers vision cookbook with atomic caption flow #1216

Closed fearnworks closed 1 month ago

fearnworks commented 1 month ago

Request received in discord to add an example for the new transformers vision capability.

Vision-Language Models with Outlines

This guide demonstrates how to use Outlines with vision-language models, leveraging the new transformers_vision module. Vision-language models can process both text and images, allowing for tasks like image captioning, visual question answering, and more.

We will be using the Pixtral-12B model from Mistral to take advantage of some of its visual reasoning capabilities and a workflow to generate a multistage atomic caption.

rlouf commented 1 month ago

It's awesome! We'll need to link to it from mkdocs.yml and from the cookbooks' index page :)

fearnworks commented 1 month ago

It's awesome! We'll need to link to it from mkdocs.yml and from the cookbooks' index page :)

Updated!

rlouf commented 1 month ago

Thank you so much for your contribution!