fearnworks commented 1 month ago

Request received in discord to add an example for the new transformers vision capability.

Vision-Language Models with Outlines

This guide demonstrates how to use Outlines with vision-language models, leveraging the new transformers_vision module. Vision-language models can process both text and images, allowing for tasks like image captioning, visual question answering, and more.

We will be using the Pixtral-12B model from Mistral to take advantage of some of its visual reasoning capabilities and a workflow to generate a multistage atomic caption.

rlouf commented 1 month ago

It's awesome! We'll need to link to it from mkdocs.yml and from the cookbooks' index page :)

fearnworks commented 1 month ago

It's awesome! We'll need to link to it from mkdocs.yml and from the cookbooks' index page :)

Updated!

rlouf commented 1 month ago

Thank you so much for your contribution!

dottxt-ai / outlines

Add transformers vision cookbook with atomic caption flow #1216

Vision-Language Models with Outlines