dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

[Model Support] Pixtral model support #1272

Closed mlinmg closed 2 days ago

mlinmg commented 4 days ago

Describe the issue as clearly as possible:

When using Pixtral, Outlines raises an error, specifically in vLLM. The problem seems to be that Mistral uses a different tokenizer (Tekken).

Steps/code to reproduce the bug:

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(
    model='mistralai/Pixtral-12B-2409',
    tokenizer_mode='mistral',
    max_model_len=3000,
    max_num_seqs=1,
    gpu_memory_utilization=0.9)
sampling_params = SamplingParams(
    temperature=temp,
    max_tokens=config.max_tokens,
    guided_decoding=GuidedDecodingParams(
        json=[insert a valid json schema]
    )
)

# prepare the messages

llm.chat(messages, sampling_params)
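For reference, the `json=` placeholder above would hold a JSON schema describing the desired output. A minimal sketch of what such a schema could look like (the field names here are invented purely for illustration, not taken from the original report):

```python
import json

# A minimal, hypothetical JSON schema to illustrate the kind of object
# that could be passed to GuidedDecodingParams(json=...).
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "quantity": {"type": "integer"},
    },
    "required": ["name", "quantity"],
}

# Depending on the vLLM version, the schema can typically be supplied
# either as a dict or as a serialized JSON string.
schema_str = json.dumps(schema)
print(schema_str)
```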

Expected result:

Some output following the schema

Error message:

No response

Outlines/Python version information:

Version information

``` (command output here) ```

Context for the issue:

No response

rlouf commented 3 days ago

@cpfiffer didn't you use Pixtral with Outlines?

cpfiffer commented 3 days ago

I did! @mlinmg can you look at this cookbook to see if that helps?

https://dottxt-ai.github.io/outlines/main/cookbook/receipt-digitization/

mlinmg commented 2 days ago

Yes, sorry, I found out about the Pixtral community later.