dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

The following `model_kwargs` are not used by the model: ['tokenizer'] #1237

Open dzimmerman-nci opened 2 weeks ago

dzimmerman-nci commented 2 weeks ago

Describe the issue as clearly as possible:

During the generate process, the tokenizer is sent to transformers' `model.generate()` function as a kwarg, which is then rejected by its `model_kwargs` validation. Why is the tokenizer being added to this call? Is a specific version of transformers required for this?

It's being added in `_get_generation_kwargs` at the end of the function:

```python
return dict(
    logits_processor=logits_processor_list,
    generation_config=generation_config,
    tokenizer=self.tokenizer.tokenizer,
)
```

What am I missing here?
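For context, here is a minimal sketch of the kind of kwargs validation that produces this error (hypothetical names and parameters, not transformers' actual code): `generate()` collects unrecognized keyword arguments into `model_kwargs` and raises if the model doesn't accept them. On a transformers release whose `generate()` has no `tokenizer` parameter, the kwarg falls into `model_kwargs` and trips the check.

```python
# Hypothetical stand-in for transformers' model_kwargs validation.
def fake_generate(input_ids, logits_processor=None, generation_config=None, **model_kwargs):
    # Assumed set of kwargs the model's forward() would accept.
    accepted = {"attention_mask", "past_key_values"}
    unused = [k for k in model_kwargs if k not in accepted]
    if unused:
        raise ValueError(
            f"The following `model_kwargs` are not used by the model: {unused}"
        )
    return input_ids

# A `tokenizer` kwarg that generate() does not recognize triggers the error:
try:
    fake_generate([1, 2, 3], tokenizer=object())
except ValueError as e:
    print(e)  # The following `model_kwargs` are not used by the model: ['tokenizer']
```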

Steps/code to reproduce the bug:

```python
import outlines

model = outlines.models.transformers(model_name="../Meta-Llama-3-8B-Instruct", device="auto")

sys_text = "You are a sentiment-labelling assistant. Is the following review positive or negative?"
user_text = "Review: This restaurant is just awesome!"

messages = [{"role": "system", "content": sys_text}]
messages.append({"role": "user", "content": user_text})
llama_prompt = model.tokenizer.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(llama_prompt)
```

Expected result:

Positive

Error message:

ValueError: The following `model_kwargs` are not used by the model: ['tokenizer']

Outlines/Python version information:

Version information

```
python==3.11.2
outlines==0.1.1
transformers==4.38.2
```

Context for the issue:

No response

duarteocarmo commented 4 days ago

@dzimmerman-nci - I was also seeing this - upgraded transformers and it seems to be working fine now.
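For anyone hitting the same error: newer transformers releases accept a `tokenizer` argument to `model.generate()`, so upgrading past the 4.38.2 pinned in the report above resolves the validation failure. A minimal upgrade sketch (assuming a pip-managed environment):

```shell
# Upgrade transformers so generate() accepts the `tokenizer` kwarg
# that outlines passes along.
pip install --upgrade transformers
```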