akhilrazdan opened this issue 12 months ago
Hi there. The error message suggests there may be an issue with your installation of bitsandbytes or transformers. Maybe this helps: https://github.com/oobabooga/text-generation-webui/issues/2397
Hi, here's some more info suggesting the problem is that the maximum sequence length of the transformers model is being exceeded (similar to related but non-LMQL reports of this error: https://stackoverflow.com/questions/62081155/pytorch-indexerror-index-out-of-range-in-self-how-to-solve and https://discuss.huggingface.co/t/adding-new-tokens-indexerror-index-out-of-range-in-self/6731).
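For reference, the same error can be reproduced with plain transformers, without LMQL in the loop, by forcing GPT-2 to generate past its 1024-token context window. A minimal sketch, assuming a standard transformers install:

```python
# Minimal sketch (plain transformers, no LMQL): GPT-2 only has 1024 learned
# position embeddings, so generating past position 1024 indexes out of range
# and fails with "IndexError: index out of range in self".
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Hello", return_tensors="pt").input_ids
# max_length/min_length beyond model.config.n_positions (1024) push the
# generation loop past the position-embedding table.
out = model.generate(ids, max_length=1100, min_length=1100, do_sample=True)
```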
Here's what I do: first, I created a clean conda environment with Python 3.10.13 and ran `pip install lmql[hf]`. Then I start a server locally with `lmql serve-model`.
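For completeness, this is how I understand the client side picks up that server (the `endpoint` argument and the default port 8080 are assumptions on my part, based on the serving docs):

```python
import lmql

# Assumption: `lmql serve-model` exposes the model on localhost:8080 by
# default, and lmql.model(...) without the "local:" prefix talks to that
# endpoint rather than loading GPT-2 in-process.
served_gpt2 = lmql.model("gpt2", endpoint="localhost:8080")
```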
Then I try two things:
```python
import lmql

@lmql.query
def simple_question(question):
    '''lmql
    "The answer is: [ANSWER]."
    return ANSWER
    '''

answer = simple_question("What is the meaning of life?", model="gpt2", temperature=0.5)
```
This generates the error, but the server outputs "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024). Depending on the model, you may observe exceptions, performance degradation, or nothing at all."
I can retry and set the maximum length:

```python
answer = simple_question("What is the meaning of life?", model="gpt2", temperature=0.5, max_len=50)
```
but this will generate an (expected) AssertionError:
```
AssertionError: The decoder returned a sequence that exceeds the provided max_len (max_len=50, sequence length=50). To increase the max_len, please provide a corresponding max_len argument to the decoder function.
```
So this route ends here for me, as the model will ultimately generate past its own maximum length and then raise the IndexError. Perhaps there is a way I could solve it myself, but it could also hint at a potential issue with either Transformers or LMQL.
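One workaround I can think of, only a sketch and assuming the inline where-clause and the STOPS_AT constraint behave as described in the LMQL docs, is to stop the variable early inside the query so decoding never approaches GPT-2's 1024-token limit (the function name simple_question_stopped is just illustrative):

```python
import lmql

# Sketch of a possible workaround, not a confirmed fix: stop ANSWER at the
# first period so generation ends well before the model's maximum length.
@lmql.query
def simple_question_stopped(question):
    '''lmql
    "The answer is: [ANSWER]" where STOPS_AT(ANSWER, ".")
    return ANSWER
    '''

answer = simple_question_stopped("What is the meaning of life?", model="gpt2", temperature=0.5)
```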
So the other route:
prompt = """
"Greet the user in four different ways: [GREETINGS]" \
where len(TOKENS(GREETINGS)) < 10
"""
m: lmql.LLM = lmql.model("gpt2")
m.generate_sync(prompt)
I get the same error: despite the max-token-length constraint in the prompt, the model still keeps generating. However, this time I can change the call to
```python
m.generate_sync(prompt, max_tokens=10)
```
which works perfectly fine! However, as far as I know the `max_tokens` parameter is not available in the decorated query, and of course we want to specify such a constraint in the query itself!
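For what it's worth, my understanding from the LMQL docs, which I have not verified against this bug, is that the token budget can be written into the decorated query itself as a where-constraint rather than passed as `max_tokens` (the function name greetings_limited is just illustrative):

```python
import lmql

# Sketch only: the same TOKENS-length constraint as in the prompt above, but
# expressed inside a decorated query instead of via max_tokens at call time.
@lmql.query
def greetings_limited():
    '''lmql
    "Greet the user in four different ways: [GREETINGS]" where len(TOKENS(GREETINGS)) < 10
    return GREETINGS
    '''

result = greetings_limited(model="gpt2", temperature=0.5)
```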
Btw, I also tried with "gpt2-medium" which behaves the same. Did not try other models.
Hope this helps,
I am trying to run the following and am getting an error:
This is throwing an error:
And the cell takes forever to return. Any pointers as to what could be happening?