**Closed** by lucasavila00, 6 months ago
@lucasavila00, can you please provide an example with the python openai package? I was not yet able to reproduce it.
@EricLBuehler I could also reproduce it with a shell script making parallel calls, for instance with the regex script:

```shell
python3 regex.py & python3 regex.py & python3 regex.py && fg && fg
```
Using a different python script instead of regex also works, e.g.:

```python
import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:1234/v1/"

completion = openai.chat.completions.create(
    model="mistral",
    messages=[
        {
            "role": "user",
            "content": "Write a list of jokes. Return a markdown list where each item is a joke.",
        }
    ],
)
print(completion.choices[0].message.content)
```
For me, it fails about 50% of the time with 2 parallel requests; with 3 parallel requests it fails every time.
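For anyone trying to reproduce, the parallelism can also be driven from a single Python process instead of backgrounded shell jobs. Below is a minimal sketch: `run_concurrently` is a hypothetical helper (not part of the repro scripts in this thread) that fires N calls at once and collects results or exceptions, so intermittent server-side failures show up as `"error"` entries rather than killing the whole script.

```python
import concurrent.futures


def run_concurrently(fn, n):
    """Call fn() n times in parallel; return ("ok", value) or ("error", exc) per call."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(fn) for _ in range(n)]
        results = []
        for fut in futures:
            try:
                results.append(("ok", fut.result()))
            except Exception as exc:  # surface the failure instead of crashing
                results.append(("error", exc))
        return results
```

To use it against the server, pass a closure that makes the same `openai.chat.completions.create(...)` call as the script above, e.g. `run_concurrently(make_request, 3)`, and inspect how many entries come back as `"error"`.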
I think this is the problem area of the code. Its purpose is to make sure that all prompt seqs have the same length, but it evidently fails to do so:
@lucasavila00, I was also able to reproduce the error even after #129.
This appears to be connected to adding prefill seqs: the first one is added, and then the rest are added as prefill sequences. This is probably due to the off-by-one error that causes #126.
I had cloned #129 and it was not working, even for a single request. I don't remember the exact commit.
It does look like https://github.com/EricLBuehler/mistral.rs/issues/126 is indeed an off-by-one.
@lucasavila00 , this should be fixed now.
Manually running requests concurrently also makes it crash.
The following script also triggers it: