deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0

[python] Fix new logprobs computation in vllm_utils #2146

Closed sindhuvahinis closed 3 months ago

sindhuvahinis commented 3 months ago

Description

This bug exists only on master, not in 0.28.0. This PR fixes the LMI no-code/low-code CI failures.

Even when we set logprobs=1, vLLM sometimes returns more than one log probability. For the new log probs, we add all of the log probabilities returned by vLLM to the new_logprobs dict.

But when we determine whether the current token is the last one, we check i == (len(new_logprobs) - 1). Because new_logprobs now contains more than one entry per token, this condition is never true, so the last token is never detected and the response comes back as broken JSON without any details. Hence the CI failed. A minimal sketch of the failure mode is shown below.
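
For illustration only, here is a minimal sketch of the failure mode and the intended fix. The names `build_details`, `token_ids`, and `new_logprobs` are hypothetical and simplified; this is not the actual vllm_utils code.

```python
# Sketch: why comparing against len(new_logprobs) breaks once vLLM returns
# extra log probabilities, and how comparing against the token count fixes it.

def build_details(token_ids, vllm_step_logprobs):
    """token_ids: generated token ids.
    vllm_step_logprobs: one dict per generation step; may contain MORE than
    one entry even when the request sets logprobs=1."""
    new_logprobs = {}
    for step_logprobs in vllm_step_logprobs:
        # All returned log probabilities are accumulated, so the dict can
        # grow faster than the number of generated tokens.
        new_logprobs.update(step_logprobs)

    details = []
    for i, token_id in enumerate(token_ids):
        # Buggy check: measures the dict size, so it never becomes true once
        # extra logprobs are present and the "last token" path never runs.
        last_token_buggy = i == (len(new_logprobs) - 1)
        # Fixed check: compare against the number of generated tokens instead.
        last_token = i == (len(token_ids) - 1)
        details.append({"id": token_id, "last_token": last_token})
    return details
```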

Will add unit test cases for these use cases in the next PR.