Description
This bug exists only in master, not in 0.28.0. This change fixes the LMI no-code/low-code CI failures.
Even when we set logprobs=1, vLLM sometimes returns more than one log probability per step. When building the new log probs, we were adding all of the log probabilities returned by vLLM to the new_logprobs dict.
But the check that determines whether we are at the last token,

`i == (len(new_logprobs) - 1)`

then fails: because new_logprobs now holds more than one entry, this condition is never true. As a result, the last_token case never occurred and a broken JSON response without any details was returned, which is why the CI failed.

Will add unit test cases for these use cases in a follow-up PR.
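For context, here is a minimal, self-contained sketch of the failure mode. The names (new_tokens, step_logprobs, new_logprobs, last_token) and the loop structure are illustrative assumptions, not the actual rolling-batch handler code; the sketch only shows how the logprob dict can outgrow the list of generated tokens and break the last-token check.

```python
# Illustrative sketch only; names and structure are assumptions, not the real handler.
# With SamplingParams(logprobs=1), a vLLM step can still report more than one
# logprob entry (e.g. the top-1 token plus the sampled token when they differ),
# so a dict that accumulates every returned entry grows faster than the list of
# newly generated tokens.

new_tokens = ["Hello"]                    # one token generated in this step
step_logprobs = {42: -0.11, 17: -2.37}    # two entries despite logprobs=1

new_logprobs = {}
new_logprobs.update(step_logprobs)        # every returned entry is kept

for i, token in enumerate(new_tokens):
    # Broken check: compares against the size of the logprob dict rather than
    # the number of generated tokens; here i == 0 but len(new_logprobs) - 1 == 1,
    # so last_token is never set and the final JSON details are never emitted.
    last_token = i == (len(new_logprobs) - 1)

# One way to avoid the mismatch is to compare against the token count instead:
for i, token in enumerate(new_tokens):
    last_token = i == (len(new_tokens) - 1)   # True for the final generated token
```

The key point is that len(new_logprobs) is not a reliable proxy for the number of generated tokens once vLLM returns extra logprob entries, so the end-of-sequence condition has to be derived from the tokens themselves.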