deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0
182 stars 59 forks source link

[fix] initialize sequence dictionary for default sequence index to pr… #2072

Closed siddvenk closed 2 weeks ago

siddvenk commented 3 weeks ago

… in an output

Description

A previous change caused issues with a corner case. The issue was introduced by https://github.com/deepjavalibrary/djl-serving/commit/06eaa98518bb951ae1d7cad1e8dc6787082d382d#diff-4de063c68933067104c13e58e380b04bb922ca889f070e40bfa743ab33556426.

When we call LLMEngine.step() (through vllm or lmi-dist), it's not guaranteed that we receive and output for every active request. If on the first call to step for a request we do not generate a token, we never initialize the sequence dict that is part of TextGenerationOutput. Later in the flow we make a call to request.get_next_token https://github.com/deepjavalibrary/djl-serving/blob/1e883d5bae81a20189847cc4b2aff93ac38b8a38/engines/python/setup/djl_python/request.py#L140. For requests that did not generate a new token via step, we never call set_new_token, which is where the sequences property of TextGenerationOutput gets initialized.

The simples fix is to ensure that the sequences property of TextGenerationOutput always gets created for the default index so that we never try to access an array index that doesn't exist.

Fixes:

Note that for some reason the generated text for one of the inputs has changed. the outputs seem fine, but i cannot figure out why this changed

sindhuvahinis commented 2 weeks ago

Thanks Sid for this fix. A couple of questions: