CentML / flexible-inference-bench

A modular, extensible LLM inference benchmarking framework that supports multiple benchmarking frameworks and paradigms.
Apache License 2.0

Latency variable is not initialized in async_request_openai_completions #42

Open atokayev opened 4 months ago

atokayev commented 4 months ago

Hi,

The latency variable is used but can be undefined: it is initialized only under this condition:

if chunk == "[DONE]":
    latency = time.perf_counter() - st

But it is referenced later without that guard:

output.latency = latency
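
A minimal, self-contained reproduction of the pattern (names and loop structure are illustrative, not the actual benchmark code):

import time

def stream_request(chunks):
    st = time.perf_counter()
    for chunk in chunks:
        if chunk == "[DONE]":
            latency = time.perf_counter() - st
    # If "[DONE]" never arrives, latency was never bound.
    return latency

stream_request(["a", "b"])  # raises UnboundLocalError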

So in some cases, it leads to the following error:

Traceback (most recent call last):
  File "modular-inference-bench/src/modular_inference_benchmark/engine/backend_functions.py", line 262, in async_request_openai_completions
    output.latency = latency
                     ^^^^^^^
UnboundLocalError: cannot access local variable 'latency' where it is not associated with a value

https://github.com/CentML/modular-inference-bench/blob/536f5b83f49dc4b39736452c1ed60ebbd74b68a3/src/modular_inference_benchmark/engine/backend_functions.py#L258
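
One way to avoid the error would be to initialize latency before the streaming loop and treat a missing "[DONE]" marker as a failed request. This is only a sketch, assuming the surrounding async streaming loop; the output.success and output.error fields are assumptions, not necessarily the actual attributes used in backend_functions.py:

latency = None
st = time.perf_counter()
async for chunk in response_stream:  # assumed streaming loop
    if chunk == "[DONE]":
        latency = time.perf_counter() - st

if latency is None:
    # Stream ended without "[DONE]": record a failure instead of crashing.
    output.success = False
    output.error = "Stream ended before '[DONE]' was received"
else:
    output.latency = latency
    output.success = True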

andoorve commented 3 months ago

This is done now, right @atokayev?

andoorve commented 3 months ago

@atokayev quick ping