deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0
197 stars 64 forks

awscurl: Missing token metrics when -t option specified #2340

Open CoolFish88 opened 2 months ago

CoolFish88 commented 2 months ago

Description

When requesting token metrics from an endpoint running an LMI container with the vLLM engine, non-zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (as expected).

When requesting token metrics from an endpoint running a Triton Inference Server, zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (unexpected). The Triton endpoint was tested successfully to verify that it responds to individual as well as concurrent requests (producing the expected output for the given input requests).

One difference between the two setups is the schema of the input requests and of the output responses. Specifically, the Triton endpoint uses a different input schema and produces output structured differently from the LMI endpoint. Do you suspect this might be the reason the token metrics are not being computed?

Expected Behavior

Return token metrics when -t option is specified

Error Message

Zero valued token metrics

How to Reproduce?

TOKENIZER= ./awscurl -c 1 -N 10 -X POST -n sagemaker --dataset -H 'Content-Type: application/json' -P --connect-timeout 60

Triton input schema: {"inputs": [{"name": str, "shape": [int], "datatype": str, "data": [str]}]}

Triton output schema: {"model_name": str, "model_version": str, "outputs": [{"name": str, "shape": [int], "datatype": str, "data": [str]}]}
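For concreteness, a request/response pair matching the schemas above might look like the following (the field values are hypothetical; only the structure comes from the issue):

```python
import json

# Hypothetical request matching the Triton input schema from the issue.
request = {
    "inputs": [
        {"name": "text_input", "shape": [1], "datatype": "BYTES",
         "data": ["What orbits the Earth?"]}
    ]
}

# Hypothetical response matching the Triton output schema from the issue.
response = {
    "model_name": "my_model",
    "model_version": "1",
    "outputs": [
        {"name": "generated_text", "shape": [1], "datatype": "BYTES",
         "data": ['{"response": "moon"}']}
    ]
}

# Both round-trip through JSON, confirming the shapes are well-formed.
print(json.loads(json.dumps(response))["outputs"][0]["name"])
```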

frankfliu commented 2 months ago

You can use the "-j" parameter to define a jsonquery for the tritonserver output; see this test code: https://github.com/deepjavalibrary/djl-serving/blob/master/awscurl/src/test/java/ai/djl/awscurl/AwsCurlTest.java#L455-L456
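As a rough illustration of what a query such as `$.outputs[*].data` would select from a Triton-style response (the exact JSONPath syntax awscurl accepts is best checked against the linked test), here is a stdlib-only sketch that emulates the extraction step:

```python
import json

# A Triton-style response body (hypothetical values).
response_body = json.loads(
    '{"model_name": "m", "model_version": "1",'
    ' "outputs": [{"name": "generated_text", "shape": [1],'
    ' "datatype": "BYTES", "data": ["the moon orbits the earth"]}]}'
)

# Emulate $.outputs[*].data: gather the data strings of every output entry.
# These are the strings that would then be tokenized for token metrics.
extracted = [s for out in response_body["outputs"] for s in out["data"]]
print(extracted)  # ['the moon orbits the earth']
```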

CoolFish88 commented 2 months ago

Hello Frank,

Thank you for the valuable suggestion. The dictionary storing the generated text (within the output array) looks like this:

{'name': 'generated_text', 'datatype': 'BYTES', 'shape': [1], 'data': ['{"response": "moon"}']},

Should a JSON query like "$.outputs[?(@.name=='generated_text')].data[0]": "$.response" work?

Notice that data is a JSON string. I was wondering whether the execution code can process it accordingly by extracting the value of the response field (so that the tokens associated with the key itself are not counted).

All the best

CoolFish88 commented 2 months ago

I noticed that tokenThroughput = totalTokens * 1000000000d / totalTime * clients. Could you please tell me the meaning of the 1000000000d constant and what tokenThroughput ends up measuring?

frankfliu commented 2 months ago

totalTime is in nanoseconds, so it needs to be converted to seconds.
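A quick sanity check of the formula under that interpretation (the variable names mirror the issue; the numbers are made up, and clients = 1 so the grouping of the final multiplication does not matter here):

```python
# Hypothetical run: 500 tokens generated over 2 seconds of wall time,
# with the time measured in nanoseconds and a single client.
total_tokens = 500
total_time_ns = 2_000_000_000  # 2 s expressed in nanoseconds
clients = 1

# The 1000000000d constant converts nanoseconds to seconds,
# so the result reads as tokens per second.
token_throughput = total_tokens * 1_000_000_000 / total_time_ns * clients
print(token_throughput)  # 250.0 tokens/second
```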

frankfliu commented 2 months ago

> Should a JSON query like "$.outputs[?(@.name=='generated_text')].data[0]": "$.response" work?

I don't think we can parse the nested json string.

CoolFish88 commented 2 months ago

Thanks for the reply.

I also opened another awscurl bug ticket related to the tokenizer behavior. I would be very grateful if you could have a look.

All the best!

CoolFish88 commented 2 months ago

Hello Frank,

Token metrics are no longer computed when specifying a JSON query. Has something changed with the latest release?