CoolFish88 opened 2 months ago
You can use the "-j" parameter to define a JSON query for the tritonserver output; see this test code: https://github.com/deepjavalibrary/djl-serving/blob/master/awscurl/src/test/java/ai/djl/awscurl/AwsCurlTest.java#L455-L456
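For example, an invocation with a JSON query could look roughly like this (the endpoint, dataset, tokenizer, and the query string are placeholders for illustration, not values from this issue; the linked test shows the exact syntax):

TOKENIZER=<tokenizer-id> ./awscurl -c 1 -N 10 -X POST -n sagemaker --dataset <dataset-dir> \
    -H 'Content-Type: application/json' \
    -j "$.outputs[0].data" \
    -t -P --connect-timeout 60 <endpoint-url>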
Hello Frank,
Thank you for the valuable suggestion. The dictionary storing the generated text (within the output array) looks like this:
{'name': 'generated_text', 'datatype': 'BYTES', 'shape': [1], 'data': ['{"response": "moon"}']},
Should a JSON query like "$.outputs[?(@.name=='generated_text')].data[0]": "$.response" work?
Note that data is itself a JSON string, and I was wondering whether the execution code can process it accordingly by extracting only the value of the response field (so that the tokens associated with the key are not counted).
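To make the two-step parse concrete, this is roughly the extraction I am describing (a sketch using Gson, purely for illustration):

import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class NestedResponseDemo {

    // Illustration only: pull "moon" out of the nested JSON string shown above
    static String extractResponse(String tritonBody) {
        JsonObject root = JsonParser.parseString(tritonBody).getAsJsonObject();
        JsonArray outputs = root.getAsJsonArray("outputs");
        for (int i = 0; i < outputs.size(); i++) {
            JsonObject out = outputs.get(i).getAsJsonObject();
            if ("generated_text".equals(out.get("name").getAsString())) {
                // data[0] is itself a JSON document, so it needs a second parse
                String data0 = out.getAsJsonArray("data").get(0).getAsString();
                return JsonParser.parseString(data0).getAsJsonObject()
                        .get("response").getAsString();
            }
        }
        return null;
    }
}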
All the best
I noticed that tokenThroughput = (totalTokens * 1000000000d / totalTime * clients). Could you please tell me the meaning behind the 1000000000d constant and what tokenThroughput ends up measuring?
The totalTime is in nanoseconds; it needs to be converted to seconds.
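For example, with made-up numbers, and assuming the formula quoted above (with the multiplication signs restored) matches the awscurl source:

// Illustrative values only, not taken from any benchmark run
long totalTokens = 5_000;            // generated tokens counted across requests
long totalTime = 2_500_000_000L;     // elapsed time in nanoseconds (2.5 s)
int clients = 4;                     // concurrent clients

// 1000000000d converts nanoseconds to seconds and forces double arithmetic
double tokenThroughput = totalTokens * 1000000000d / totalTime * clients;
System.out.println(tokenThroughput); // 5000 / 2.5 * 4 = 8000 tokens per second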
I don't think we can parse the nested JSON string.
Thanks for the reply.
I also opened another awscurl bug ticket related to the tokenizer behavior. I would be very grateful if you could have a look.
All the best!
Hello Frank,
Token metrics are no longer computed when specifying a JSON query. Has something changed with the latest release?
Description
When requesting token metrics from an endpoint running an LMI container with a vLLM engine, non-zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (as expected).
When requesting token metrics from an endpoint running a Triton Inference Server, zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (unexpected). The Triton endpoint was tested successfully with both individual and concurrent requests and produces the expected output for the given inputs.
One difference between the two setups is the schema of the input requests and of the output responses: the Triton endpoint uses a different input schema and returns output structured differently from the LMI endpoint. Do you suspect this might be the reason the token metrics are not being computed?
Expected Behavior
Return token metrics when the -t option is specified
Error Message
Zero valued token metrics
How to Reproduce?
TOKENIZER= ./awscurl -c 1 -N 10 -X POST -n sagemaker --dataset -H 'Content-Type: application/json' -P --connect-timeout 60
Triton input schema: {"inputs": [{"name": str, "shape": [int], "datatype": str, "data": [str]}]}
Triton output schema: {"model_name": str, "model_version": str, "outputs": [{"name": str, "shape": [int], "datatype": str, "data": [str]}]}
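For reference, a concrete request/response pair following these schemas would look roughly like this (the input name, prompt, model name, and version are made up; the output mirrors the generated_text example quoted earlier in the thread):

{"inputs": [{"name": "text_input", "shape": [1], "datatype": "BYTES", "data": ["What orbits the Earth?"]}]}
{"model_name": "example_model", "model_version": "1", "outputs": [{"name": "generated_text", "shape": [1], "datatype": "BYTES", "data": ["{\"response\": \"moon\"}"]}]}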