Hey Ryan, thanks for your detailed logs. Can you try switching the service name from openai-service to the service created as part of your initial Helm deployment (my-nim-nim-llm by default)? I want to isolate whether the issue is with the service or with the genai-perf.yaml configuration.
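In case it helps, that means keeping every flag the same and only swapping the host in `--url`; a minimal sketch, assuming the default Helm release name and the port 8000 that appears elsewhere in this issue:

```bash
# Same benchmark invocation, pointed at the service the Helm chart created
genai-perf -m ${MODEL_NAME} --service-kind openai \
  --url my-nim-nim-llm:8000 \
  --endpoint v1/chat/completions --endpoint-type chat \
  --concurrency ${concurrency} --num-prompts 100 --tokenizer ${TOKENIZER} \
  --synthetic-input-tokens-mean $input_seq_len --synthetic-input-tokens-stddev 0 \
  --streaming --extra-inputs max_tokens:$output_seq_len --extra-inputs ignore_eos:true \
  --measurement-interval 4000 --generate-plots -v
```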
Hi @Ryan-ZL-Lin, sorry for the late response. In the curl command the model name is `meta/llama3-8b-instruct`, while the YAML is using `meta/llama-3-8b-instruct`. Please try to use `meta/llama3-8b-instruct` and see if it works.
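Concretely, the fix is to re-export the model name so the genai-perf invocation matches what the server actually serves; a minimal sketch reusing a subset of the flags from the command below:

```bash
# The server serves meta/llama3-8b-instruct (no hyphen between "llama" and "3"),
# not meta/llama-3-8b-instruct, hence the 404 on v1/chat/completions
export MODEL_NAME=meta/llama3-8b-instruct
genai-perf -m ${MODEL_NAME} --service-kind openai --url openai-service:8000 \
  --endpoint v1/chat/completions --endpoint-type chat --streaming \
  --concurrency 50 --num-prompts 100 --measurement-interval 4000
```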
Thanks @JoeyTPChou, the problem is solved.
Hi, first of all, thanks for this "awesome" repo that makes the integration between NVIDIA NIM and AWS EKS much easier. Based on this blog post, I was able to set up NIM with EKS.
However, when running genai-perf I get HTTP code 404 and can't figure out where the problem is, even though the API endpoint was tested successfully beforehand. Could anyone provide some hints on how to address this issue?
Here is the process to reproduce the error:
A NodePort service is created.
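The service manifest itself isn't reproduced in this issue; as a rough sketch, a NodePort service like the following would match the `openai-service:8000` address used below (the selector label is an assumption and must match the labels on your NIM pods, not taken from the actual deployment):

```bash
# Hypothetical NodePort service exposing the NIM pods on port 8000
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: openai-service
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: nim-llm   # placeholder selector, adjust to your pods
  ports:
    - port: 8000
      targetPort: 8000
EOF
```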
The NIM and genai-perf Pods are both scheduled without error.
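(This was verified with kubectl; the output itself isn't reproduced here.)

```bash
# Both pods should show STATUS=Running before benchmarking
kubectl get pods
```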
Multiple LoRA adapters are hosted, so there is one base model name and four LoRA model names.
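The actual names were shown in a screenshot that isn't reproduced here; they can be listed from the NIM OpenAI-compatible endpoint, e.g.:

```bash
# Lists the base model plus the four LoRA adapter names the server is serving
curl -s http://openai-service:8000/v1/models
```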
API testing inside the genai-perf pod was successful.
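The exact request used isn't reproduced in the issue; per the resolution above, it was a curl call against `v1/chat/completions` with the model name `meta/llama3-8b-instruct`. A minimal sketch of such a test (the prompt body is illustrative):

```bash
# Hand-test the chat completions endpoint from inside the genai-perf pod
curl -s http://openai-service:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16
      }'
```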
Parameters used in the genai-perf pod:
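The original parameter block isn't reproduced; the values below are reconstructed from the perf_analyzer command visible in the traceback further down, except for TOKENIZER, input_seq_len, and output_seq_len, which are assumptions:

```bash
export MODEL_NAME=meta/llama-3-8b-instruct   # note: this is the name that later returns 404
export LOCAL_PORTNUMBER=8000                 # from '-u openai-service:8000' in the traceback
export TOKENIZER=meta-llama/Meta-Llama-3-8B-Instruct   # assumed Hugging Face tokenizer id
concurrency=50                               # from '--concurrency-range 50'
input_seq_len=200                            # assumed value
output_seq_len=200                           # assumed value
```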
Run the genai-perf command inside the genai-perf pod:
root@genai-perf-5cdc688bb8-x45m9:/workspace# genai-perf -m ${MODEL_NAME} \
    --service-kind openai \
    --url openai-service:${LOCAL_PORTNUMBER} \
    --endpoint v1/chat/completions \
    --endpoint-type chat \
    --concurrency ${concurrency} \
    --num-prompts 100 \
    --tokenizer ${TOKENIZER} \
    --synthetic-input-tokens-mean $input_seq_len \
    --synthetic-input-tokens-stddev 0 \
    --streaming \
    --extra-inputs max_tokens:$output_seq_len \
    --extra-inputs ignore_eos:true \
    --measurement-interval 4000 \
    --generate-plots -v
The 404 error occurred:
Request concurrency: 50
Failed to retrieve results from inference request.
Thread [0] had error: OpenAI response returns HTTP code 404
Thread [1] had error: OpenAI response returns HTTP code 404
    (the same 404 error is repeated for threads [2] through [14])
Thread [15] had error: OpenAI response returns HTTP code 404
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/genai_perf/main.py", line 143, in run
    args.func(args, extra_args)
  File "/usr/local/lib/python3.10/dist-packages/genai_perf/parser.py", line 570, in profile_handler
    Profiler.run(args=args, extra_args=extra_args)
  File "/usr/local/lib/python3.10/dist-packages/genai_perf/wrapper.py", line 139, in run
    subprocess.run(cmd, check=True, stdout=None)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['perf_analyzer', '-m', 'meta/llama-3-8b-instruct', '--async', '--input-data', 'artifacts/meta_llama-3-8b-instruct-openai-chat-concurrency50/llm_inputs.json', '--endpoint', 'v1/chat/completions', '--service-kind', 'openai', '-u', 'openai-service:8000', '--measurement-interval', '4000', '--stability-percentage', '999', '--profile-export-file', 'artifacts/meta_llama-3-8b-instruct-openai-chat-concurrency50/profile_export.json', '--verbose', '-i', 'http', '--concurrency-range', '50']' returned non-zero exit status 99.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/genai_perf/main.py", line 154, in main
    run()
  File "/usr/local/lib/python3.10/dist-packages/genai_perf/main.py", line 147, in run
    raise GenAIPerfException(e)
genai_perf.exceptions.GenAIPerfException: Command '['perf_analyzer', '-m', 'meta/llama-3-8b-instruct', '--async', '--input-data', 'artifacts/meta_llama-3-8b-instruct-openai-chat-concurrency50/llm_inputs.json', '--endpoint', 'v1/chat/completions', '--service-kind', 'openai', '-u', 'openai-service:8000', '--measurement-interval', '4000', '--stability-percentage', '999', '--profile-export-file', 'artifacts/meta_llama-3-8b-instruct-openai-chat-concurrency50/profile_export.json', '--verbose', '-i', 'http', '--concurrency-range', '50']' returned non-zero exit status 99.

2024-08-27 03:50 [ERROR] genai_perf.main:158 - Command '['perf_analyzer', '-m', 'meta/llama-3-8b-instruct', '--async', '--input-data', 'artifacts/meta_llama-3-8b-instruct-openai-chat-concurrency50/llm_inputs.json', '--endpoint', 'v1/chat/completions', '--service-kind', 'openai', '-u', 'openai-service:8000', '--measurement-interval', '4000', '--stability-percentage', '999', '--profile-export-file', 'artifacts/meta_llama-3-8b-instruct-openai-chat-concurrency50/profile_export.json', '--verbose', '-i', 'http', '--concurrency-range', '50']' returned non-zero exit status 99.