-
In LC Corona/Ruby:
- export DFTRACER_TRACE_COMPRESSION=1
- export DFTRACER_LOG_LEVEL=ERROR
- Running the DLIO benchmark with DFTracer does not produce results for AU (Accelerator Utilization) after the run
- e…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Description
Feature description https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/prov…
-
### Your current environment
Launch command: python -m vllm.entrypoints.openai.api_server --model /opt/llm_models/Qwen1.5-32B-Chat-GPTQ-Int4 --quantization gptq --max-model-len 16384 --port 8888 --gpu-memory-ut…
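For reference, the server started this way exposes an OpenAI-compatible API; below is a minimal sketch of a request against it. The port 8888 and model path come from the launch command above, the `/v1/chat/completions` route is the standard OpenAI-compatible path, and the `requests` client is just one choice of HTTP library.

```python
import json

# Request payload for the OpenAI-compatible chat endpoint exposed by
# vllm.entrypoints.openai.api_server (model path and port taken from the
# launch command above).
payload = {
    "model": "/opt/llm_models/Qwen1.5-32B-Chat-GPTQ-Int4",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}

def chat(url="http://localhost:8888/v1/chat/completions"):
    # Requires the server to be running; `requests` is an assumption,
    # any HTTP client works.
    import requests
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(json.dumps(payload, indent=2))
```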
-
tl;dr
- the "Requests Per Minute" chart on the Requests landing page (and domain summary page?) should be split up by HTTP operation, i.e., one line each for HTTP GET, HTTP POST, etc.
- the "Queries Per M…
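As a rough illustration of the split being requested, bucketing a request log by HTTP method yields one per-minute series per operation; the sample data and names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical request log as (method, epoch_seconds) tuples.
requests_log = [
    ("GET", 0), ("POST", 5), ("GET", 61), ("PUT", 62), ("GET", 65),
]

# One "requests per minute" counter per HTTP method.
per_minute = defaultdict(lambda: defaultdict(int))
for method, ts in requests_log:
    per_minute[method][ts // 60] += 1

print(dict(per_minute["GET"]))  # {0: 1, 1: 2}
```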
-
Dear community,
I have been able to test the srsRAN project with the oran-sc-ric. The KPM xApps work fine. I am trying to check the functionality of the control xApps like simple_xApp.py. This xApp …
-
Hi, I am trying to reproduce the vLLM inference throughput for Qwen1.5-7B, which is reported as 2298.89 tokens/s in your blog.
I set the number of input tokens to 1000 and "min_tokens" to 1000 using a si…
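For what it's worth, the headline tokens/s figure is usually just total generated tokens divided by wall-clock time. A sketch with made-up numbers (the 256 requests and 111.4 s below are hypothetical, chosen only to land near the blog's figure):

```python
# Throughput (tokens/s) as typically reported by serving benchmarks:
# total generated tokens divided by elapsed wall-clock time.
def throughput_tokens_per_s(num_requests, tokens_per_request, elapsed_s):
    return num_requests * tokens_per_request / elapsed_s

# e.g. 256 requests x 1000 output tokens in ~111.4 s ≈ 2298 tokens/s
print(round(throughput_tokens_per_s(256, 1000, 111.4), 1))
```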
-
### Your current environment
Using latest available docker image: vllm/vllm-openai:v0.5.0.post1
### 🐛 Describe the bug
I get an "Internal Server Error" response when calling the /v1/embedd…
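For context, here is a minimal sketch of a request against the OpenAI-compatible embeddings route. The full `/v1/embeddings` path is the standard OpenAI-compatible route, the model name below is hypothetical (it must match whatever the server was started with), and `requests` is just one choice of HTTP client.

```python
import json

# Payload shape for the OpenAI-compatible embeddings endpoint;
# "input" may be a single string or a list of strings.
payload = {
    "model": "intfloat/e5-mistral-7b-instruct",  # hypothetical model name
    "input": ["hello world"],
}

def embed(url="http://localhost:8000/v1/embeddings"):
    # Requires a running server; any HTTP client works.
    import requests
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    return [d["embedding"] for d in resp.json()["data"]]

print(json.dumps(payload))
```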
-
In your paper, you state that max throughput improves by about 2.2x.
How can I reproduce this using your GitHub code? Could you give me some detailed instructions?
-
Should be completed after https://github.com/mila-iqia/mila-docs/issues/247
Now that we have an example of how to benchmark the throughput and identify bottlenecks in the mila-docs, the research pro…
-
After reading https://github.com/scalalang2/go-cache-benchmark/issues/4 , and especially @maypok86's comment, I think the points below should be clarified for fairness:
1. Choice of CPU core count (8/16 cores or high…