-
### System Info
I've converted Llama 3 using TensorRT-LLM's convert_checkpoint script, and am serving it with the inflight_batcher_llm template. I'm trying to get diverse samples for a fixed input,…
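For context, diversity usually comes down to the sampling inputs the template exposes. A minimal sketch, assuming the default `ensemble` model name and the standard inflight_batcher_llm request fields (`temperature`, `top_k`, `top_p`, `random_seed`); adjust names to the actual deployment:
```
# Sketch: non-greedy sampling request against Triton's generate endpoint.
# Model name and field names follow the default inflight_batcher_llm
# template; they are assumptions about this particular setup.
curl -s -X POST localhost:8000/v2/models/ensemble/generate -d '{
  "text_input": "Tell me a story",
  "max_tokens": 64,
  "temperature": 0.8,
  "top_k": 50,
  "top_p": 0.95,
  "random_seed": 1234
}'
```
Varying (or omitting) `random_seed` per request is what makes repeated calls differ; with greedy settings (temperature 0, top_k 1) every request returns the same tokens regardless of seed.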
-
### Motivation
When we use LMDeploy for serving, although throughput is also a concern, **more emphasis is placed on throughput under latency constraints at different QPS levels**. This is a performance m…
-
/kind bug
I created a [ClusterServingRuntime](https://github.com/supertetelman/nim-kserve/blob/main/runtimes/24.01-nim_llm.yaml) that looks like this:
```
apiVersion: serving.kserve.io/v1alpha1…
-
/kind feature
**Describe the solution you'd like**
Currently it is not possible to specify the path at which the downloaded model should be made available in the model server container. The downloaded model…
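To make the request concrete, a sketch of what such a knob could look like; the `storagePath` field below is hypothetical and does not exist in KServe today, which is exactly the gap this issue describes:
```
# Hypothetical sketch only: "storagePath" is NOT an existing KServe field;
# it illustrates the requested control over the download location.
cat <<'EOF' | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                        # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: s3://models/example-llm
      storagePath: /opt/models/example-llm # hypothetical field
EOF
```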
-
Hi team,
I am working on the NIM deployment on Amazon EKS pattern. Ref: https://github.com/awslabs/data-on-eks/issues/560
I tried to deploy the NIM container with the Helm chart, and I am using a shared st…
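A rough sketch of the shape of such a deployment, under loud assumptions: the chart path and the `persistence` values keys below are illustrative and may not match the actual NIM chart version in use:
```
# Assumption-laden sketch: values keys vary across NIM chart versions.
cat <<'EOF' > nim-values.yaml
persistence:                          # illustrative key
  enabled: true
  existingClaim: shared-model-cache   # pre-created RWX PVC (e.g. EFS)
EOF
helm upgrade --install nim-llm ./nim-llm -f nim-values.yaml
```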
-
### Your current environment
```
Collecting environment information...
Traceback (most recent call last):
  File "/home/yangzhiyu/workspace/open-long-agent/collect_env.py", line 721, in <module>
    main()…
-
I'm running KServe serving with arena, using the following command:
```
arena serve kserve \
--name=qwen \
--image=vllm/vllm-openai:0.4.1 \
--gpus=1 \
--cpu=4 \
--memory=20Gi…
-
Followed the installation of vLLM via this [link](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/vLLM_quickstart.html).
Tried running via Docker too; here is the [image](https://hub.docke…
-
What are the resource requirements of the deployed model? Explain the resources defined for the model pod; see the sketch after these questions.
What is the throughput of the model? How can we increase the throughput?
Given a combinat…
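On the resources question, a minimal sketch of the stanza a GPU-backed model pod typically defines; the image, name, and numbers are illustrative, not a recommendation:
```
# Illustrative only: resource requests/limits for a GPU model pod.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: model-server             # illustrative name
spec:
  containers:
  - name: server
    image: vllm/vllm-openai:0.4.2
    resources:
      requests:
        cpu: "4"
        memory: 20Gi
        nvidia.com/gpu: 1        # GPU requests must equal limits
      limits:
        cpu: "4"
        memory: 20Gi
        nvidia.com/gpu: 1
EOF
```
GPUs are extended resources, so `nvidia.com/gpu` cannot be oversubscribed; throughput gains therefore come from adding replicas behind the service or giving each replica more GPUs and a larger batch size.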
-
### Your current environment
docker image: vllm/vllm-openai:0.4.2
Model: https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ
GPUs: RTX8000 * 2
### 🐛 Describe the bug
The model works f…
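For reference, a launch sketch under the stated setup; the flags are standard vLLM 0.4.x options, and `--dtype half` is an assumption here because RTX 8000 (Turing) lacks bfloat16 support:
```
# Sketch: serve the GPTQ checkpoint across both RTX 8000s.
python -m vllm.entrypoints.openai.api_server \
  --model alpindale/c4ai-command-r-plus-GPTQ \
  --quantization gptq \
  --tensor-parallel-size 2 \
  --dtype half   # Turing GPUs lack bfloat16
```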