-
### System Info
- Hardware: 8x NVIDIA H100 80GB HBM3
- Software: NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.4
- tensorrtllm_backend commit: [d173386f4dd7b3ed5…
-
Running the vLLM serving test on ARC fails with the issue below:
INFO 07-04 19:10:08 async_llm_engine.py:152] Aborted request cmpl-e5fb5cad96e9402dabbbece3611ae22f-0.
INFO: 127.0.0.1:41772 - "POST /v1/completions …
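For reference, the kind of request that hits the `/v1/completions` endpoint above can be built with the standard library alone. This is only a sketch of the OpenAI-compatible request shape; the base URL and `my-model` name are placeholders, not values from the log.

```python
import json
from urllib import request

def completion_request(base_url, model, prompt, max_tokens=16):
    """Build (but do not send) a POST request to /v1/completions.

    `model` is a placeholder; use whatever --model the server was
    launched with.
    """
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = completion_request("http://127.0.0.1:8000", "my-model", "Hello")
# To actually send it (requires a running server):
# response = request.urlopen(req)
```

Sending this against the running server while watching the log should show whether the abort correlates with client disconnects or with server-side errors.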
-
As discussed on Discord, we need to know what prompts you are using when serving the evaluation LLM.
https://hamel.dev/blog/posts/prompt/
I need to see the prompt to help debug when the framework fails or …
-
### Your current environment
docker image: vllm/vllm-openai:0.4.2
Model: https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ
GPUs: RTX8000 * 2
### 🐛 Describe the bug
The model works f…
-
Hello, I'm using 24.03-trtllm-python-py3 with an image size of 8.38 GB, which is not small but acceptable.
I was going to migrate to one of the newest versions, like 24.04 or 24.05, but the image size drastically increased to 18.46 …
-
Hey,
Currently, Ollama saves models locally in a cache. To maintain different versions of LLMs or fine-tuned ones, and also for extensive monitoring, it would be a good idea to provide integration with M…
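As a starting point for such an integration, the local cache can be enumerated. The sketch below assumes Ollama's on-disk manifest layout (`~/.ollama/models/manifests/<registry>/<namespace>/<model>/<tag>`); both the path and the layout are assumptions about the cache format, not a documented API.

```python
import os

def list_cached_models(manifest_root):
    """Walk an Ollama-style manifest tree and return (model, tag) pairs.

    Assumes one file per tag, stored in a directory named after the
    model. This layout is an assumption, not a stable interface.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(manifest_root):
        for tag in filenames:
            # The directory containing the tag file is the model name.
            model = os.path.basename(dirpath)
            entries.append((model, tag))
    return sorted(entries)
```

A version-tracking or monitoring tool could periodically diff this list to detect newly pulled or removed model versions.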
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
```
-
Are you planning to integrate the vLLM package for fast LLM inference and serving?
https://vllm.readthedocs.io/en/latest/
-
**Describe the bug**
gemma-2-9b-it-gptq-4bit CUDA OOM on RTX 3090
**GPU Info**
```
Sun Aug 4 02:35:35 2024
+-----------------------------------------------------------------------…
```
-
### Installed WSL2 on Windows 11 Pro and Docker Desktop under WSL; running PyTorch code in the image raises an error
Command used to start the image:
```
docker run -itd --privileged --device=/dev/dri -v /c//models:/llm/models -v /usr/lib/wsl:/usr/lib/wsl --name=arc_vllm --s…
```