-
The server seems to be fine, judging by the following log.
```
I1212 03:29:51.067415 37860 server.cc:674]
+----------------+---------+--------+
| Model | Version | Status |
+----------------+---…
-
Hi
The issue: with `--swap-space X` specified, as soon as the CPU KV cache is used, vLLM stops all processing. CPU and GPU usage go to 0%, and the request never returns. Any future requests are also n…
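For context, a minimal sketch of how the flag is typically passed when launching the vLLM OpenAI-compatible server; the model name below is a placeholder for illustration, not taken from this report:

```shell
# Hypothetical reproduction setup: start the vLLM server with CPU swap
# space enabled. --swap-space is the CPU swap size in GiB per GPU; once
# the GPU KV cache fills, blocks are swapped to this CPU region.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --swap-space 4
```

Pushing enough concurrent long-context requests to exhaust the GPU KV cache is what forces the CPU swap path that triggers the hang described above.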
-
I found huge memory leaks caused by the plugin.
To reproduce:
1) Add a single CPathVolume
2) Add a Timer in any blueprint to fire an event connected to Find Async Path, with the Volume ref connected. To get l…
-
### Describe the bug
With the demo run_llama_int8.py, setting `generate_kwargs["do_sample"]` to True, I get the following error:
command:
python run_llama_int8.py -m ${MODEL_ID} --quantized-model-…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar issues.
### Operating system information
Linux
### P…
-
### System Info
CPU x86_64
GPU NVIDIA L40
TensorRT branch: v0.10.0
CUDA: NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2
### Who can help?
@Tracin
### Inf…
-
Creating this issue to initiate discussions about supporting vector embeddings in Pinot.
This [write-up](https://docs.google.com/document/d/1aiXPbwK4rU_YdfMPt3K752SuCMy8KQehqM4ltPg9juE/edit) collat…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …
-
### What is the issue?
This is on AMD. I have 2 x Radeon 7900 XTX cards (24 GB each).
For models whose memory use fits on one GPU, everything works fine.
As soon as both cards are required, the inf…
-
`2024-05-24 23:49:38 WARNING 05-24 15:49:38 utils.py:327] Not found nvcc in /usr/local/cuda. Skip cuda version check!
2024-05-24 23:49:38 INFO 05-24 15:49:38 config.py:379] Using fp8 data type to sto…