-
### Priority
Undecided
### OS type
Ubuntu
### Hardware type
Xeon-SPR
### Installation method
- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source
### Deploy metho…
-
### System Info
Built tensorrtllm_backend from source using dockerfile/Dockerfile.trt_llm_backend
tensorrt_llm 0.13.0.dev2024081300
tritonserver 2.48.0
triton image: 24.07
Cuda 12.5
### Wh…
-
### System Info
Hello TensorRT-LLM team! 👋 I'm facing an issue where the inference output does not contain the expected "Singapore" text. Below are the details of my setup and steps to reproduce the …
-
`top` reports 100% single-core CPU usage when inferring LLMs both with exllamav2 and llama.cpp.
Digging in with `perf`, it seems this load is coming from `libhsa-runtime64`, specifically from a sing…
-
Hello,
Similar to #3, I've tried reproducing the `demo.py` benchmark on an H100 and an A6000, and I'm also seeing no speedup on these platforms at lower precisions.
It was mentioned this is du…
-
Could anyone advise whether it is possible to run inference with OVIS 1.6 on a single 4090 GPU? After loading the model, it appears to consume approximately 20 GB of VRAM. I attempted an inference, b…
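A quick back-of-envelope check of the headroom involved (assuming the RTX 4090's 24 GB of VRAM, and that the reported ~20 GB is weights alone; the helper name is illustrative, not from any library):

```python
# Assumption: RTX 4090 has 24 GB of VRAM; ~20 GB reported is model weights.
# Whatever remains must hold activations and the KV cache during inference.
def vram_headroom_gb(total_gb: float, model_gb: float) -> float:
    """GB remaining after the model weights are resident."""
    return total_gb - model_gb

print(vram_headroom_gb(24.0, 20.0))  # 4.0 GB left for everything else
```

With only ~4 GB of headroom, even a modest context length can exhaust memory, which would be consistent with the failure described.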
-
Hello:
I noticed that the yaml contains an `llm_model` entry, which your documentation says refers to the llm_model checkpoint, but `/data/llava-v1.5-7b` appears to be a folder. Judging from the code, both the tokenizer and the LLM checkpoint seem to be required:
self.llm_tokenizer = LlamaTokenizer.from_pretrained(llm_model, use_fast=False, truncati…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as o…
-
```
llm = LLM('/app/models/tensorrt_llm', skip_tokenizer_init=True)
sampling_params = SamplingParams(end_id=2, return_context_logits=True, max_new_tokens=1)
results = llm.generate([[32, 12, 24, 54, 6, …
```
-
**Describe the bug**
When generating responses with a local LLM, cortex-cpp still appears to use the CPU.
https://discord.com/channels/1107178041848909847/1149558035971321886/1253148982188838954
**To …