-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24GB…
-
### What happened?
I am using llama.cpp + SYCL to perform inference on a multi-GPU server. However, I get a segmentation fault when using multiple GPUs. The same model can produce inference output…
-
I am trying to deploy a Baichuan2-7B model on a machine with 2 Tesla V100 GPUs. Unfortunately, each V100 has only 16 GB of memory.
I have applied INT8 weight-only quantization, so the size of the engine I…
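A rough back-of-envelope calculation shows why INT8 weight-only quantization matters here (illustrative numbers only, not measured from TensorRT-LLM; actual engines add activation and KV-cache overhead on top of the weights):

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GiB for a given parameter count and precision."""
    return n_params * bytes_per_param / 2**30

fp16 = weight_memory_gib(7e9, 2)   # FP16 weights: ~13.0 GiB, too tight for a 16 GiB V100
int8 = weight_memory_gib(7e9, 1)   # INT8 weight-only: ~6.5 GiB
per_gpu = int8 / 2                 # ~3.3 GiB per GPU if split with tensor parallelism of 2
print(f"{fp16:.1f} {int8:.1f} {per_gpu:.1f}")
```

So the quantized weights alone fit comfortably, and the remaining headroom goes to activations and the KV cache.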
-
I just upgraded to the latest ollama to verify the issue, and it is still present on my hardware.
I am running version 0.1.25 and trying to run the falcon model.
Warning: could not connect to a ru…
-
Hi,
I encounter the following error message when trying to enable flash attention with the command below. Is flash attention supported?
``command: ./main -m $model -n 128 --prompt …
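For reference, recent llama.cpp builds expose a dedicated flag for this; a minimal sketch, assuming a build new enough to have it (check `./main --help` for your build, since older binaries lack the option):

```shell
# Enable flash attention explicitly via the --flash-attn (-fa) flag.
# $model is a placeholder for your GGUF model path, as in the command above.
./main -m "$model" -n 128 --flash-attn --prompt "Hello"
```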
-
I've been using unstructured for a while on a 100% CPU machine. I've noticed a lot of nvidia files (over 2 GB) in my venv folder coming from PyTorch (possibly one of unstructured's dependencies).
Can I in…
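One common way to avoid those files is to install PyTorch from its CPU-only wheel index before (or instead of) the default CUDA-enabled wheels; a sketch, assuming pip and that unstructured accepts whichever torch build is already present:

```shell
# Reinstall torch from the CPU-only index so the multi-gigabyte
# nvidia-* CUDA packages are not pulled in as dependencies.
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cpu
```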
-
I was trying to migrate from MLC-LLM to onnxruntime to run Phi-3 on an Orange Pi 5, but I realized that among all your execution providers there isn't a single one that takes advantage of the GPU or NPU…
-
Trying to do inference on an Arc GPU machine, I have followed these guidelines:
```
https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Pipeline-Parallel-Inference
and run_mi…
-
### System Info
- CPU architecture: x86_64
- GPU properties
- GPU name: NVIDIA A100
- GPU memory size: 40GB
- Libraries
- TensorRT-LLM branch or tag: main
- TensorRT-LLM commit: 5d8ca2…
-
### What is the issue?
After running for a while, the model still returns gibberish:
```
[12:59:39] [INFO] [Part of Speech Determination] [Fixed] JSON string: Since you did not provide specific con…