-
I only changed the thread count to 6 instead of 4; thread=4 and thread=5 both work well for this model, but setting thread=6 always triggers the problem on my Xiaomi 14 Pro (SM8650, Snapdragon 8 Gen 3).
Please take a look and resolve it.
thanks~
…
-
I would like to use this library for in-browser web ML inference because, with the upcoming CPU support, it is better than
1. ggml.cpp (llama.cpp/whisper.cpp) - as it supports both CPU and GPU and can u…
-
llama_model_loader: loaded meta data with 32 key-value pairs and 219 tensors from /data/huggingface/hub/models--city96--t5-v1_1-xxl-encoder-gguf/snapshots/005a6ea51a7d0b84d677b3e633bb52a8c85a83d9/./t5…
-
### Your current environment
```text
GPU 0: NVIDIA H100 80GB HBM3
GPU 1: NVIDIA H100 80GB HBM3
GPU 2: NVIDIA H100 80GB HBM3
GPU 3: NVIDIA H100 80GB HBM3
GPU 4: NVIDIA H100 80GB HBM3
GPU 5: NV…
-
I just tested launching LLMs using only the CPU; however, only 4 CPUs of the VMware VM are busy at 100%, while the others stay at 0%.
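For reference, a quick way to check whether the process has been restricted to a subset of cores is sketched below (assuming Linux; the actual cause in this report has not been confirmed, and the environment variables shown are just common suspects):
```python
import os

# Logical CPUs visible to the OS vs. the set this process may run on
# (cgroups, taskset, or the hypervisor can shrink the allowed set).
print("logical CPUs:", os.cpu_count())
print("allowed CPUs:", sorted(os.sched_getaffinity(0)))  # Linux-only

# Common thread-count knobs that can cap CPU-backend parallelism.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    print(var, "=", os.environ.get(var))
```
Running this inside the serving process (or comparing against `nproc` on the host) shows whether the 4-core ceiling comes from CPU affinity or from a thread-count setting.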
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [ ] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
### 🚀 The feature, motivation and pitch
Hi PyTorch maintainers,
I am currently engaged in training multiple large language models (LLMs) sequentially on a single GPU machine, utilizing FullShard…
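For context, here is a minimal sketch of the setup described above, assuming a standard torchrun launch and FSDP's FULL_SHARD strategy; the tiny Sequential model is a placeholder for an LLM, not the reporter's actual code:
```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Launched via torchrun; one process per GPU.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

for _ in range(2):  # train several models sequentially on the same GPUs
    model = torch.nn.Sequential(  # placeholder for an LLM
        torch.nn.Linear(4096, 4096), torch.nn.Linear(4096, 4096)
    ).cuda()
    model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)
    # ... training loop for this model ...
    del model
    torch.cuda.empty_cache()  # release cached memory before the next model

dist.destroy_process_group()
```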
-
### Your current environment
Collecting environment information...
/home/miniconda3/envs/vllm/lib/python3.12/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error …
-
### What is the issue?
I have deployed ollama using the docker image 0.3.10. Loading "big" models fails.
llama3.1 and other "small" models (e.g. codestral) fit into one GPU and work fine. llama3.1…
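For reproduction context, loading a model through ollama's HTTP API looks roughly like the sketch below (the large model tag is hypothetical, since the excerpt is cut off before naming the failing model):
```python
import requests

# Hypothetical "big" model tag; the report doesn't say which one fails.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:70b", "prompt": "Hello", "stream": False},
    timeout=600,
)
print(resp.status_code, resp.text[:200])
```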
-
### What happened?
If you pass the `tfs_z` param to the server, it sometimes crashes.
Starting the server:
```
~/test/llama.cpp/llama-server -m /opt/models/text/gemma-2-27b-it-Q8_0.gguf --verbose
`…
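At the time of this report, llama-server's `/completion` endpoint accepted a `tfs_z` sampling field (tail-free sampling), so a request that could trigger the crash presumably looked something like this sketch (the prompt and parameter values are made up, not taken from the report):
```python
import requests

# Hypothetical payload; the report doesn't show the exact request.
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "Hello", "n_predict": 16, "tfs_z": 0.95},
    timeout=120,
)
print(resp.status_code, resp.json())
```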