-
The stack tool cannot load large models in the .pth format downloaded from Meta; it throws an error at runtime. Does it have to use models downloaded from Hugging Face? Is this setup unreaso…
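If only Hugging Face-format checkpoints are accepted, one workaround is to fetch the HF-format weights directly (Meta's consolidated `.pth` checkpoints can also be converted with the `convert_llama_weights_to_hf.py` script that ships with `transformers`). A minimal sketch, where the repo id and target directory are assumptions:

```python
# Sketch: download the Hugging Face-format checkpoint instead of the Meta .pth files.
# Assumes the meta-llama/Meta-Llama-3-8B-Instruct repo id and that you are logged in
# (`huggingface-cli login`) with access to the gated repo; the local path is a placeholder.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    local_dir="./Meta-Llama-3-8B-Instruct",
)
print("Checkpoint downloaded to", local_dir)
```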
-
### System Info
Ubuntu, CPU only, Conda, Python 3.10
### Information
- [x] The official example scripts
- [ ] My own modified scripts
### 🐛 Describe the bug
I am running a single node stack with …
-
### What is the issue?
I have deployed Ollama using the Docker image 0.3.10. Loading "big" models fails.
llama3.1 and other "small" models (e.g. codestral) fit on one GPU and work fine. llama3.1…
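For reference, a minimal reproduction sketch against the Ollama HTTP API; the port and model tag are assumptions, since the report is truncated:

```python
# Sketch: trigger a load/generation of a large model through the Ollama HTTP API.
# Assumes the default port 11434 and the llama3.1:70b tag (a placeholder for the failing model,
# pulled beforehand with `ollama pull llama3.1:70b`).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:70b", "prompt": "Hello", "stream": False},
    timeout=600,
)
print(resp.status_code, resp.json())
```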
-
### Your current environment
I'm attempting to run a multi-node, multi-GPU inference setup using vLLM with pipeline parallelism.
However, I'm encountering an error related to the number of a…
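A minimal sketch of the intended parallel layout (the model name, parallel sizes, and Ray backend are assumptions, since the report is truncated); vLLM forwards `pipeline_parallel_size` as an engine argument, and depending on the version pipeline parallelism may only be available through the OpenAI-compatible server:

```python
# Sketch: 2-way pipeline parallelism x 4-way tensor parallelism across two nodes.
# Model name, parallel sizes, and the Ray backend are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4,             # GPUs per pipeline stage
    pipeline_parallel_size=2,           # number of pipeline stages (one per node here)
    distributed_executor_backend="ray", # multi-node execution typically goes through Ray
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```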
-
### System Info
TGI Docker Image: ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm
MODEL: meta-llama/Llama-3.1-405B-Instruct-FP8
Hardware used:
Intel® Xeon® Platinum 8…
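Once that TGI container is up, the endpoint can be exercised from Python; a minimal sketch, assuming the server is exposed on localhost:8080:

```python
# Sketch: query a running text-generation-inference endpoint.
# The localhost:8080 address is an assumption about how the container's port is mapped.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")
print(client.text_generation("What is deep learning?", max_new_tokens=64))
```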
-
### Discussed in https://github.com/ggerganov/llama.cpp/discussions/9960
Originally posted by **SteelPh0enix** October 20, 2024
I've been using llama.cpp w/ ROCm 6.1.2 on latest Windows 11 for…
-
### Jan version
0.5.7
### Describe the Bug
Using Jan v0.5.7 on a Mac with an M1 processor, running Llama 3.2 3B instruct q8 via the API. Occasionally, the server stops responding to POST requ…
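A minimal sketch of the kind of POST request involved; the port and model id are assumptions (Jan's local server exposes an OpenAI-compatible API, commonly on localhost:1337):

```python
# Sketch: POST to Jan's OpenAI-compatible chat completions endpoint.
# Port 1337 and the model id are assumptions; adjust to your local server settings.
import requests

resp = requests.post(
    "http://localhost:1337/v1/chat/completions",
    json={
        "model": "llama3.2-3b-instruct",  # placeholder id for the Llama 3.2 3B Instruct q8 model
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(resp.status_code, resp.json())
```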
-
I currently have an LLM engine built on TensorRT-LLM and am trying to evaluate different setups and the gains from each type.
I was trying to deploy the Llama model on a multi-GPU setup, whereby between the 4 GPUs I would hav…
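For the 4-GPU split, a minimal sketch with 4-way tensor parallelism, assuming a TensorRT-LLM release that ships the high-level `tensorrt_llm.LLM` API (the model path is a placeholder; an engine built through the lower-level convert/build workflow would differ):

```python
# Sketch: shard one Llama checkpoint across 4 GPUs with tensor parallelism.
# Assumes the high-level LLM API is available; the model path is a placeholder.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder; point at your checkpoint or engine
    tensor_parallel_size=4,                       # split weights across the 4 GPUs
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```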
-
```python
data_url = data_url_from_image("dog.jpg")
print("The obtained data url is", data_url)
iterator = client.inference.chat_completion(
    model=model,
    messages=[
        {
            "role": "…
```
-
Hi, thanks for your wonderful work.
I am struggling to use my LoRA-tuned model.
I followed these steps:
1. Fine-tuning with LoRA
- base model: Undi95/Meta-Llama-3-8B-Instruct-hf
- llama3 …
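A common follow-up when a LoRA-tuned model misbehaves at inference time is to load the adapter on top of the base model and merge it into the weights; a minimal sketch with PEFT, where the adapter and output paths are placeholders:

```python
# Sketch: attach a LoRA adapter to the base model and merge it into the weights.
# The adapter and output paths are placeholders; the base model matches the one named above.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Undi95/Meta-Llama-3-8B-Instruct-hf")
model = PeftModel.from_pretrained(base, "./my-lora-adapter")  # placeholder adapter dir
merged = model.merge_and_unload()  # bake the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained("Undi95/Meta-Llama-3-8B-Instruct-hf")
merged.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")
```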