-
### Describe the issue
According to the [Local LLMs blog post](https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/), AutoGen can support multiple local LLMs.
My command for FastChat:
First,…
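For reference, the pattern from that blog post is to expose the local model through FastChat's OpenAI-compatible server and point AutoGen's `config_list` at it. A minimal sketch, assuming FastChat's `openai_api_server` is already running on port 8000 and serving `chatglm2-6b` (the field names follow recent pyautogen releases; older ones use `api_base` instead of `base_url`):

```python
# Minimal sketch: point AutoGen at a local FastChat OpenAI-compatible server.
# Assumptions: openai_api_server runs at http://localhost:8000/v1 and serves
# a model registered as "chatglm2-6b"; FastChat does not check the API key.
from autogen import AssistantAgent, UserProxyAgent

config_list = [
    {
        "model": "chatglm2-6b",
        "base_url": "http://localhost:8000/v1",
        "api_key": "NULL",
    }
]

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)
user.initiate_chat(assistant, message="Say hello from a local LLM.")
```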
-
Good day everyone, I am trying to run the llama agentic system on an RTX 4090 with FP8 quantization for the inference model and meta-llama/Llama-Guard-3-8B-INT8 for the guard model. With sufficiently small max_seq_…
-
Hi,
I've been trying to serve different Phi-3 models using the llama.cpp server created by ipex-llm's init-llama-cpp.
When I serve with this version I have two problems:
1) The server doesn…
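As a side note, a quick way to sanity-check whichever build is serving is llama.cpp's OpenAI-compatible `/v1/chat/completions` route. A rough sketch, assuming the server listens on port 8080 and the model was registered under a `phi3` alias (both are placeholders):

```python
# Rough sketch: POST to llama.cpp's OpenAI-compatible chat endpoint.
# Port 8080 and the "phi3" alias are placeholders for whatever the
# server was actually started with.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "phi3",
        "messages": [{"role": "user", "content": "Hello from Phi-3?"}],
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```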
-
llama stack run /home/guilherme/.llama/builds/local/conda/8b-instruct.yaml --port 5000 --disable-ipv6
Value of args.config: /home/guilherme/.llama/builds/local/conda/8b-instruct.yaml
> initializing …
-
after "llama distribution start --name ollama --port 5000 --disabled ipv6"
then I get
Serving POST /inference/batch_chat_completion
Serving POST /inference/batch_completion
Serving POST /inferenc…
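For reference, once those routes are up, a request can be posted to the server directly. A hedged sketch, assuming the truncated third route is the non-batch `/inference/chat_completion` endpoint; the payload field names are assumptions and may differ between llama-stack releases:

```python
# Hedged sketch: probe the distribution server started above on port 5000.
# The endpoint path and the payload field names ("model", "messages",
# "stream") are assumptions and may not match this release exactly.
import requests

resp = requests.post(
    "http://localhost:5000/inference/chat_completion",
    json={
        "model": "Meta-Llama3.1-8B-Instruct",  # placeholder model identifier
        "messages": [{"role": "user", "content": "Hello?"}],
        "stream": False,
    },
    timeout=120,
)
print(resp.status_code)
print(resp.text[:500])
```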
-
## Goal
- Jan supports most llama.cpp params
## Tasklist
**Cortex**
- [x] https://github.com/janhq/cortex.cpp/issues/1151
**Jan**
- [ ] Update Right Sidebar UX for Jan
- [ ] Enable Jan's API serv…
-
After launching the distribution server with `llama distribution start --name local-llama-8b --port 5000 --disable-ipv6`, running any inference example, for example `python examples/scripts/vacatio…`
-
I have fine-tuned the "meta-llama-3.1-8b-bnb-4bit" model using Unsloth. I have downloaded the LoRA weights and am able to run inference with them on a Colab GPU.
But I want to use this fine-tuned model for …
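A minimal sketch of reloading the saved LoRA adapters for local inference with Unsloth (the `lora_model` directory and `max_seq_length` are placeholders; this assumes a CUDA GPU is available):

```python
# Minimal sketch (paths and max_seq_length are placeholders): reload the
# saved LoRA adapters on top of the 4-bit base model and run a generation.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",   # directory where the LoRA adapters were saved
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

inputs = tokenizer(
    ["Question: what did I fine-tune you on?\nAnswer:"], return_tensors="pt"
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs)[0])
```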
-
### What is the issue?
`Error: llama runner process has terminated: signal: segmentation fault (core dumped)`. It occurs while loading larger models that are still within VRAM capacity. Here I…
-
**LocalAI version:**
Using Docker image:
`localai/localai:latest-aio-gpu-hipblas`
**Environment, CPU architecture, OS, and Version:**
- Ubuntu 22.04
- Xeon X5570 [Specs](https://ark.intel.c…
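As an aside, a quick way to confirm the AIO image is serving models is to call its OpenAI-compatible API. A sketch, assuming LocalAI's default port 8080 and one of the AIO image's preconfigured model aliases (e.g. `gpt-4`):

```python
# Sketch: query LocalAI's OpenAI-compatible endpoint with the official client.
# Port 8080 and the "gpt-4" alias are assumptions based on the AIO defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello from LocalAI?"}],
)
print(resp.choices[0].message.content)
```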