-
### Describe the issue
According to the [Local LLMs blog post](https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/), AutoGen can support multiple local LLMs.
My command for FastChat:
First,…
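For reference, the pattern from that blog post is to expose the local model through FastChat's OpenAI-compatible server and point AutoGen's `config_list` at it. A minimal sketch, assuming FastChat's `openai_api_server` is already running on port 8000 and serving `chatglm2-6b` (the field names follow recent pyautogen releases; older ones use `api_base` instead of `base_url`):

```python
# Minimal sketch: point AutoGen at a local FastChat OpenAI-compatible server.
# Assumptions: openai_api_server runs at http://localhost:8000/v1 and serves
# a model registered as "chatglm2-6b"; FastChat does not check the API key.
from autogen import AssistantAgent, UserProxyAgent

config_list = [
    {
        "model": "chatglm2-6b",
        "base_url": "http://localhost:8000/v1",
        "api_key": "NULL",
    }
]

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)
user.initiate_chat(assistant, message="Say hello from a local LLM.")
```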
-
Good day everyone, I am trying to run the llama agentic system on an RTX 4090 with FP8 quantization for the inference model and meta-llama/Llama-Guard-3-8B-INT8 for the guard model. With sufficiently small max_seq_…
-
Hi,
I've been trying to serve different Phi-3 models using the llama.cpp server created by ipex-llm's init-llama-cpp.
When I serve with this version I have two problems:
1) The server doesn…
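As a side note, a quick way to sanity-check whichever build is serving is llama.cpp's OpenAI-compatible `/v1/chat/completions` route. A rough sketch, assuming the server listens on port 8080 and the model was registered under a `phi3` alias (both are placeholders):

```python
# Rough sketch: POST to llama.cpp's OpenAI-compatible chat endpoint.
# Port 8080 and the "phi3" alias are placeholders for whatever the
# server was actually started with.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "phi3",
        "messages": [{"role": "user", "content": "Hello from Phi-3?"}],
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```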
-
llama stack run /home/guilherme/.llama/builds/local/conda/8b-instruct.yaml --port 5000 --disable-ipv6
Value of args.config: /home/guilherme/.llama/builds/local/conda/8b-instruct.yaml
> initializing …
-
after "llama distribution start --name ollama --port 5000 --disabled ipv6"
then I get
Serving POST /inference/batch_chat_completion
Serving POST /inference/batch_completion
Serving POST /inferenc…
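For reference, once those routes are up, a request can be posted to the server directly. A hedged sketch, assuming the truncated third route is the non-batch `/inference/chat_completion` endpoint; the payload field names are assumptions and may differ between llama-stack releases:

```python
# Hedged sketch: probe the distribution server started above on port 5000.
# The endpoint path and the payload field names ("model", "messages",
# "stream") are assumptions and may not match this release exactly.
import requests

resp = requests.post(
    "http://localhost:5000/inference/chat_completion",
    json={
        "model": "Meta-Llama3.1-8B-Instruct",  # placeholder model identifier
        "messages": [{"role": "user", "content": "Hello?"}],
        "stream": False,
    },
    timeout=120,
)
print(resp.status_code)
print(resp.text[:500])
```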
-
## Goal
- Jan supports most llama.cpp params
## Tasklist
**Cortex**
- [x] https://github.com/janhq/cortex.cpp/issues/1151
**Jan**
- [ ] Update Right Sidebar UX for Jan
- [ ] Enable Jan's API serv…
-
After launching the distribution server with `llama distribution start --name local-llama-8b --port 5000 --disable-ipv6`, running any inference example, for example `python examples/scripts/vacatio…`
-
I have fine-tuned the "meta-llama-3.1-8b-bnb-4bit" model using Unsloth. I have downloaded the LoRA weights and am able to run inference with them on a Colab GPU.
But I want to use this fine-tuned model for …
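A minimal sketch of reloading the saved LoRA adapters for local inference with Unsloth (the `lora_model` directory and `max_seq_length` are placeholders; this assumes a CUDA GPU is available):

```python
# Minimal sketch (paths and max_seq_length are placeholders): reload the
# saved LoRA adapters on top of the 4-bit base model and run a generation.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",   # directory where the LoRA adapters were saved
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

inputs = tokenizer(
    ["Question: what did I fine-tune you on?\nAnswer:"], return_tensors="pt"
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs)[0])
```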
-
### What is the issue?
`Error: llama runner process has terminated: signal: segmentation fault (core dumped)`. It occurs while loading larger models that are still within VRAM capacity. Here I…
-
**LocalAI version:**
Using Docker image:
`localai/localai:latest-aio-gpu-hipblas`
**Environment, CPU architecture, OS, and Version:**
- Ubuntu 22.04
- Xeon X5570 [Specs](https://ark.intel.c…
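As an aside, a quick way to confirm the AIO image is serving models is to call its OpenAI-compatible API. A sketch, assuming LocalAI's default port 8080 and one of the AIO image's preconfigured model aliases (e.g. `gpt-4`):

```python
# Sketch: query LocalAI's OpenAI-compatible endpoint with the official client.
# Port 8080 and the "gpt-4" alias are assumptions based on the AIO defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello from LocalAI?"}],
)
print(resp.choices[0].message.content)
```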