-
## Describe the bug
Download https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.2/mistralrs-server-aarch64-apple-darwin.tar.xz
Use a tool like [asitop](https://github.com/tlkh/asito…
-
When I run glm-4-9b-chat-Q5_K_M.gguf on a CUDA 12 machine, the API server starts successfully. However, when I send a question, the API server crashes.
The command I used to start the …
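Since the start command is truncated above, the following is only a reproduction sketch: it assumes an OpenAI-compatible `/v1/chat/completions` endpoint on `localhost:8080` and the model name `glm-4-9b-chat`, none of which are stated in the report.
```python
# Reproduction sketch. Assumptions (not from the report): an
# OpenAI-compatible API at localhost:8080 and model name "glm-4-9b-chat".
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "glm-4-9b-chat",
        "messages": [{"role": "user", "content": "Hello, can you introduce yourself?"}],
    },
    timeout=60,
)
print(resp.status_code)
print(resp.json())
```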
-
```yaml
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    entrypoint: /opt/tabby/bin/tabby-cpu
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
    vol…
```
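Once the container above is up, a minimal smoke test could look like the sketch below; the port `8080` and the `/v1/health` and `/v1/completions` routes are assumptions based on Tabby's public REST API, not details from this excerpt (the compose file is truncated before any port mapping).
```python
# Smoke-test sketch for the Tabby container above. Port 8080 and the
# /v1/health and /v1/completions routes are assumptions, not from the report.
import requests

base = "http://localhost:8080"

# Health check: reports the loaded models and device.
print(requests.get(f"{base}/v1/health", timeout=10).json())

# One code-completion request against the configured completion model.
resp = requests.post(
    f"{base}/v1/completions",
    json={"language": "python", "segments": {"prefix": "def fib(n):\n    "}},
    timeout=60,
)
print(resp.json())
```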
-
### Summary
I'm trying to run llama-api-server with Llama-3-8B in the wasm-sandboxer of Kuasar, but I got some errors.
### Current State
_No response_
### Expected State
_No response_
### …
-
**Describe the bug**
I'm noticing the error below with our Tabby deployment; it looks like a memory error. I don't have any additional logs, since we've modified the logs to mask input/output information, th…
-
**Please describe the feature you want**
Related: https://github.com/TabbyML/tabby/issues/2652
This would allow a local deployment to use less VRAM / compute.
**Additional co…
-
Hi there,
First, thank you for Unsloth, it's great!
I've fine-tuned llama-3-8b-Instruct-bnb-4bit and pushed it to the HF Hub. When I try to deploy it using [HF Inference Endpoints](https://huggingfa…
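One way to rule out a broken upload before debugging the endpoint is to load the pushed checkpoint locally with Unsloth. A minimal sketch, where `your-user/llama-3-8b-finetune` is a hypothetical placeholder for the real repo id:
```python
# Sketch: verify the pushed checkpoint loads before deploying it.
# "your-user/llama-3-8b-finetune" is a hypothetical placeholder repo id.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-user/llama-3-8b-finetune",
    max_seq_length=2048,
    load_in_4bit=True,  # matches the bnb-4bit base model
)
FastLanguageModel.for_inference(model)  # switch the model to inference mode
```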
-
When I add llama_cpp-rs to my Cargo.toml, llama.cpp seems to be pinned to an older version. I'm trying to use Phi-3 128k in a project and I'm unable to, because the [PR that was merged into llama.cpp](h…
-
### System Info
I am trying to run TGI on Docker using 8 GPUs with 16 GB each (in-house server). Docker works fine when using a single GPU.
My server crashes when using all GPUs. Is there any other wa…
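For context (not from this report): TGI shards a model across GPUs through the launcher's `--num-shard` flag, and multi-GPU runs generally need extra shared memory for NCCL. A sketch using the Docker SDK for Python, with a placeholder model id:
```python
# Sketch: launch TGI sharded across all visible GPUs via the Docker SDK.
# "meta-llama/Meta-Llama-3-8B" is only a placeholder model id.
import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/huggingface/text-generation-inference:latest",
    command=["--model-id", "meta-llama/Meta-Llama-3-8B", "--num-shard", "8"],
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    shm_size="1g",          # NCCL needs shared memory for inter-GPU transfers
    ports={"80/tcp": 8080},
    detach=True,
)
print(container.id)
```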
-
Hello, I'm not sure if multi-GPU is supported yet. I didn't find any parameters for tensor parallelism, and the "num_device_layers" parameter seems not to work. Please let me know if it is supported or there are plans to…