-
I'm trying to understand whether this could be used with a local LLM via llama.cpp in interactive mode. Is this possible? I would very much like to try it out.
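A minimal interactive-loop sketch, assuming the llama-cpp-python bindings rather than the llama.cpp CLI itself; the model path is hypothetical:

```python
# Sketch only: uses llama-cpp-python (an assumption), with a hypothetical
# GGUF model path. Reads a prompt, streams the completion back, repeats.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

while True:
    user = input("> ")
    if user.strip().lower() in {"exit", "quit"}:
        break
    for chunk in llm.create_chat_completion(
        messages=[{"role": "user", "content": user}],
        stream=True,
    ):
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
    print()
```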
-
### Your current environment
docker: vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
branch: habana_main
### 🐛 Describe the bug
I attempted to use the off…
-
Hi, thanks for your wonderful work.
I am struggling to use my LoRA-tuned model.
I followed these steps (a loading sketch follows the list):
1. Fine-tuning with LoRA
- base model: Undi95/Meta-Llama-3-8B-Instruct-hf
- llama3 …
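For reference, a minimal sketch of attaching and merging such an adapter with Hugging Face PEFT; this assumes a PEFT-format adapter, and the adapter and output paths are hypothetical:

```python
# Hedged sketch: load the base model named above, attach a LoRA adapter
# (hypothetical path), and merge it so the result serves like a plain checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Undi95/Meta-Llama-3-8B-Instruct-hf")
tokenizer = AutoTokenizer.from_pretrained("Undi95/Meta-Llama-3-8B-Instruct-hf")

model = PeftModel.from_pretrained(base, "./my-lora-adapter")  # hypothetical path
merged = model.merge_and_unload()  # fold LoRA deltas into the base weights

merged.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")
```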
-
Will you consider supporting the llama.cpp server API for inference?
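For context, a hedged sketch of what a client call against that API looks like: llama.cpp's llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, and the host and port below are the server defaults:

```python
# Sketch only: POST a chat completion to a locally running llama-server.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```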
-
### System Info
GPU: NVIDIA A100
Driver Version: 545.23.08
CUDA: 12.3
Versions:
https://github.com/NVIDIA/TensorRT-LLM.git (ab49b93718b906030bcec0c817b10ebb373d4179)
https://github.com/triton-…
-
- [ ] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question.
**Your Question**
> WARNING:ragas.llms.output_parser:Failed to parse …
-
Hello, FlexFlow team!
Thank you for your outstanding work! I am attempting to reproduce the experimental results from the paper "SpecInfer: Accelerating Generative Large Language Model Serving with…
-
### What happened?
Inference fails with this cryptic error.
This happens with both CPU and Vulkan engines.
What might be causing this?
### Name and Version
llama-cpp-3538
ollama-0.3.4
### W…
-
Hello, I'm not sure whether multi-GPU is supported yet. I didn't find any parameters for tensor parallelism, and the "num_device_layers" parameter doesn't seem to work. Please let me know whether it is supported or there are plans to…
-
### What happened?
I am trying to run inference with the RPC example. When running llama-cli with the RPC feature against a single rpc-server on localhost, inference throughput is only 1.9 tok/sec for lla…
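For reproduction, a hedged sketch of the setup under test, assuming a local llama.cpp build; the binary locations, model path, and port are illustrative:

```python
# Sketch only: launch a single rpc-server on localhost, then point llama-cli
# at it with --rpc. Paths and port are hypothetical.
import subprocess, time

server = subprocess.Popen(["./build/bin/rpc-server", "-p", "50052"])
time.sleep(2)  # give the server a moment to start listening

subprocess.run([
    "./build/bin/llama-cli",
    "-m", "./models/model.gguf",  # hypothetical model path
    "--rpc", "localhost:50052",   # route offloaded layers through the RPC backend
    "-ngl", "99",                 # offload all layers
    "-p", "Hello",
    "-n", "64",
])
server.terminate()
```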