-
intelanalytics/ipex-llm-serving-cpu:latest
-
Wrote the code according to the following example at https://distilabel.argilla.io/latest/sections/how_to_guides/advanced/serving_an_llm_for_reuse/#serving-llms-using-vllm:
```
from distilabel.llms im…
```
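For reference, a minimal sketch of what the completed script might look like, assuming a vLLM server already running with its OpenAI-compatible API on localhost:8000 and distilabel's `OpenAILLM` client; the model name, URL, and prompt are placeholders:
```
# A minimal sketch, assuming a vLLM server is already serving an
# OpenAI-compatible API at http://localhost:8000/v1. Model name and
# prompt are placeholders, not from the original report.
from distilabel.llms import OpenAILLM

llm = OpenAILLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # must match the served model
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not check the key by default
)
llm.load()

# generate() takes chat-formatted conversations.
result = llm.generate(inputs=[[{"role": "user", "content": "Say hello."}]])
print(result)
```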
-
Opening this issue to collect information on whether there is a good reason to add TensorRT as a serving backend.
https://github.com/NVIDIA/TensorRT-LLM/issues/334
-
One important (and non-trivial) aspect of running model servers today is ensuring they can scale horizontally in response to load. Traditional CPU/memory-based autoscaling is not suff…
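For illustration (not from the original issue), a minimal sketch of exporting a load signal such as queue depth, which an autoscaler could consume instead of CPU/memory, using `prometheus_client`; all names here are hypothetical:
```
# A minimal sketch of exposing a custom load signal for autoscaling.
# Metric and function names are illustrative, not from the original issue.
import time
from prometheus_client import Gauge, start_http_server

# Requests currently waiting for a model worker (hypothetical metric).
QUEUE_DEPTH = Gauge("model_server_queue_depth", "Requests waiting for a model worker")

def on_request_enqueued():
    QUEUE_DEPTH.inc()

def on_request_started():
    QUEUE_DEPTH.dec()

if __name__ == "__main__":
    start_http_server(9090)  # serves /metrics for Prometheus to scrape
    while True:
        time.sleep(60)
```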
-
Hi there, I've been following this work for a few months and find it a really amazing idea to run LLMs over the Internet. I'm also trying to improve Petals' performance on model inference in…
-
Hi there,
Thank you for bringing the elegant RAG Assessment framework to the community.
I am an AI engineer from Alibaba Cloud, and our team has been fine-tuning LLM-as-a-Judge models based on t…
-
## Description
vLLM sampling parameters include a [richer set of values](https://github.com/vllm-project/vllm/blob/c9b45adeeb0e5b2f597d1687e0b8f24167602395/vllm/sampling_params.py), among which `lo…
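For context, a minimal sketch of vLLM's offline API exercising a few of those parameters; the model name and values are illustrative:
```
# A minimal sketch of vLLM's SamplingParams; model and values are
# placeholders chosen for illustration.
from vllm import LLM, SamplingParams

params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    logprobs=5,        # return log-probabilities for the top tokens
    max_tokens=128,
)

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```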
-
I hit a core dump when decoding with multiple threads. It crashed in the Rust function `tokenizers_decode` (rust/src/lib.rs:199); here is the core backtrace.
Why doesn't it support multi-threading? I think dec…
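A common workaround, sketched below under the assumption that the Python `tokenizers` bindings are in use, is to give each thread its own tokenizer instance rather than sharing one; the model name is a placeholder:
```
# A minimal sketch (not from the original issue): avoid sharing one
# tokenizer across threads by keeping a per-thread instance in
# threading.local(). Model name is a placeholder.
import threading
from concurrent.futures import ThreadPoolExecutor
from tokenizers import Tokenizer

_tls = threading.local()

def get_tokenizer() -> Tokenizer:
    # Lazily create one Tokenizer per thread.
    if not hasattr(_tls, "tok"):
        _tls.tok = Tokenizer.from_pretrained("bert-base-uncased")
    return _tls.tok

def decode(ids):
    return get_tokenizer().decode(ids)

with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(decode, [[101, 7592, 102]] * 4)))
```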
-
## 🐛 Bug Report
**🔎 Describe the Bug**
I have a FastAPI (uvicorn) server which serves multiple concurrent requests. In each call, I am using …
-
Hello,
As in #3, I've tried reproducing the `demo.py` benchmark on an H100 and an A6000, and I'm also seeing no speedup on these platforms at lower precisions.
It was mentioned this is du…