-
# Inappropriate inferences will be used when calculating the forecast
## Summary
Inappropriate inferences will be used when calculating the forecast due to not saving filtered resu…
-
Since the ingressroute (https://github.com/triton-inference-server/server/blob/main/deploy/k8s-onprem/templates/ingressroute.yaml) has been deployed as an LB to balance requests across all Triton pods. H…
-
I am attempting to use FlexFlow to compare its inference speed against vLLM, but FlexFlow appears to be an order of magnitude slower than vLLM, and I've been running into many errors. Testing on a Linux ser…
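For reference, here is a minimal sketch of the vLLM side of such a throughput comparison; the model id, prompt set, and sampling settings are placeholders, not taken from the original report:

```
import time
from vllm import LLM, SamplingParams

# Illustrative benchmark of the vLLM side only; the model id and sampling
# settings below are assumptions, not values from the original report.
prompts = ["Explain how a transformer decoder works."] * 32
sampling = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="tiiuae/falcon-7b-instruct")  # any Hugging Face model id
start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start

generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated_tokens / elapsed:.1f} generated tokens/s")
```

Running the same prompt set and generation length through both engines is what makes an "order of magnitude slower" claim comparable.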
-
Summary
I would like to propose the addition of constrained decoding support. This feature would allow the output sequence to be constrained by a Finite State Machine (FSM) or Context-Free Grammar (C…
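To make the request concrete, here is a toy sketch of FSM-constrained decoding via per-step logit masking. The vocabulary, FSM, and `fake_logits` stand-in are all illustrative; in practice the FSM would be compiled from a regex or grammar over the real tokenizer's vocabulary (as libraries such as Outlines do), and the mask would be applied to the model's logits before sampling.

```
import math
import random

# Toy FSM that only accepts "yes" or "no" followed by <eos>. Everything here
# is a stand-in to show where the mask is applied during decoding.
VOCAB = ["yes", "no", "maybe", "<eos>"]
FSM = {
    "start": {"yes": "answered", "no": "answered"},
    "answered": {"<eos>": "done"},
}

def fake_logits(_prefix):
    # Stand-in for the model's next-token scores.
    return [random.uniform(-1.0, 1.0) for _ in VOCAB]

def constrained_decode():
    state, prefix = "start", []
    while state != "done":
        logits = fake_logits(prefix)
        allowed = FSM[state]
        # Mask every token the FSM does not allow from the current state.
        masked = [l if tok in allowed else -math.inf
                  for tok, l in zip(VOCAB, logits)]
        next_tok = VOCAB[masked.index(max(masked))]
        prefix.append(next_tok)
        state = allowed[next_tok]
    return prefix

print(constrained_decode())  # e.g. ['yes', '<eos>'] -- never 'maybe'
```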
-
It would be nice if we could configure the base URL; then people could use offline models via [ollama](https://ollama.com/) or similar tools.
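For illustration, a minimal sketch of what a configurable base URL enables, assuming ollama's OpenAI-compatible endpoint at `http://localhost:11434/v1` and a locally pulled model named `llama3` (both assumptions, not part of the original request):

```
from openai import OpenAI

# Point an OpenAI-compatible client at a local ollama instance instead of the
# hosted API. The URL and model name are assumptions: ollama exposes an
# OpenAI-compatible endpoint under /v1, and "llama3" must already be pulled.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello from an offline model."}],
)
print(resp.choices[0].message.content)
```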
-
I want to deploy a few open source models with the chat UI. I started a simple model with:
```
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid…
```
-
Hi,
I am a member of the DeepFaune team and I saw that you are using our model and that you converted it to OpenVINO.
Do you have any figures for the speed-up it offers?
-
### Feature request/question
Expose an ENV variable/flag in `lorax-server` and `lorax-launcher` to set the base path of the adapter during inference.
We tried a workaround by setting HUGGINGFACE_HUB_C…
-
The current `Cluster` deployment only allows inference servers to be deployed on GPUs [see here](https://github.com/fmperf-project/fmperf/blob/b7ae68125724d3c63563fd84eebba7eee347e27f/fmperf/Cluster.py#L13…
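As a sketch of what the requested flexibility might look like, here is a hypothetical device-agnostic resource helper; the function and field names are illustrative and are not fmperf's actual API:

```
# Hypothetical sketch of a device-agnostic deployment helper; the function and
# field names are illustrative, not fmperf's actual API.
def resource_spec(device: str = "gpu", count: int = 1) -> dict:
    """Build container resource limits for a GPU or CPU-only inference server."""
    if device == "gpu":
        return {"limits": {"nvidia.com/gpu": count}}
    if device == "cpu":
        # CPU-only inference servers just request cores and memory.
        return {"limits": {"cpu": str(4 * count), "memory": "16Gi"}}
    raise ValueError(f"unsupported device: {device!r}")

print(resource_spec("cpu"))
```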
-
I'm using the nvcr.io/nvidia/tritonserver:23.10-py3 container for my inference workload, via the C++ gRPC API. There are several models in the container: a YOLOv8-like architecture in TensorRT plus a few TorchScript model…
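The report uses the C++ gRPC API; for illustration only, here is an equivalent single request through Triton's Python gRPC client, with the model name, tensor names, and shapes as placeholders:

```
import numpy as np
import tritonclient.grpc as grpcclient

# Minimal single-inference request against a Triton server; the model name,
# tensor names, and shapes are placeholders, not taken from the original setup.
client = grpcclient.InferenceServerClient(url="localhost:8001")

image = np.random.rand(1, 3, 640, 640).astype(np.float32)
inp = grpcclient.InferInput("images", list(image.shape), "FP32")
inp.set_data_from_numpy(image)
out = grpcclient.InferRequestedOutput("output0")

result = client.infer(model_name="yolov8_trt", inputs=[inp], outputs=[out])
print(result.as_numpy("output0").shape)
```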