-
Hi, during request streaming it would be helpful to have a flag indicating the end of generation. Can you help with this feature request?
I believe that means returning the bool flag from https://github.…
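To make the request concrete, here is a minimal sketch (plain Python, every name hypothetical) of what a stream carrying an explicit end-of-generation flag could look like from the consumer's side:

```python
# Hypothetical sketch: each streamed chunk is paired with an explicit
# is_final flag instead of relying on the stream simply closing.
from typing import Iterator, Tuple

def stream_generation(prompt: str) -> Iterator[Tuple[str, bool]]:
    tokens = prompt.split()  # stand-in for real model output
    for i, token in enumerate(tokens):
        yield token, i == len(tokens) - 1  # True only on the last chunk

for chunk, is_final in stream_generation("an example streamed reply"):
    print(chunk, "<end>" if is_final else "")
```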
-
[SHARK](https://github.com/nod-ai/SHARK) is a high-performance codegen compiler and runtime built on MLIR, IREE, and custom RL-based tuning infrastructure. [Here](https://nod.ai/shark-the-fastest-runti…
-
**Is your feature request related to a problem? Please describe.**
As documented [here](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.htm…
-
ONNX version: 1.14.0
When I convert the weight file to .onnx with half=True and run inference on CPU, the inference speed is 1.5 times faster than the .pt model on my own computer (i7-12700).
Pr…
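For anyone trying to reproduce the comparison, a rough sketch of the export-and-time flow might look like the following; the model, input shape, and opset are placeholders, not details from this report:

```python
import time
import torch
import onnxruntime as ort

# Placeholder model standing in for the actual .pt weights
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU()
).half().eval()
dummy = torch.randn(1, 3, 640, 640).half()

# Export with FP16 weights (the half=True case) and run on CPU
torch.onnx.export(model, dummy, "model_fp16.onnx", opset_version=13)
sess = ort.InferenceSession("model_fp16.onnx",
                            providers=["CPUExecutionProvider"])
feed = {sess.get_inputs()[0].name: dummy.numpy()}

t0 = time.perf_counter()
sess.run(None, feed)
print(f"ONNX CPU latency: {time.perf_counter() - t0:.4f}s")
```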
-
## Problem
AIConfig currently couples conversation / multi-turn chat history to the config itself unnecessarily. It uses `remember_chat_context` and [extracts the conversation history from previous…
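As an illustration of the decoupling being asked for (names below are hypothetical, not AIConfig's actual API), the config would hold only model settings while the caller owns the history:

```python
# Hypothetical sketch: chat history lives with the caller, not in the config
config = {"model": "gpt-4", "temperature": 0.7}  # no remember_chat_context flag

history: list = []  # owned and persisted by the application

def run(prompt: str) -> str:
    history.append({"role": "user", "content": prompt})
    reply = f"(model reply to: {prompt})"  # stand-in for a call using `config`
    history.append({"role": "assistant", "content": reply})
    return reply

print(run("hello"))
print(history)
```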
-
The last log lines show:
qanything-container-local | Triton服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | The triton service is starting up, it can be long... you have time to make a coffee :)
qanyth…
-
/kind feature
**Describe the solution you'd like**
We use KServe alongside KServe eventing to trigger an inference; we listen for an `io.kserve.inference.response` event to continue our wor…
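For context, a minimal event sink that filters for that event type might look like the sketch below; binary-mode CloudEvents carry the type in the `ce-type` header, and the port and handling here are assumptions:

```python
# Hypothetical sketch: an HTTP sink reacting to KServe response CloudEvents
from http.server import BaseHTTPRequestHandler, HTTPServer

class EventSink(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.headers.get("ce-type") == "io.kserve.inference.response":
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            print("inference response event:", payload[:200])
        self.send_response(204)  # ack so the broker does not redeliver
        self.end_headers()

HTTPServer(("", 8080), EventSink).serve_forever()
```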
-
The `server.py` does not allow multiple text inputs to be sent. Will this capability be introduced? Is the underlying batching capability of the models being utilised during inference?
-
I am trying to run YOLOv7 on Triton (not the entire DeepStream). I have converted .pt -> .onnx -> .trt in yolov7. All these files work successfully during inference. But when I am trying to deploy wei…
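For reference, a TensorRT engine is normally served from a Triton model repository shaped like this, with the .trt engine renamed to `model.plan`:

```
model_repository/
└── yolov7/
    ├── config.pbtxt
    └── 1/
        └── model.plan   # the converted .trt engine, renamed
```

and a minimal `config.pbtxt` along these lines, where the tensor names and dims are assumptions that must match how the ONNX was exported:

```
name: "yolov7"
platform: "tensorrt_plan"
max_batch_size: 1
input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 25200, 85 ]
  }
]
```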
-
We would like to be able to deploy multiple versions of the same model. Unfortunately, they will not necessarily always have the same shapes and dtypes.
It would be great to have a per-version con…
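For context, assuming a Triton-style model repository, all versions of a model currently share a single `config.pbtxt`, which is exactly what makes differing shapes and dtypes per version awkward; the layout below is illustrative:

```
models/
└── my_model/
    ├── config.pbtxt     # one config shared by every version
    ├── 1/model.onnx     # e.g. input dims [ 3, 224, 224 ]
    └── 2/model.onnx     # different dims/dtypes cannot be declared separately
```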