-
Hello!
XGBoost recently enabled developers to use categorical features in its models (NVIDIA published an article on this: https://developer.nvidia.com/blog/categorical-features-in-xgboost-without-man…
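For anyone landing here, the native support looks roughly like this; a minimal sketch (the toy data and the `tree_method` choice are my assumptions):

```python
# A minimal sketch of XGBoost's native categorical support
# (works with tree_method="hist"; categorical columns must carry
# the pandas "category" dtype).
import pandas as pd
import xgboost as xgb

# Hypothetical toy data: one categorical and one numeric feature.
df = pd.DataFrame({
    "color": pd.Categorical(["red", "green", "blue", "green"]),
    "size": [1.0, 2.0, 3.0, 4.0],
})
y = [0, 1, 1, 0]

# enable_categorical tells XGBoost to treat "category" columns natively,
# without manual one-hot or ordinal encoding.
model = xgb.XGBClassifier(tree_method="hist", enable_categorical=True)
model.fit(df, y)
print(model.predict(df))
```

The key point is that the columns must actually have the pandas `category` dtype; plain object/string columns still need encoding first.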
-
Is it possible to limit "max_memory" while serving the model?
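The issue doesn't say which stack "max_memory" comes from; if it refers to the Hugging Face-style per-device memory map applied when loading a model with `device_map="auto"`, a minimal sketch (the model id and the limits are hypothetical) would be:

```python
# A minimal sketch, assuming "max_memory" means the Hugging Face /
# Accelerate per-device cap applied at load time.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                    # hypothetical model id
    device_map="auto",                        # shard/offload automatically
    max_memory={0: "10GiB", "cpu": "30GiB"},  # cap GPU 0 and CPU offload
)
```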
-
/kind bug
**What steps did you take and what happened:**
Deployed a sklearn InferenceService with metrics aggregation enabled, using the qpext image in the queue-proxy sidecar as described in the docs.
Deployme…
-
Hi there,
Great framework for serving MXNet-based models!
I wonder if there are any suggestions on supporting multi-stage models such as MTCNN, https://github.com/pangyupo/mxnet_mtcnn_fa…
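One workaround I can imagine is wrapping all stages in a single custom service whose handler runs them in sequence; a rough sketch (the `Stage` class and the `handle(data, context)` signature are illustrative assumptions, not the server's confirmed API):

```python
# A rough sketch of chaining MTCNN-style stages inside one custom handler.
# Real code would run each MXNet symbol; these stubs just make the
# pipeline shape visible.

class Stage:
    """Stand-in for one network stage (e.g. PNet/RNet/ONet)."""
    def __init__(self, name):
        self.name = name

    def run(self, image, candidates=None):
        # A real stage would consume the previous stage's candidate boxes
        # and refine them; the stub passes them through.
        return {"stage": self.name, "input": candidates}

PIPELINE = [Stage("pnet"), Stage("rnet"), Stage("onet")]

def handle(data, context):
    """Single entry point: feeds each stage's output into the next."""
    result = None
    for stage in PIPELINE:
        result = stage.run(data, result)
    return result

print(handle("raw-image-bytes", None))
```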
-
I couldn't benchmark my model; it seems the benchmark sends requests without waiting for the responses, so the following error is raised:
```
python benchmark_serving.py \
--backend vllm \…
```
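If it helps, this failure mode matches an open-loop load generator: requests are fired at the configured rate regardless of whether earlier ones have completed (I believe the script also exposes a `--request-rate` argument to throttle this). For contrast, a closed-loop client that waits for each response before sending the next looks roughly like this; the URL and payload are placeholders, not the script's actual flags:

```python
# A minimal closed-loop sketch: each request waits for its response
# before the next one is sent.
import requests

URL = "http://localhost:8000/v1/completions"   # hypothetical endpoint
payload = {"prompt": "hello", "max_tokens": 16}

for _ in range(10):
    resp = requests.post(URL, json=payload)
    resp.raise_for_status()  # fail fast instead of piling up requests
```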
-
servableHandle = {ServableHandle@9593} "ServableHandle(UntypedServableHandle({name: lr, version: 6},com.tencent.angel.serving.core.SimpleLoader@7849b420))"
untypedHandle = {UntypedServableHandle@961…
-
**Setup**
Machine: AWS SageMaker ml.p4d.24xlarge
Model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Used a Docker container image with the latest build of trt-llm (`0.8.0.dev2024011…
-
### Description
Hello All,
I have trained a T2T model and exported it, then used tensorflow-serving with nvidia-docker to serve my model. It works fine in the single-GPU scenario: load a model on GPU…
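For reference, this is roughly how the served model is queried over TensorFlow Serving's REST API (the model name and input are hypothetical):

```python
# A minimal client sketch against TensorFlow Serving's REST predict API.
# The model name "t2t_model" and the input payload are hypothetical.
import requests

url = "http://localhost:8501/v1/models/t2t_model:predict"
payload = {"instances": [[1, 2, 3]]}

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json()["predictions"])
```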
-
Check the AttentionStore paper and see whether the performance would be good.
- AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving https://…
-
I am new to using TorchServe for model deployment. I have tried using TorchServe to serve multiple models, e.g. `model1.mar` and `model2.mar`, simultaneously as follows:
`torchserve --start --ncs -…
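Once both archives are registered, each model should be reachable under its own prediction endpoint on the default inference port; a minimal client sketch (the input file is hypothetical):

```python
# A minimal sketch: each registered .mar gets its own prediction endpoint
# on TorchServe's default inference port 8080. "sample.json" is hypothetical.
import requests

with open("sample.json", "rb") as f:
    body = f.read()

for name in ("model1", "model2"):
    resp = requests.post(
        f"http://localhost:8080/predictions/{name}",
        data=body,
    )
    print(name, resp.status_code, resp.text[:200])
```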