-
Hello!
XGBoost recently enabled developers to use categorical features in its models (NVIDIA published an article on this: https://developer.nvidia.com/blog/categorical-features-in-xgboost-without-man…
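For anyone landing here, the native support looks roughly like this; a minimal sketch (the toy data and the `tree_method` choice are my assumptions):

```python
# A minimal sketch of XGBoost's native categorical support
# (works with tree_method="hist"; categorical columns must carry
# the pandas "category" dtype).
import pandas as pd
import xgboost as xgb

# Hypothetical toy data: one categorical and one numeric feature.
df = pd.DataFrame({
    "color": pd.Categorical(["red", "green", "blue", "green"]),
    "size": [1.0, 2.0, 3.0, 4.0],
})
y = [0, 1, 1, 0]

# enable_categorical tells XGBoost to treat "category" columns natively,
# without manual one-hot or ordinal encoding.
model = xgb.XGBClassifier(tree_method="hist", enable_categorical=True)
model.fit(df, y)
print(model.predict(df))
```

The key point is that the columns must actually have the pandas `category` dtype; plain object/string columns still need encoding first.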
-
Is it possible to limit "max_memory" while serving the model?
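The issue doesn't say which stack "max_memory" comes from; if it refers to the Hugging Face-style per-device memory map applied when loading a model with `device_map="auto"`, a minimal sketch (the model id and the limits are hypothetical) would be:

```python
# A minimal sketch, assuming "max_memory" means the Hugging Face /
# Accelerate per-device cap applied at load time.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                    # hypothetical model id
    device_map="auto",                        # shard/offload automatically
    max_memory={0: "10GiB", "cpu": "30GiB"},  # cap GPU 0 and CPU offload
)
```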
-
/kind bug
**What steps did you take and what happened:**
Deployed a sklearn InferenceService with metrics aggregation enabled, using the qpext image in the queue-proxy sidecar as described in the docs.
Deployme…
-
Hi there,
Great framework for serving MXNet-based models!
I wonder if there are any suggestions on supporting multi-stage models such as MTCNN, https://github.com/pangyupo/mxnet_mtcnn_fa…
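One workaround I can imagine is wrapping all stages in a single custom service whose handler runs them in sequence; a rough sketch (the `Stage` class and the `handle(data, context)` signature are illustrative assumptions, not the server's confirmed API):

```python
# A rough sketch of chaining MTCNN-style stages inside one custom handler.
# Real code would run each MXNet symbol; these stubs just make the
# pipeline shape visible.

class Stage:
    """Stand-in for one network stage (e.g. PNet/RNet/ONet)."""
    def __init__(self, name):
        self.name = name

    def run(self, image, candidates=None):
        # A real stage would consume the previous stage's candidate boxes
        # and refine them; the stub passes them through.
        return {"stage": self.name, "input": candidates}

PIPELINE = [Stage("pnet"), Stage("rnet"), Stage("onet")]

def handle(data, context):
    """Single entry point: feeds each stage's output into the next."""
    result = None
    for stage in PIPELINE:
        result = stage.run(data, result)
    return result

print(handle("raw-image-bytes", None))
```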
-
I couldn't benchmark my model; it seems the benchmark sends requests without waiting for the responses, so the following error is raised:
```
python benchmark_serving.py \
--backend vllm \…
```
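If it helps, this failure mode matches an open-loop load generator: requests are fired at the configured rate regardless of whether earlier ones have completed (I believe the script also exposes a `--request-rate` argument to throttle this). For contrast, a closed-loop client that waits for each response before sending the next looks roughly like this; the URL and payload are placeholders, not the script's actual flags:

```python
# A minimal closed-loop sketch: each request waits for its response
# before the next one is sent.
import requests

URL = "http://localhost:8000/v1/completions"   # hypothetical endpoint
payload = {"prompt": "hello", "max_tokens": 16}

for _ in range(10):
    resp = requests.post(URL, json=payload)
    resp.raise_for_status()  # fail fast instead of piling up requests
```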
-
servableHandle = {ServableHandle@9593} "ServableHandle(UntypedServableHandle({name: lr, version: 6},com.tencent.angel.serving.core.SimpleLoader@7849b420))"
untypedHandle = {UntypedServableHandle@961…
-
**Setup**
Machine: AWS SageMaker ml.p4d.24xlarge
Model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Used a Docker container image with the latest build of trt-llm (`0.8.0.dev2024011…
-
### Description
Hello All,
I have trained a T2T model and exported it, then used tensorflow-serving with nvidia-docker to serve my model. It works fine in the single-GPU scenario: load a model on GPU…
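For reference, this is roughly how the served model is queried over TensorFlow Serving's REST API (the model name and input are hypothetical):

```python
# A minimal client sketch against TensorFlow Serving's REST predict API.
# The model name "t2t_model" and the input payload are hypothetical.
import requests

url = "http://localhost:8501/v1/models/t2t_model:predict"
payload = {"instances": [[1, 2, 3]]}

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json()["predictions"])
```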
-
Check the AttentionStore paper and see whether the performance would be good.
- AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving https://…
-
I am new to using TorchServe for model deployment. I have tried using TorchServe to serve multiple models, e.g. `model1.mar` and `model2.mar`, simultaneously as follows:
`torchserve --start --ncs -…
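Once both archives are registered, each model should be reachable under its own prediction endpoint on the default inference port; a minimal client sketch (the input file is hypothetical):

```python
# A minimal sketch: each registered .mar gets its own prediction endpoint
# on TorchServe's default inference port 8080. "sample.json" is hypothetical.
import requests

with open("sample.json", "rb") as f:
    body = f.read()

for name in ("model1", "model2"):
    resp = requests.post(
        f"http://localhost:8080/predictions/{name}",
        data=body,
    )
    print(name, resp.status_code, resp.text[:200])
```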