-
**Problem Statement**
The SDK currently requires users to create specific object types (like `EndpointCoreConfigInput`, `AiGatewayConfig`, `RateLimit`, `EndpointTag`) when, for example, creating a serving endpoint (s…
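For illustration, a minimal sketch of what endpoint creation looks like with these typed objects, assuming the `databricks-sdk` Python package; the endpoint and model names are hypothetical:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    EndpointTag,
    ServedEntityInput,
)

w = WorkspaceClient()

# Each nested setting must be wrapped in its dedicated dataclass rather than
# passed as a plain dict, which is the friction described above.
w.serving_endpoints.create(
    name="my-endpoint",  # hypothetical endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="main.default.my_model",  # hypothetical model
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
    tags=[EndpointTag(key="team", value="ml")],
)
```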
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC ve…
-
### OS type
Ubuntu
### Description
When running the Translation example using Docker Compose, one of the images takes additional time to pull a model from Hugging Face upon startup. During this period…
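Whatever the elided details, one way to avoid the startup delay is to pre-fetch the model before `docker compose up`, sketched below under the assumption that the compose file mounts a host directory as the Hugging Face cache; the repo id and cache path are placeholders, not the example's actual values:
```python
# Hypothetical pre-fetch script; repo_id and cache_dir are placeholders and
# must match whatever volume the Translation image mounts as its HF cache.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Helsinki-NLP/opus-mt-en-de",  # placeholder translation model
    cache_dir="./data",  # assumed host path mounted into the container
)
```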
-
# TensorRT Model Optimizer - Product Roadmap
[TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) (ModelOpt)’s north star is to be the best-in-class model optimization toolki…
-
Please help confirm whether GLM-4-9B-Chat is supported, thanks so much.
Docker image: intelanalytics/ipex-llm-serving-vllm-xpu-experiment
Tag: 2.1.0b2
Image ID: 0e20af44ad46
Steps:
…
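Independent of the elided steps, a quick sanity check once the container is up might look like the sketch below; it assumes the image exposes vLLM's OpenAI-compatible API on port 8000 and that the model was loaded under the id shown, neither of which is confirmed here:
```python
# Sketch only: base_url, port, and model id are assumptions about this image's
# defaults, not confirmed values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="THUDM/glm-4-9b-chat",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```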
-
## Description
I am building the DJL-Serving TensorRT-LLM LMI inference container from scratch and deploying it on SageMaker Endpoints for Zep…
-
## Description
The model conversion process failed with djl-tensorrtllm and the serving.properties below:
```
from sagemaker import image_uris  # import needed for the retrieve call below

image_uri = image_uris.retrieve(
framework="djl-tensorrtllm",
region=sess…
-
I tried to deploy `llama-3.1-8b-instruct:1.1.1` with KServe and [modelcar](https://kserve.github.io/website/latest/modelserving/storage/oci/) on OpenShift AI.
**What I have done**
1. [Downloaded…