-
**Is your feature request related to a problem? Please describe.**
When thinking about using LocalAI in a production environment to serve an open-source LLM with an OpenAI-compatible API, things like …
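For context, an OpenAI-compatible server such as LocalAI accepts the standard `/v1/chat/completions` request shape. A minimal sketch of building and sending such a request from Python; the endpoint URL and model name are assumptions for illustration, not from this issue:

```python
import json
import urllib.request


def chat_request(model: str, prompt: str, temperature: float = 0.2) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def send(base_url: str, payload: dict) -> dict:
    """POST the payload to an OpenAI-compatible endpoint (e.g. a LocalAI deployment)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical usage against a local deployment:
# send("http://localhost:8080", chat_request("mistral-7b", "Hello"))
```

Because the request shape is the same as OpenAI's, any client built against the OpenAI API can be pointed at such a server by swapping the base URL.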
-
**Setup**
Machine: AWS SageMaker ml.p4d.24xlarge
Model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Used the Docker container image with the latest build of TRT-LLM (`0.8.0.dev2024011…
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
To reproduce vLLM's performance benchmark, please…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a…
-
There has been no release for three months and only a few commits recently, so will this project be actively maintained?
I tried serving some LLMs with ray-llm, and I needed to update transformers, install tikt…
-
### Your current environment
My vLLM version is:
pip show vllm
Name: vllm
Version: 0.3.3+git3380931.abi0.dtk2404.torch2.1
Summary: A high-throughput and memory-efficient inference and serving eng…
-
Hello,
I've encountered an issue where the request launcher does not allow the next requests to be sent until all requests specified by `num_concurrent_requests` have finished.
This behavior see…
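The batch-barrier behavior described here can be contrasted with a sliding-window launcher, which starts the next request as soon as any in-flight request finishes rather than waiting for the whole group to drain. A minimal asyncio sketch of the sliding-window approach; the function names and timings are hypothetical, not the library's API:

```python
import asyncio


async def run_sliding_window(n_requests: int, num_concurrent_requests: int) -> dict:
    """Keep at most `num_concurrent_requests` in flight, starting a new
    request as soon as any one finishes (no batch barrier)."""
    sem = asyncio.Semaphore(num_concurrent_requests)
    state = {"in_flight": 0, "peak": 0, "done": []}

    async def one_request(i: int) -> None:
        async with sem:  # slot is freed per-request, not per-batch
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
            await asyncio.sleep(0.01)  # stand-in for the actual LLM call
            state["in_flight"] -= 1
            state["done"].append(i)

    await asyncio.gather(*(one_request(i) for i in range(n_requests)))
    return state
```

With this structure the concurrency level stays pinned at the limit while requests remain, instead of sawtoothing down to zero at the end of each batch.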
-
Hello,
I am using a fine-tuned open-source LLM, and it works great in Docker after following the instructions to build TensorRT-LLM.
However, after building the wheel install package I am not …
-
We're having trouble running inference efficiently at scale. We're currently processing the audio parts one by one, as is the default for inference, but is there any support for batch inference to speed th…
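Where the backend exposes a batched entry point, the one-by-one loop described here can be replaced by grouping segments into fixed-size batches. A generic sketch; `model_infer` is a placeholder for whatever batched API the framework actually provides:

```python
from typing import Callable, Iterable, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterable[Sequence]:
    """Yield consecutive slices of `items`, each of length <= batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def infer_in_batches(
    segments: Sequence,
    model_infer: Callable[[Sequence], List],
    batch_size: int = 8,
) -> List:
    """Run a batched inference function over audio segments, preserving order."""
    outputs: List = []
    for batch in batched(segments, batch_size):
        outputs.extend(model_infer(batch))  # one forward pass per batch
    return outputs
```

The speedup comes from amortizing per-call overhead and filling the GPU with one forward pass per batch; the right `batch_size` depends on segment length and available memory.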
-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- main branch TensorRT-LLM (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…