-
### Description
TypeScript autocompletion and type resolution are not working in WebStorm on an ejected theme
### CodeSandbox/Snack link
_No response_
### Steps to reproduce
See new comments in this…
-
**Description**
When building from source, the build fails if the tensorrt_llm backend is chosen.
**Triton Information**
What version of Triton are you using? r24.04
Are you using the Triton co…
-
### Description
```shell
Docker: nvcr.io/nvidia/tritonserver:23.04-py3
GPU: A100
```
How can I stop bi-directional streaming (decoupled mode)?
- I want to stop model inference (streaming response) when …
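For reference, one client-side way to end a decoupled stream with the Python gRPC client is sketched below; the model name, input tensor, and stop condition are placeholders rather than anything from this issue, and note that closing the stream in 23.04 only tears down the connection — to my knowledge, server-side request cancellation arrived in later Triton releases, so verify that detail against the release notes.

```python
# Hedged sketch: stopping a decoupled (bi-directional streaming) inference from
# the client side with tritonclient's gRPC API. "my_decoupled_model" and the
# INPUT tensor are placeholders, not taken from the issue.
import queue
import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()

def callback(result, error):
    # Every streamed response (or error) from the decoupled model lands here.
    responses.put(error if error is not None else result)

client = grpcclient.InferenceServerClient("localhost:8001")
client.start_stream(callback=callback)

inp = grpcclient.InferInput("INPUT", [1], "INT32")
inp.set_data_from_numpy(np.array([42], dtype=np.int32))
client.async_stream_infer(model_name="my_decoupled_model", inputs=[inp])

# Read responses until the client decides it has seen enough, then close the
# stream. In 23.04 this drops the connection but does not cancel work the
# backend has already queued.
first_response = responses.get()
client.stop_stream()
```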
-
### Your current environment
```text
tiktoken==0.6.0
transformers==4.38.1
tokenizers==0.15.2
vLLM Version: 0.4.3
fastchat Version: 0.2.36
```
### 🐛 Describe the bug
Currently, I'm using fa…
-
```
root@ttogpu:~# kubectl describe pod triton-inference-server-5b6c7f889c-f54c6
Name: triton-inference-server-5b6c7f889c-f54c6
Namespace: default
Priority: 0
Service …
-
## Description
I have two different modules, each converted to TRT. When I run them serially, the inference-only cost times are:
```
//10 times
do_infer >> cost 400.60 msec. //warm-up
do_infer >> cost 42.22 …
-
Here is the development roadmap for 2024 Q4. Contributions and feedback are welcome ([**Join Bi-weekly Development Meeting**](https://t.co/4BFjCLnVHq)). Previous 2024 Q3 roadmap can be found in #634.
…
-
Hi, dear NJU-Jet,
My Linux server has several 2.6 GHz CPUs and several V100s. I ran **generate_tflite.py** to get a quantized model,
and then in the **evaluate** function I added the code below to measu…
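Since the pasted code is cut off, here is a minimal sketch of how such a measurement is commonly written with `tf.lite.Interpreter`; the model path, random input, and run count are placeholder assumptions, not the author's actual code.

```python
# Hedged timing sketch for a quantized TFLite model; "model_quant.tflite" and
# the random input are placeholders.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(*inp["shape"]).astype(inp["dtype"])

# One warm-up run so one-time allocation cost is not included in the average.
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

runs = 10
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
elapsed_ms = (time.perf_counter() - start) / runs * 1000
print(f"average inference time: {elapsed_ms:.2f} ms")
```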
-
**Is your feature request related to a problem? Please describe.**
I am asking for the recommended way to achieve the following behavior.
SCENARIO: I have many different models. Consider them differen…
-
### Describe the issue
Can I run "python -m vllm.entrypoints.openai.api_server" to load MInference capabilities in vLLM?
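For context, the MInference project documents an offline vLLM usage pattern along the lines of the sketch below; the `MInference("vllm", ...)` patching interface and the model name are assumptions to verify against the MInference README, and whether the OpenAI-compatible `api_server` entrypoint picks up such a patch without modification is exactly the open question here.

```python
# Hedged sketch of patching a vLLM offline engine with MInference; the
# MInference("vllm", ...) interface and the model name are assumptions based on
# the project's documentation, not a confirmed answer about api_server.
from vllm import LLM, SamplingParams
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"  # placeholder long-context model
llm = LLM(model_name, enforce_eager=True, max_model_len=131072)

# Apply the MInference sparse-attention patch to the vLLM engine.
minference_patch = MInference("vllm", model_name)
llm = minference_patch(llm)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```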