-
**Description**
sagemaker_server.cc exposes model loading/unloading through an HTTP POST request to SageMaker. I'm unable to load or unload models through SageMaker for Triton. I'm currently testing l…
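For reference, a minimal sketch of what a load/unload round trip against the SageMaker-style control routes might look like; the port, model name, and repository path below are assumptions for illustration, not values from this issue:
```
# Hypothetical sketch: exercise the SageMaker-style load/unload routes.
# The port, model name, and model path are placeholders.
import requests

BASE = "http://localhost:8080"  # adjust to the port the SageMaker frontend is bound to

# Load a model: POST /models with the model name and its repository location
resp = requests.post(
    f"{BASE}/models",
    json={"model_name": "my_model", "url": "/opt/ml/models/my_model/model"},
)
print("load:", resp.status_code, resp.text)

# Unload the same model: DELETE /models/<model_name>
resp = requests.delete(f"{BASE}/models/my_model")
print("unload:", resp.status_code, resp.text)
```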
-
TRT-LLM version: **0.5.0**
Triton server version: **23.10**
GPU type: A100, 80 GB, with MIG enabled (20 GB of GPU memory per split, 3 splits per node).
I am trying to run a Falcon-7B model with TRT-LL…
-
I have the NVIDIA driver and nvidia-docker installed. I have a 1060 with 6 GB of VRAM; it has compute capability 6.0.
How can I troubleshoot this? When I run other NVIDIA containers on my PC, I have to use --privileged to get …
-
Triton 2.10 supports ONNX, but I still get an error when loading the model.
Release link: https://github.com/triton-inference-server/server/releases
Input or output layers are empty
![image](https://…
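One way to check whether the exported model itself declares its inputs and outputs is to inspect the ONNX graph directly before handing it to Triton; the model path below is an assumption:
```
# Sketch: verify that the ONNX graph declares non-empty inputs and outputs.
# The path is a placeholder for the model in your repository.
import onnx

model = onnx.load("model_repository/my_onnx_model/1/model.onnx")
onnx.checker.check_model(model)

print("inputs :", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])
```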
-
https://github.com/triton-inference-server/
- [x] Build a Triton Docker image with support for the FasterTransformer backend for Fusion etc.
- [x] Convert h2oGPT models to a format that Triton understands h…
-
**Description**
The Python backend does not properly load the `model.py` file in the model directory when trailing slashes (`/`) are present in the `--backend-directory` option.
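As an illustration only (not Triton's actual code), a trailing slash changes what naive path handling derives from the backend directory, e.g. the final path component becomes empty, so anything built from it no longer resolves:
```
# Illustration of the trailing-slash pitfall, not Triton's implementation.
import os

for backend_dir in ("/opt/tritonserver/backends", "/opt/tritonserver/backends/"):
    print(backend_dir,
          "-> basename:", repr(os.path.basename(backend_dir)),
          "| normalized:", os.path.normpath(backend_dir))
```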
**Triton Informa…
-
Hi, I am able to reproduce building and running the model locally via TensorRT-LLM.
I build using:
```
python3 build.py --model_dir /finetune-gpt-neox/models--meta-llama--Llama-2-7b-hf/snapsho…
-
# 🐛 Bug
```
C:\Users\ZeroCool22\Desktop\SwarmUI\dlbackend\comfy>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
[START] Security scan
[DONE] Security scan
## ComfyUI-M…
-
**Description**
When using the ORT-TRT backend on GPU, the CPU memory usage is as high as it is when we use CPU inference.
**Triton Information**
What version of Triton are you using?
2.45.0
…
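To put numbers on that comparison, something like the following could report the server's resident CPU memory before and after loading the model; this is a hypothetical sketch using psutil, and the process name is assumed to be tritonserver:
```
# Hypothetical helper: report the resident (CPU) memory of tritonserver
# so the ORT-TRT and CPU-inference configurations can be compared.
import psutil

def triton_rss_mib():
    total = 0
    for proc in psutil.process_iter(["name", "memory_info"]):
        if proc.info["name"] == "tritonserver":
            total += proc.info["memory_info"].rss
    return total / (1024 ** 2)

print(f"tritonserver resident memory: {triton_rss_mib():.1f} MiB")
```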
-
Do you support an Exllamav2 backend for inference that supports EXL quants?
The current alternative is vLLM, but that doesn't support EXL quants. Also, after running a perplexity test, EXL is the b…
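For context on the comparison being made, perplexity is the exponential of the mean negative log-likelihood over the evaluated tokens; a minimal sketch with placeholder numbers, not results from any backend:
```
# Minimal perplexity sketch: exp of the mean negative log-likelihood.
# The per-token log-probabilities below are placeholder data.
import math

token_logprobs = [-2.1, -0.4, -1.3, -0.9]
perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))
print(f"perplexity = {perplexity:.2f}")
```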