-
My GPU Config
TensorRT Engine Build Command
python3 build.py --model_dir /opt/llms/llama-7b \
    --dtype float16 \
    --remove_i…
-
The question is: how do you free the memory?
https://github.com/triton-inference-server/onnxruntime_backend/issues/103
When the model is deployed on a single GPU, I can specify real-time release of…
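For reference, when Triton runs in explicit model-control mode, unloading a model is the call that asks the backend to release whatever it allocated at load time (whether the ONNX Runtime backend actually returns the GPU memory afterwards is what the linked issue is about). A minimal sketch, assuming an HTTP endpoint on localhost:8000 and a placeholder model name:

```python
# Minimal sketch: load/unload through the repository API, assuming the
# server was started with --model-control-mode=explicit on localhost:8000.
# "my_model" is a placeholder name.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Unloading asks Triton (and the backend) to release the resources that
# were allocated when the model was loaded.
client.unload_model("my_model")

# The model can be loaded again later on demand.
client.load_model("my_model")
```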
-
**Description**
I've loaded a model via the `v2/repository/models/simple/load` endpoint.
But when querying the `v2/repository/index` endpoint I get `[]` as a response.
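The same sequence can be reproduced with the Python client; a minimal sketch, assuming explicit model-control mode and an HTTP endpoint on localhost:8000:

```python
# Minimal sketch: load the model, then list the repository index.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

client.load_model("simple")

# Each index entry should report the model's name, version, and state.
print(client.get_model_repository_index())
print(client.is_model_ready("simple"))
```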
**Triton Information**
What ver…
-
**Is your feature request related to a problem? Please describe.**
This issue is similar to the one mentioned here: https://github.com/triton-inference-server/server/issues/7287. I'd like to file an …
-
**Description**
When starting Triton Server with tracing and with a generic model (e.g., `identity_model_fp32` from the Python backend example), the server crashes with signal 11 after handling a f…
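For context, the client side of such a reproduction would look roughly like the sketch below; the tensor names INPUT0/OUTPUT0 and the input shape are assumptions, not taken from the report.

```python
# Minimal sketch of repeatedly calling the identity model over HTTP.
# Tensor names (INPUT0/OUTPUT0) and the shape are assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 16).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

# The crash reportedly appears only after some requests have been
# handled, so the request is issued in a loop.
for _ in range(100):
    result = client.infer("identity_model_fp32", inputs=[inp])
    out = result.as_numpy("OUTPUT0")
```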
-
We discussed interacting with image models (both for predictions and for embeddings) via an API rather than directly from Python.
* Simplify adding pipeline stages (and replacing pipeline frameworks…
-
## Bug Description
I'm trying to serve a Torch-TensorRT optimized model with NVIDIA Triton Server, based on the provided tutorial:
https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_t…
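A workflow along these lines boils down to compiling the model with Torch-TensorRT and saving the result as TorchScript into the Triton model repository; a minimal sketch, where the ResNet-50 model, input shape, FP16 precision, and repository path are illustrative assumptions:

```python
# Minimal sketch: compile a model with Torch-TensorRT and save it as
# TorchScript for Triton's PyTorch backend. The ResNet-50 model, input
# shape, FP16 precision, and repository path are illustrative assumptions.
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet50(weights=None).eval().cuda()

# Trace first so the TorchScript compilation path is used.
example = torch.randn(1, 3, 224, 224).cuda()
scripted = torch.jit.trace(model, example)

trt_model = torch_tensorrt.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.float16},
)

# Triton's PyTorch backend expects <repo>/<model_name>/<version>/model.pt.
torch.jit.save(trt_model, "model_repository/resnet50_trt/1/model.pt")
```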
-
### Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related iss…
-
**Description**
If I load 2 models (a transformer model and an inference model), the GPU memory used is about 3 GiB.
```
PID USER DEV TYPE GPU GPU MEM CPU HOST MEM Command
2207044 coreai 0 C…
-
#### Description
I am currently working on deploying the Seamless M4T model for text-to-text translation on a Triton server. I have successfully exported the `text.encoder` to ONNX and traced it …
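For orientation, exporting an encoder submodule to ONNX typically follows a pattern like the sketch below; the way the encoder and its dummy inputs are obtained, the tensor names, and the opset version are placeholders rather than the exact values used for `text.encoder`.

```python
# Minimal sketch of exporting an encoder submodule to ONNX. The encoder
# handle, input names, shapes, and opset version are placeholders.
import torch

def export_encoder(encoder: torch.nn.Module, out_path: str = "text_encoder.onnx"):
    encoder.eval()
    # Dummy token IDs; batch and sequence dimensions are marked dynamic.
    dummy_ids = torch.ones(1, 16, dtype=torch.long)
    torch.onnx.export(
        encoder,
        (dummy_ids,),
        out_path,
        input_names=["input_ids"],
        output_names=["encoder_output"],
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "encoder_output": {0: "batch", 1: "sequence"},
        },
        opset_version=17,
    )
```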