-
When will FP8 quantization examples be released? Thanks!
-
### Feature request
I'd like to use this library for really high-throughput ETLs as well as an inference server. How I imagine this working is exposing some sort of object which can operate on in-mem…
-
# 🎉 Open Call for Contributions to the LLaMA Recipes Repository
Hey there! 👋
We are excited to open up our repository for open-source contributions and can't wait to see what recipes you come up…
-
**Description**
When there are multiple GPUs, only one GPU is used.
**Triton Information**
Container: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
**To Reproduce**
Follow the instructio…
-
**Is your feature request related to a problem? Please describe.**
I cannot find any documentation on configuring a model to produce text labels when requested. I found another issue (https://github.…
-
**Description**
Importing `Tensor` in a Python backend model:
`from triton_python_backend_utils import Tensor`
fails with:
UNAVAILABLE: Internal: ImportError: cannot import name 'Tensor' from 'triton_python_…
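A minimal sketch of the documented pattern, which accesses `Tensor` through the module alias (`pb_utils.Tensor`) rather than importing the name directly. Assumption: `triton_python_backend_utils` is injected by the Triton server at runtime and is not a pip-installable package, so the import is guarded here for illustration outside a Triton container.

```python
# Sketch of the conventional Triton Python backend layout. Assumption:
# triton_python_backend_utils is only available inside the Triton
# server's Python backend, so the import is guarded for illustration.
try:
    import triton_python_backend_utils as pb_utils
    HAVE_PB_UTILS = True
except ImportError:
    HAVE_PB_UTILS = False  # expected outside a Triton container


class TritonPythonModel:
    """Minimal model skeleton following the pb_utils conventions."""

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the input tensor by name and echo it back as output.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out_tensor = pb_utils.Tensor("OUTPUT0", in_tensor.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```

If the direct `from … import Tensor` form fails while the aliased access works inside the server, the environment where the import runs may not be the backend stub that provides the full module.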
-
Were you able to run mxnet models with Triton Inference Server?
-
### Your current environment
```text
The output of `python collect_env.py`
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
Hey there!! 🙏
I am currently working on a project that involves sending requests to the model using a Flask API, and when users send requests concurrently the model is not able to handle them. Is …
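One common workaround when the model object itself is not thread-safe is to serialize access to it with a lock, so concurrent Flask request threads queue instead of clashing inside the model. A minimal, framework-agnostic sketch; `DummyModel` is a hypothetical stand-in for the real model:

```python
# Minimal sketch: serialize access to a model whose predict() is not
# safe to call from concurrent request threads. DummyModel is a
# hypothetical stand-in that records how many threads run inside it.
import threading


class LockedModel:
    """Wraps a model so at most one thread runs predict() at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, x):
        with self._lock:  # concurrent callers wait here, one at a time
            return self._model.predict(x)


class DummyModel:
    """Counts the maximum number of threads inside predict() at once."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self._guard = threading.Lock()

    def predict(self, x):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        result = x * 2
        with self._guard:
            self.active -= 1
        return result


if __name__ == "__main__":
    model = LockedModel(DummyModel())
    threads = [threading.Thread(target=model.predict, args=(i,))
               for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(model._model.max_active)  # stays at 1: requests were serialized
```

Note the lock only protects correctness, not throughput; for real concurrency, running multiple worker processes (e.g. behind a WSGI server) or batching requests in a queue scales better.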
-
Hey everyone, this is really urgent. I am trying to build Triton with the ONNX backend from source on Windows, but I am encountering an error (see below) and I don't know what I am doing wrong, please hel…