-
CUDA supports:
https://github.com/kimlimjustin/xplorer/blob/master/src/Service/app.ts
https://github.com/launchbadge/sqlx
https://github.com/Jimver/cuda-toolkit
https://github.com/LLukas22/llm-r…
-
### Describe your issue
Devika does not search on Google, even though all APIs are registered.
### How To Reproduce
1. Task: write the code for a simple telegram bot in Python
2. Devika wrote a plan
3. Invalid resp…
-
I'm running the Python backend of the Triton Inference Server.
The server and client are running.
However, the server cannot find the llamav2 model.
```
I1206 19:08:51.768841 100 http_server.cc:1…
```
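When Triton reports a model as not found, the repository layout is the usual culprit: each model needs its own directory under the repository root, with a numeric version subdirectory. A minimal sketch for a Python-backend model, assuming the root is passed via `--model-repository` (the paths below are illustrative, not taken from the actual setup):

```text
model_repository/
└── llamav2/
    ├── config.pbtxt        # must declare backend: "python"
    └── 1/                  # version directory (required)
        └── model.py        # implements TritonPythonModel
```

If the directory name, version folder, or `backend` field in `config.pbtxt` doesn't match this shape, Triton silently skips the model at startup.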
-
**Description**
Hello,
I have an ONNX model. I am sharing the input and output dimensions of this model below.
![image](https://user-images.githubusercontent.com/81593133/161698185-65e50766-2697-…
-
It would be nice if the client for triton-inference-server supported type hints.
A nice addition would be to include generated type hints for the protobuf stubs for `model_config.proto` and `grpc_service.proto…
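Protobuf stubs can get `.pyi` type hints from tools like mypy-protobuf (`--mypy_out`); for the client itself, a structural `Protocol` is one way to sketch what typed stubs could look like. The names below are illustrative only and are NOT the real tritonclient API:

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class InferOutput:
    # Hypothetical result record; real Triton responses carry tensors.
    name: str
    shape: Sequence[int]
    datatype: str


class InferenceClient(Protocol):
    # A typed interface lets mypy/pyright flag wrong argument types at
    # edit time instead of at runtime.
    def is_model_ready(self, model_name: str, model_version: str = "") -> bool: ...
    def infer(self, model_name: str, inputs: Sequence[bytes]) -> Sequence[InferOutput]: ...


class FakeClient:
    # Structural typing: FakeClient satisfies InferenceClient without
    # inheriting from it, which is how such stub Protocols are used.
    def is_model_ready(self, model_name: str, model_version: str = "") -> bool:
        return True

    def infer(self, model_name: str, inputs: Sequence[bytes]) -> Sequence[InferOutput]:
        return [InferOutput(name="logits", shape=[1, 10], datatype="FP32")]


client: InferenceClient = FakeClient()
```

Because `Protocol` uses structural subtyping, the real client would type-check against such stubs without any changes to its class hierarchy.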
-
### System Info
- CPU architecture: x86_64
- GPU: 1 x Nvidia A100
- Docker image for LLM serialization: nvidia/cuda:12.1.0-devel-ubuntu22.04
- Docker image for triton server launch: nvcr.io/nvid…
-
Instead of pressing a key, continuously listen until the wake word is announced (e.g., "Hey Ross")
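The loop this asks for can be sketched independently of any particular microphone or speech-to-text engine: keep consuming short transcripts and stop when one contains the wake phrase. The STT stream is stubbed out here as an iterable of strings; the wake phrase and function names are assumptions of this sketch:

```python
from typing import Iterable, Optional

WAKE_WORD = "hey ross"  # assumed wake phrase from the request


def heard_wake_word(transcript: str, wake_word: str = WAKE_WORD) -> bool:
    # Normalise punctuation and case so "Hey, Ross!" still triggers.
    cleaned = "".join(ch for ch in transcript.lower() if ch.isalnum() or ch.isspace())
    return wake_word in cleaned


def listen_until_wake(chunks: Iterable[str]) -> Optional[str]:
    # `chunks` stands in for a stream of short speech-to-text results
    # (microphone + STT engine are out of scope for this sketch).
    for transcript in chunks:
        if heard_wake_word(transcript):
            return transcript
    return None
```

In a real assistant, `chunks` would be a generator that blocks on the microphone, so the loop naturally "continuously listens" until the wake word arrives.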
-
Open Assistant is great, but sometimes it will predict a long answer where I can spot a misinterpretation right away. Whether this is because my prompt was faulty and I realise this too late, or the m…
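This is not how Open Assistant implements generation, but the requested behaviour — aborting a long answer mid-stream — can be sketched client-side: stream tokens through a generator that checks a stop flag between tokens. All names here are illustrative:

```python
import threading
from typing import Iterable, Iterator, List


def stream_with_stop(tokens: Iterable[str], stop: threading.Event) -> Iterator[str]:
    # Yield tokens one at a time, checking the stop flag between tokens so
    # the user can abort as soon as the answer goes off the rails.
    for tok in tokens:
        if stop.is_set():
            return
        yield tok


stop = threading.Event()
out: List[str] = []
for tok in stream_with_stop(iter(["A", "long", "wrong", "answer"]), stop):
    out.append(tok)
    if tok == "wrong":  # user spots the misinterpretation mid-stream
        stop.set()
```

Checking the flag between tokens (rather than per request) is what makes the cancellation feel immediate to the user while keeping the generator simple.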
-
Hello, I ran into these errors while converting to ONNX:
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so…
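This warning typically comes from a data-dependent Python `if` on a tensor inside `forward`: tracing converts the tensor to a Python bool and bakes in whichever branch ran for the example input. A minimal sketch reproducing the warning, plus a branch-free rewrite with `torch.where` (the module names are illustrative, not from the actual model):

```python
import warnings

import torch
import torch.nn as nn


class Gate(nn.Module):
    def forward(self, x):
        # `x.sum() > 0` becomes a Python bool here, so tracing records
        # only the branch taken for the example input -> TracerWarning.
        if x.sum() > 0:
            return x * 2
        return x - 1


class SafeGate(nn.Module):
    def forward(self, x):
        # torch.where keeps both branches inside the graph, so the traced
        # (and exported ONNX) model stays correct for all inputs.
        return torch.where(x.sum() > 0, x * 2, x - 1)


example = torch.ones(3)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    torch.jit.trace(Gate(), example)
warned = any("Python boolean" in str(w.message) for w in caught)

traced = torch.jit.trace(SafeGate(), example)
```

If the branch is genuinely data-dependent at inference time, `torch.where` (or scripting that submodule) is the usual fix; if the condition is constant for the deployed model, the warning can be safely ignored.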
-
### System Info
V100*2
nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3
tensorrt-llm 0.7.0
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own mo…