-
Running inference with /TensorRT-LLM/examples/run.py works fine:
mpirun -n 4 -allow-run-as-root python3 /load/trt_llm/TensorRT-LLM/examples/run.py \
--input_text "hello,who are you?" \
…
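For context, `mpirun -n 4` here launches one process per GPU for tensor-parallel inference; conceptually, each rank holds a column shard of every weight matrix and the shard outputs are gathered at the end. A toy numpy sketch of that column-parallel matmul (shapes and names are illustrative, not TensorRT-LLM internals):

```python
import numpy as np

rng = np.random.default_rng(0)
world_size = 4                     # matches `mpirun -n 4`
d_in, d_out = 8, 16
W = rng.standard_normal((d_in, d_out))  # full weight matrix
x = rng.standard_normal(d_in)           # one activation vector

# Each rank multiplies by its own column shard of W...
shards = np.split(W, world_size, axis=1)
partials = [x @ shard for shard in shards]

# ...and the shard outputs are concatenated (an all-gather in a real run).
y_parallel = np.concatenate(partials)
print(np.allclose(y_parallel, x @ W))  # True
```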
-
**Is your feature request related to a problem? Please describe.**
**Describe the solution you'd like**
Today KAITO supports the popular Hugging Face runtime. We should support other runtimes, like…
-
[I found a thread describing how only the CPU is used after rebooting on Windows](https://github.com/ollama/ollama/issues/4984#issue-2347076913)
I had similar problems on Ubuntu as well.…
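One quick way to confirm whether the GPU is actually being used is to poll `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader` while a prompt is running. A minimal parser sketch for that output (the sample string below is fabricated for illustration; a real check would read the command's stdout):

```python
def parse_gpu_util(csv_text: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader`
    output into one utilization percentage per GPU."""
    return [int(line.strip().removesuffix(" %"))
            for line in csv_text.strip().splitlines()]

# Fabricated sample: GPU 0 idle, GPU 1 busy. Near-zero utilization during
# generation would suggest the CPU-only fallback described above.
sample = "0 %\n97 %\n"
print(parse_gpu_util(sample))  # [0, 97]
```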
-
**Link to the notebook**
https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/nlp/realtime/triton/single-model/t5_pytorch_python-backend/t5_pytorch_python-backend.ipynb
**Describe …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
**Routine checks**
[//]: # (Remove the space inside the brackets and fill in an x)
+ [ ] I have confirmed there is no similar existing issue
+ [ ] I have confirmed I have upgraded to the latest version
+ [ ] I have read the project README in full and confirmed the current version cannot meet my needs
+ [ ] I understand and am willing to follow up on this issue, helping with testing and providing feedback
+ [ ] I understand and accept the above, and I understand that the maintainers have limited time; **issues that do not follow the rules may be…
-
### System Info
Python version: 3.10.12
PyTorch version:
llama_models version: 0.0.42
llama_stack version: 0.0.42
llama_stack_client version: 0.0.41
Hardware: 4xA100 (40GB VRAM/GPU)
local-…
-
Hi, thanks for your wonderful work.
I am struggling to use my LoRA-tuned model.
I followed these steps:
1. Fine-tuning with LoRA
- Undi95/Meta-Llama-3-8B-Instruct-hf as the base model
- llama3 …
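For background on what "using" a LoRA-tuned model involves: LoRA learns a low-rank update `B @ A` on top of each frozen weight, and deploying often means merging that update into the base weights. A toy numpy sketch of the merge math (shapes and the `alpha` scaling are illustrative toy values, not this model's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size and LoRA rank (toy values)
W = rng.standard_normal((d, d))  # frozen base weight
A = rng.standard_normal((r, d))  # LoRA down-projection
B = rng.standard_normal((d, r))  # LoRA up-projection
alpha = 16                       # LoRA scaling numerator

# Merging folds the low-rank update into the base weight:
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d)
# Applying base + adapter separately equals applying the merged weight.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))  # True
```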
-
https://github.com/h2oai/h2ogpt/blob/main/docs/TRITON.md
Do the same for Falcon 7B, then Falcon 40B.
-
This issue has been filed to examine how best to support the `inference-service-test` plugin in ES|QL mixed-version testing.
The ES|QL CSV and REST tests run with a variety of modes (see `x-pack/plugin/…