-
Running inference with /TensorRT-LLM/examples/run.py works fine:
mpirun -n 4 -allow-run-as-root python3 /load/trt_llm/TensorRT-LLM/examples/run.py \
--input_text "hello,who are you?" \
…
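For context, `mpirun -n 4` here launches one process per GPU for tensor-parallel inference; conceptually, each rank holds a column shard of every weight matrix and the shard outputs are gathered at the end. A toy numpy sketch of that column-parallel matmul (shapes and names are illustrative, not TensorRT-LLM internals):

```python
import numpy as np

rng = np.random.default_rng(0)
world_size = 4                     # matches `mpirun -n 4`
d_in, d_out = 8, 16
W = rng.standard_normal((d_in, d_out))  # full weight matrix
x = rng.standard_normal(d_in)           # one activation vector

# Each rank multiplies by its own column shard of W...
shards = np.split(W, world_size, axis=1)
partials = [x @ shard for shard in shards]

# ...and the shard outputs are concatenated (an all-gather in a real run).
y_parallel = np.concatenate(partials)
print(np.allclose(y_parallel, x @ W))  # True
```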
-
**Is your feature request related to a problem? Please describe.**
**Describe the solution you'd like**
Today KAITO supports the popular Hugging Face runtime. We should support other runtimes, like…
-
[I found a thread describing how only the CPU is used after rebooting on Windows](https://github.com/ollama/ollama/issues/4984#issue-2347076913)
I had similar problems on Ubuntu as well.…
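One quick way to confirm whether the GPU is actually being used is to poll `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader` while a prompt is running. A minimal parser sketch for that output (the sample string below is fabricated for illustration; a real check would read the command's stdout):

```python
def parse_gpu_util(csv_text: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader`
    output into one utilization percentage per GPU."""
    return [int(line.strip().removesuffix(" %"))
            for line in csv_text.strip().splitlines()]

# Fabricated sample: GPU 0 idle, GPU 1 busy. Near-zero utilization during
# generation would suggest the CPU-only fallback described above.
sample = "0 %\n97 %\n"
print(parse_gpu_util(sample))  # [0, 97]
```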
-
**Link to the notebook**
https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/nlp/realtime/triton/single-model/t5_pytorch_python-backend/t5_pytorch_python-backend.ipynb
**Describe …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
**Routine checks**
[//]: # (Remove the space inside the brackets and fill in an x)
+ [ ] I have confirmed there is no similar existing issue
+ [ ] I have confirmed I have upgraded to the latest version
+ [ ] I have read the project README in full and confirmed the current version cannot meet my needs
+ [ ] I understand and am willing to follow up on this issue, helping with testing and providing feedback
+ [ ] I understand and accept the above, and I understand that the maintainers have limited time; **issues that do not follow the rules may be…
-
### System Info
Python version: 3.10.12
PyTorch version:
llama_models version: 0.0.42
llama_stack version: 0.0.42
llama_stack_client version: 0.0.41
Hardware: 4xA100 (40GB VRAM/GPU)
local-…
-
Hi, thanks for your wonderful work.
I am struggling to use my LoRA-tuned model.
I followed these steps:
1. Fine-tuning with LoRA
- Undi95/Meta-Llama-3-8B-Instruct-hf as the base model
- llama3 …
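For background on what "using" a LoRA-tuned model involves: LoRA learns a low-rank update `B @ A` on top of each frozen weight, and deploying often means merging that update into the base weights. A toy numpy sketch of the merge math (shapes and the `alpha` scaling are illustrative toy values, not this model's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size and LoRA rank (toy values)
W = rng.standard_normal((d, d))  # frozen base weight
A = rng.standard_normal((r, d))  # LoRA down-projection
B = rng.standard_normal((d, r))  # LoRA up-projection
alpha = 16                       # LoRA scaling numerator

# Merging folds the low-rank update into the base weight:
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d)
# Applying base + adapter separately equals applying the merged weight.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))  # True
```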
-
https://github.com/h2oai/h2ogpt/blob/main/docs/TRITON.md
Do the same for Falcon 7B, then Falcon 40B.
-
This issue has been filed to examine how best to support the `inference-service-test` plugin in ES|QL mixed-version testing.
The ES|QL CSV and REST tests run with a variety of modes (see `x-pack/plugin/…