-
1. I used this command for inference but encountered an issue. Does anyone know how to fix this?
- command: `python launch.py --n_GPUs 1 main.py --batch_size 8 --precision single`
- error:
`[W socke…
-
### Motivation
Hello. I see in the documentation that input logprobs are supported in offline inference mode. Are they also supported when deploying via the API server? If not, is there a plan to add this in the near term?
### Related resources
#2041
### Additional context
_No response_
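For context, "input logprobs" here are the log probabilities the model assigns to each token of the prompt. A minimal numpy sketch of how they are derived from per-position logits (a hypothetical helper for illustration, not this project's API):

```python
import numpy as np

def prompt_token_logprobs(logits: np.ndarray, token_ids: list[int]) -> np.ndarray:
    """Log probability of each prompt token under the model's output distribution.

    logits:    array of shape [seq_len, vocab_size], one row per prompt position.
    token_ids: the actual prompt token id at each position.
    """
    # Numerically stable log-softmax over the vocabulary dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the logprob of the token that actually appeared at each position.
    return log_probs[np.arange(len(token_ids)), token_ids]
```

Offline inference engines that expose input logprobs return exactly these per-token values; the question above is whether the API server deployment surfaces the same field.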
-
Notes from following the Scoring Server on AWS guide to set up an AWS AMI REST server from an H2O-3 MOJO.
The documentation version of [http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html?highl…
-
### System Info
tgi-gaudi docker container built from master branch (4fe871ffaaa62f1a203607078e868fcca962b017)
Ubuntu 22.04.3 LTS
Gaudi2
HL-SMI Version: hl-1.15.0-fw-48.2.1.1
Driver Version: 1…
-
Terraform apply fails.
When running `terraform apply`, it fails while deploying Kubernetes.
Used Branch: release-1.1
Logs:
````
module.inference-server.kubernetes_deployment.inference_deploymen…
-
Hello, I first ran `chatchat init`, then ran `chatchat kb -r` and got the following error:
2024-07-25 18:06:36.050 | INFO | chatchat.server.knowledge_base.kb_cache.faiss_cache:load_vector_store:109 - loading vector store in 'samples/vec…
-
**Description**
I'm using a simple client inference class based on the client example. My raw TensorRT inference with batch size 10 takes 150 ms, but my Triton server with the TensorRT backend took 1100 ms. This is my client:…
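When raw TensorRT and Triton latencies diverge this much, a useful first step is to time the client call itself with warm-up excluded, so one-time setup and request overhead can be separated from compute. A minimal sketch (a hypothetical helper; the callable passed in stands for whatever inference call is being benchmarked):

```python
import time

def mean_latency_ms(fn, n_iters=20):
    """Average wall-clock latency of fn() in milliseconds, after one warm-up call."""
    fn()  # warm-up: excludes one-time costs (connection setup, CUDA context, etc.)
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - start) / n_iters * 1000.0
```

Comparing this number for the raw TensorRT call and the Triton client call on identical inputs shows whether the gap lives in the server or in request handling (for example, a batch of 10 accidentally sent as ten separate requests).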
-
Hi team, quick question: does `lightseq` support the following?
- Convert HuggingFace BERT/RoBERTa models to `int8` precision directly
- If yes, can the converted model be exported to ONNX format directly?
- …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
Hello, I'm curious whether sglang can already be used as a backend for NVIDIA's Triton Server.
Amazing work with the library btw, love it!