-
**Description**
When deploying an ONNX model using the Triton Inference Server's ONNX runtime backend, the inference performance on the CPU is noticeably slower compared to running the same model usi…
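A direct onnxruntime benchmark on the CPU is a useful baseline for this comparison, since it isolates the model's own cost from any backend overhead. A minimal sketch follows; the model path, input name fetch, and input shape are assumptions, not taken from the report:
```py
import time

import numpy as np
import onnxruntime as ort

# Hypothetical model path and input shape -- substitute your own.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm up, then time repeated runs for a stable CPU latency baseline.
for _ in range(10):
    session.run(None, {input_name: dummy})

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, {input_name: dummy})
print(f"mean latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```
Comparing this number against the latency reported by Triton's perf tools for the same model makes it clearer whether the slowdown comes from the backend or from the model itself.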
-
**LocalAI version:**
quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
**Environment, CPU architecture, OS, and Version:**
Intel i7 (12 cores), NVIDIA GTX 1060, 30 GB RAM
services:
  api:
    …
-
### Description
I am working with ModelMesh Serving deployed on a Kubernetes cluster, and I am looking for a way to control the number of replicas for a specific model. My setup includes a Triton runt…
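For context, ModelMesh generally scales runtime pods rather than individual models, so the closest existing knob is the `replicas` field on the ServingRuntime. A hedged sketch using the official kubernetes Python client to patch it; the CRD group/version, namespace, and runtime name below are assumptions to verify against your cluster:
```py
from kubernetes import client, config

# Load kubeconfig (use config.load_incluster_config() inside a pod).
config.load_kube_config()

api = client.CustomObjectsApi()

# Assumed CRD coordinates for the KServe/ModelMesh ServingRuntime; check with
# `kubectl api-resources | grep servingruntime`.
api.patch_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1alpha1",
    namespace="modelmesh-serving",
    plural="servingruntimes",
    name="triton-2.x",                # hypothetical runtime name
    body={"spec": {"replicas": 3}},   # scales runtime pods, not per-model copies
)
```
Note that this scales the pods that host many models at once; it does not pin a replica count to one model alone.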
-
Task: Implement a web service (API) that exposes the capabilities of the LLaMA language model to perform code reviews (see the sketch below). This involves:
- [ ] Model preparation: convert the model L…
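As a starting point for the service layer (not the issue's actual design), the sketch below wires a code-review prompt to a local LLaMA model through FastAPI; it assumes llama-cpp-python and a converted model at a hypothetical `./models/llama.gguf` path:
```py
from fastapi import FastAPI
from llama_cpp import Llama
from pydantic import BaseModel

app = FastAPI()
# Assumed path to the converted model; adjust to your deployment.
llm = Llama(model_path="./models/llama.gguf", n_ctx=4096)

class ReviewRequest(BaseModel):
    code: str

@app.post("/review")
def review(req: ReviewRequest):
    # Build a simple review prompt around the submitted code.
    prompt = (
        "You are a code reviewer. Point out bugs, style issues, and risks "
        "in the following code:\n\n" + req.code + "\n\nReview:"
    )
    out = llm(prompt, max_tokens=512, stop=["</s>"])
    return {"review": out["choices"][0]["text"]}
```
Run it with `uvicorn app:app` and POST JSON like `{"code": "..."}` to `/review`.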
-
I exported a PyTorch model (model.pt) to ONNX:
```py
def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

torch_model = torch.load(os.pa…
```
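For reference, a complete export along these lines typically looks like the sketch below; the file paths, input shape, and dynamic-axes choice are assumptions, not taken from the truncated snippet:
```py
import os

import numpy as np
import onnxruntime as ort
import torch

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# Hypothetical path and input shape -- substitute your own.
torch_model = torch.load(os.path.join("weights", "model.pt"), map_location="cpu")
torch_model.eval()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    torch_model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
    opset_version=13,
)

# Sanity check: the exported graph should match the PyTorch output.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {"input": to_numpy(dummy)})[0]
np.testing.assert_allclose(to_numpy(torch_model(dummy)), ort_out, rtol=1e-3, atol=1e-5)
```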
-
tf2onnx>=1.15 pins protobuf~=3.20.2.
TensorFlow >=2.13 requires tf2onnx>=1.15 due to https://github.com/onnx/tensorflow-onnx/pull/2215.
In order to use gRPC natively with M1/M2 chips, we need at…
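A quick way to see which side of these pins an environment actually resolved to is to read the installed versions at runtime; a minimal sketch (the package set simply mirrors the constraints above):
```py
from importlib.metadata import version

# Print the packages whose pins interact; compare against
# tf2onnx>=1.15 / protobuf~=3.20.2 from the report above.
for pkg in ("tensorflow", "tf2onnx", "protobuf"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except Exception:
        print(f"{pkg}: not installed")
```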
-
Deployed on a Linux server, without modifying the dependencies in requirements.txt; after running, voice cloning fails with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/envs/cosyvoice/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.…
-
Can I get help on how to run with a dynamic shape input in Python? Can you add an example in Python?
```py
import cv2
import tritonclient.grpc as grpc_client
import time
import sys
sys.path.appe…
```
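A minimal sketch of what such an example might look like, assuming the model's config declares variable dimensions (e.g. `dims: [3, -1, -1]`); the server URL, model name, tensor names, and dtype below are hypothetical:
```py
import numpy as np
import tritonclient.grpc as grpc_client

client = grpc_client.InferenceServerClient(url="localhost:8001")

def infer(image: np.ndarray) -> np.ndarray:
    # With a dynamic-shape model, the InferInput shape is just the shape of
    # this particular request's tensor; it can differ from call to call.
    data = image.astype(np.float32)
    inp = grpc_client.InferInput("input", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    out = grpc_client.InferRequestedOutput("output")
    result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
    return result.as_numpy("output")

# Two different spatial sizes against the same model.
print(infer(np.random.rand(1, 3, 224, 224)).shape)
print(infer(np.random.rand(1, 3, 320, 320)).shape)
```
The key point is that `InferInput` takes the concrete shape of each request, so consecutive calls can send differently sized tensors as long as they satisfy the model's `-1` dimensions.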
-
**LocalAI version:**
`quay.io/go-skynet/local-ai:v1.20.0-cublas-cuda12-ffmpeg`
**Environment, CPU architecture, OS, and Version:**
Linux 0f37a61ebb06 5.10.16.3-microsoft-standard-WSL2…