-
# Bug Report
I am referring to [https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/language_model/llama/smooth_quant](https://github.com/microsoft/onnxruntime-inference…
-
I am getting 8.9 fps on YOLOv8n with TensorRT + C++.
-
Definition of the request payload for the chat completions endpoint.
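For context, such a request payload commonly follows the OpenAI-style chat completions schema. The sketch below is a minimal pydantic model; the class and field names are assumptions for illustration, not this project's actual definition.
```python
# Hypothetical sketch of an OpenAI-style chat completions request payload;
# field names and defaults are assumptions, not the project's actual schema.
from typing import List, Optional
from pydantic import BaseModel


class ChatMessage(BaseModel):
    role: str            # e.g. "system", "user", "assistant"
    content: str


class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    temperature: Optional[float] = None   # server default applies when omitted
    top_p: Optional[float] = None
    max_tokens: Optional[int] = None
    stream: bool = False
```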
-
When I try to evaluate my model or make predictions on the val set, I get the following error:
```
---------------------------------------------------------------------------
ValueError …
```
-
Dynamic LoRA (Low-Rank Adaptation) switching functionality, allowing users to change LoRA models on-the-fly during inference without reloading the entire model.
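One way this could look in practice is sketched below, using Hugging Face PEFT as an assumed backend; the model name and adapter paths are placeholders, and the actual implementation may differ.
```python
# Sketch of on-the-fly LoRA switching with Hugging Face PEFT (assumed backend);
# the model name and adapter paths below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load the base model once, then attach multiple adapters by name.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter-a", adapter_name="a")
model.load_adapter("path/to/lora-adapter-b", adapter_name="b")


def generate_with(adapter: str, prompt: str) -> str:
    # Switching adapters only swaps the small LoRA weights;
    # the base model stays loaded in memory.
    model.set_adapter(adapter)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)


print(generate_with("a", "Summarize LoRA in one sentence."))
print(generate_with("b", "Summarize LoRA in one sentence."))
```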
-
### Request Description
I was trying to run inference with a CatBoost model via ONNX and ran into this error:
```
RuntimeError: Exception from src/inference/src/cpp/core.cpp:92:
Check 'error_…
```
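For reproduction, the export-and-run path would look roughly like the sketch below, here using onnxruntime only as a sanity check (the stack trace above appears to come from OpenVINO); the file name and dummy data are placeholders.
```python
# Minimal CatBoost -> ONNX export and inference sketch.
# Assumption: onnxruntime is used here only as a sanity check;
# the error above appears to come from the OpenVINO runtime instead.
import numpy as np
import onnxruntime as ort
from catboost import CatBoostClassifier

# Train a tiny throwaway model just to have something to export.
X = np.random.rand(100, 4).astype(np.float32)
y = (X[:, 0] > 0.5).astype(int)
model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)

# CatBoost supports direct ONNX export for classification/regression models.
model.save_model("catboost_model.onnx", format="onnx")

sess = ort.InferenceSession("catboost_model.onnx")
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: X[:5]})
print(outputs)
```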
-
Definition of the response payload for the chat completions endpoint.
-
### Feature request
In the documentation, there is not enough information about the default values TGI enforces when a client request does not contain parameters such as `temperature`, `top_p`, `presence_frequency` …
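As a workaround until the defaults are documented, a client can pin these values explicitly on every call. A minimal sketch against a locally running TGI instance follows; the endpoint URL and the parameter values are assumptions, not recommended defaults.
```python
# Sketch: pin sampling parameters explicitly so TGI's undocumented defaults never apply.
# The endpoint URL and parameter values below are assumptions for a local TGI instance.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {
            "temperature": 0.7,
            "top_p": 0.9,
            "max_new_tokens": 128,
            "do_sample": True,
        },
    },
    timeout=60,
)
print(resp.json()["generated_text"])
```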
-
https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/ocr
Following this recipe with the ICDAR2015 dataset and a pretrained ResNet50 model (only the model configuration needs changing) runs successfully: accuracy is essentially unchanged, the speed is reduced to 1/4, and an Inference model is obtained. However, converting this model to ONNX fails with an error about a missing quantization configuration file (cali…
-
***Under Construction***
The Answer Engine, released in version 0.13, provides a Q&A interface for Tabby's users to interact with the LLM, optionally within the context of a connected repository. T…