-
My model includes a `Dropout` module that should stay active during inference. When I run the model locally with `onnxruntime`, I set `disabled_optimizers=["EliminateDropout"]`, and I want to know how I can do the same with trito…
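For reference, this is roughly what the local `onnxruntime` setup described above looks like (a minimal sketch; the model path is a placeholder):

```python
import onnxruntime as ort

# Keep the Dropout nodes active at inference time by disabling the graph
# optimizer that would otherwise fold them away.
# "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    disabled_optimizers=["EliminateDropout"],
)
```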
-
# InferenceOptimizer context manager proposal
**This is only a preview idea for discussion; we will implement it experimentally once we agree it is a good way to move forward.**
## Why?
We found many BKCs (Best Known Configurations) …
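Purely as a strawman for discussion, one possible shape for such a context manager; only the name `InferenceOptimizer` comes from this proposal, and every method and detail below is an assumption:

```python
import torch

class InferenceOptimizer:
    """Hypothetical context manager: applies inference-time settings on
    entry and restores the model's original state on exit."""

    def __init__(self, model: torch.nn.Module):
        self.model = model
        self._was_training = model.training

    def __enter__(self) -> torch.nn.Module:
        self.model.eval()           # disable dropout / batch-norm updates
        return self.model

    def __exit__(self, exc_type, exc, tb):
        if self._was_training:
            self.model.train()      # restore the original training mode
        return False                # do not suppress exceptions

# Usage sketch:
# with torch.no_grad(), InferenceOptimizer(model) as m:
#     output = m(batch)
```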
-
**Problem Description**
Describe the problem in a clear and concise manner.
**Steps to Reproduce**
**Expected Result**
The weather for the specified city should be output correctly.
**Actual Result…
-
`model-analyzer profile --run-config-profile-models-concurrently-enable --override-output-model-repository --model-repository model_repositories --profile-models model1,model2 --output-model-reposito…
-
### Describe the bug
I have set up a basic HF Space from an AutoTrain object detection model. The model is based on `facebook/detr-resnet-101`. The Space builds and loads properly, but when I submit …
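As a sanity check, the base checkpoint can be exercised locally with the standard `transformers` object-detection pipeline; a minimal sketch, where the image URL is only an illustrative sample:

```python
from transformers import pipeline

# Run the base checkpoint outside the Space to rule out a model-level issue.
detector = pipeline("object-detection", model="facebook/detr-resnet-101")

# Any test image works; this COCO sample URL is only illustrative.
results = detector("http://images.cocodataset.org/val2017/000000039769.jpg")
for r in results:
    print(r["label"], round(r["score"], 3), r["box"])
```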
-
I have converted Mixtral to TensorRT and I am trying to use your repository to integrate with OpenAI.
I'm using the template `history_template_llama3.liquid`. When I run your example code for interactin…
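For context, a sketch of the kind of OpenAI-compatible call involved; the base URL, API key, and model name are placeholders, not values from the repository:

```python
from openai import OpenAI

# Point the standard OpenAI client at the locally served engine.
# base_url, api_key, and model are placeholder values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mixtral",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```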
-
**Description**
I am building a baseline for my engineering project. I want to send multiple requests to multiple models and enable parallel execution when different models receive requests simultaneo…
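A minimal sketch of the client side of this: firing requests at two models concurrently with the Triton HTTP client, assuming placeholder model, input, and output names:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer(model_name: str) -> np.ndarray:
    # Placeholder input; the real shape/dtype depend on each model's config.
    data = np.zeros((1, 3), dtype=np.float32)
    inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    result = client.infer(model_name, inputs=[inp])
    return result.as_numpy("OUTPUT0")

# Send requests to both models simultaneously from separate threads.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(infer, name) for name in ("model1", "model2")]
    outputs = [f.result() for f in futures]
```

Note that whether Triton actually executes the models in parallel also depends on the server-side model configuration (e.g. instance groups), not just on concurrent clients.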
-
Requesting a little help here. I'm trying to test out Copilot functionality with `llama-cpp-python` and this extension. Below is my configuration:
```json
{
  "[python]": {
    "edit…
-
Batch transform error:
2022-08-30T09:01:17.792:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=50, BatchStrategy=MULTI_RECORD
2022-08-30T09:01:17.883:[sagemaker logs]: st-s3/trainingPl…
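For reference, the settings in those log lines map onto the SageMaker Python SDK roughly like this (a sketch only; the model name, instance type, and S3 paths are placeholders):

```python
from sagemaker.transformer import Transformer

# Placeholder names/paths; the numeric settings mirror the log line above.
transformer = Transformer(
    model_name="my-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy="MultiRecord",           # BatchStrategy=MULTI_RECORD
    max_concurrent_transforms=1,      # MaxConcurrentTransforms=1
    max_payload=50,                   # MaxPayloadInMB=50
    output_path="s3://my-bucket/output",
)
transformer.transform(data="s3://my-bucket/input", content_type="text/csv")
```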
-
Hi, we have tried to run the speculative inference process on OPT-13B and Llama2-70B-chat, but met some issues. Specifically, for Llama2-70B-chat, we obtained performance worse than vLLM, which seem…
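For comparison, a sketch of a plain vLLM baseline run of the kind referenced above; the model ID, tensor parallelism, and sampling settings are assumptions, not the exact benchmark setup:

```python
from vllm import LLM, SamplingParams

# Illustrative baseline; tensor_parallel_size and sampling values are
# assumptions, not the settings used in the reported comparison.
llm = LLM(model="meta-llama/Llama-2-70b-chat-hf", tensor_parallel_size=8)
params = SamplingParams(temperature=0.0, max_tokens=128)

outputs = llm.generate(["Explain speculative decoding briefly."], params)
print(outputs[0].outputs[0].text)
```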