-
My model includes a `Dropout` module that should stay active during inference. When I run the model locally with `onnxruntime`, I set `disabled_optimizers=["EliminateDropout"]`, and I want to know how I can do the same with trito…
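For reference, this is roughly what the local `onnxruntime` setup described above looks like (a minimal sketch; the model path is a placeholder):

```python
import onnxruntime as ort

# Keep the Dropout nodes active at inference time by disabling the graph
# optimizer that would otherwise fold them away.
# "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    disabled_optimizers=["EliminateDropout"],
)
```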
-
# InferenceOptimizer context manager proposal
**This is only a preview idea for discussion; we will implement it experimentally once we agree it is a good way to move forward.**
## Why?
We found many BKCs (Best Known Configurations) …
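Purely as a strawman for discussion, one possible shape for such a context manager; only the name `InferenceOptimizer` comes from this proposal, and every method and detail below is an assumption:

```python
import torch

class InferenceOptimizer:
    """Hypothetical context manager: applies inference-time settings on
    entry and restores the model's original state on exit."""

    def __init__(self, model: torch.nn.Module):
        self.model = model
        self._was_training = model.training

    def __enter__(self) -> torch.nn.Module:
        self.model.eval()           # disable dropout / batch-norm updates
        return self.model

    def __exit__(self, exc_type, exc, tb):
        if self._was_training:
            self.model.train()      # restore the original training mode
        return False                # do not suppress exceptions

# Usage sketch:
# with torch.no_grad(), InferenceOptimizer(model) as m:
#     output = m(batch)
```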
-
**Problem Description**
Describe the problem in a clear and concise manner.
**Steps to Reproduce**
**Expected Result**
The weather for the specified city should be output correctly.
**Actual Result…
-
`model-analyzer profile --run-config-profile-models-concurrently-enable --override-output-model-repository --model-repository model_repositories --profile-models model1,model2 --output-model-reposito…
-
### Describe the bug
I have set up a basic HF Space from an AutoTrain object detection model. The model is based on `facebook/detr-resnet-101`. The Space builds and loads properly, but when I submit …
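As a sanity check, the base checkpoint can be exercised locally with the standard `transformers` object-detection pipeline; a minimal sketch, where the image URL is only an illustrative sample:

```python
from transformers import pipeline

# Run the base checkpoint outside the Space to rule out a model-level issue.
detector = pipeline("object-detection", model="facebook/detr-resnet-101")

# Any test image works; this COCO sample URL is only illustrative.
results = detector("http://images.cocodataset.org/val2017/000000039769.jpg")
for r in results:
    print(r["label"], round(r["score"], 3), r["box"])
```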
-
I have converted Mixtral to TensorRT and I am trying to use your repository to integrate with OpenAI.
I'm using the template `history_template_llama3.liquid`. When I run your example code for interactin…
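For context, a sketch of the kind of OpenAI-compatible call involved; the base URL, API key, and model name are placeholders, not values from the repository:

```python
from openai import OpenAI

# Point the standard OpenAI client at the locally served engine.
# base_url, api_key, and model are placeholder values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mixtral",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```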
-
**Description**
I am building a baseline for my engineering project. I want to send multiple requests to multiple models and enable parallel execution when different models receive requests simultaneo…
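A minimal sketch of the client side of this: firing requests at two models concurrently with the Triton HTTP client, assuming placeholder model, input, and output names:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer(model_name: str) -> np.ndarray:
    # Placeholder input; the real shape/dtype depend on each model's config.
    data = np.zeros((1, 3), dtype=np.float32)
    inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    result = client.infer(model_name, inputs=[inp])
    return result.as_numpy("OUTPUT0")

# Send requests to both models simultaneously from separate threads.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(infer, name) for name in ("model1", "model2")]
    outputs = [f.result() for f in futures]
```

Note that whether Triton actually executes the models in parallel also depends on the server-side model configuration (e.g. instance groups), not just on concurrent clients.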
-
Requesting a little help here. I'm trying to test out Copilot functionality with `llama-cpp-python` and this extension. Below is my configuration:
```json
{
  "[python]": {
    "edit…
-
Batch transform error:
2022-08-30T09:01:17.792:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=50, BatchStrategy=MULTI_RECORD
2022-08-30T09:01:17.883:[sagemaker logs]: st-s3/trainingPl…
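For reference, the settings in those log lines map onto the SageMaker Python SDK roughly like this (a sketch only; the model name, instance type, and S3 paths are placeholders):

```python
from sagemaker.transformer import Transformer

# Placeholder names/paths; the numeric settings mirror the log line above.
transformer = Transformer(
    model_name="my-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy="MultiRecord",           # BatchStrategy=MULTI_RECORD
    max_concurrent_transforms=1,      # MaxConcurrentTransforms=1
    max_payload=50,                   # MaxPayloadInMB=50
    output_path="s3://my-bucket/output",
)
transformer.transform(data="s3://my-bucket/input", content_type="text/csv")
```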
-
Hi, we have tried to run the speculative inference process on OPT-13B and Llama2-70B-chat, but met some issues. Specifically, for Llama2-70B-chat, we obtained performance worse than vLLM, which seem…
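For comparison, a sketch of a plain vLLM baseline run of the kind referenced above; the model ID, tensor parallelism, and sampling settings are assumptions, not the exact benchmark setup:

```python
from vllm import LLM, SamplingParams

# Illustrative baseline; tensor_parallel_size and sampling values are
# assumptions, not the settings used in the reported comparison.
llm = LLM(model="meta-llama/Llama-2-70b-chat-hf", tensor_parallel_size=8)
params = SamplingParams(temperature=0.0, max_tokens=128)

outputs = llm.generate(["Explain speculative decoding briefly."], params)
print(outputs[0].outputs[0].text)
```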