-
## Problem description
Embedding/rerank models started from the UI have no concurrency-related settings.
Sending requests from the client with asyncio or concurrent.futures is actually slower than a synchronous for loop.
**How can the models be made to run inference concurrently?**
## Models launched on the xinference side
embedding:
rerank:
## Test results
### embedding endpoint test…
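Without the full test code it is hard to be sure, but one common cause of "async is slower than a for loop" is a client that never actually issues requests concurrently. Below is a minimal sketch of genuinely concurrent requests against xinference's OpenAI-compatible `/v1/embeddings` endpoint; the URL, model name, and concurrency value are placeholders, not values from the report:

```python
import asyncio

import httpx

BASE_URL = "http://127.0.0.1:9997/v1/embeddings"  # placeholder xinference endpoint
MODEL = "bge-large-zh"                            # placeholder model name
CONCURRENCY = 8

async def embed(client: httpx.AsyncClient, sem: asyncio.Semaphore, text: str):
    # The semaphore caps in-flight requests so the server isn't flooded.
    async with sem:
        resp = await client.post(BASE_URL, json={"model": MODEL, "input": text})
        resp.raise_for_status()
        return resp.json()["data"][0]["embedding"]

async def main(texts):
    sem = asyncio.Semaphore(CONCURRENCY)
    async with httpx.AsyncClient(timeout=60) as client:
        # gather() launches every coroutine before awaiting any of them;
        # awaiting each request inside a plain loop would serialize them again.
        return await asyncio.gather(*(embed(client, sem, t) for t in texts))

if __name__ == "__main__":
    vectors = asyncio.run(main([f"sentence {i}" for i in range(64)]))
    print(len(vectors), "embeddings")
```

Even with a concurrent client, the server only processes requests in parallel if the model itself was launched with enough capacity (e.g. multiple replicas, where the xinference version exposes that option), which is exactly the setting the question is asking about.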
-
Do you have code for batch processing images? I want to use my own dataset for batch inference. Looking forward to your reply.
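The repository in question isn't clear from the excerpt, so as a generic sketch only: the usual pattern for batch inference over your own image dataset is a `DataLoader` feeding batches to one forward pass each. The dataset path and the ResNet-18 stand-in model are illustrative assumptions:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import ResNet18_Weights, resnet18

# "path/to/your/dataset" is a placeholder for an ImageFolder-style directory.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("path/to/your/dataset", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval().to(device)

predictions = []
with torch.no_grad():
    for images, _ in loader:
        # One forward pass per batch of 32 instead of 32 separate calls.
        logits = model(images.to(device))
        predictions.append(logits.argmax(dim=1).cpu())
predictions = torch.cat(predictions)
```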
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
Batch inference doesn't seem to be working. Would you mind providing an example of batch inference for model.predict? It seems to work only with a batch size of 1.
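For reference, one way that reliably runs a true batch through Ultralytics is to pass a pre-batched tensor as the source (a list of sources also works, though how it is batched internally can vary by version). A sketch, with the weights file and image paths as placeholders:

```python
import torch
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder weights

# Option 1: a list of sources; one Results object is returned per image.
results = model.predict(["img1.jpg", "img2.jpg", "img3.jpg"])  # placeholder paths

# Option 2: a pre-batched float tensor in (N, 3, H, W) with values in [0, 1],
# which goes through the network as a single forward pass.
batch = torch.rand(4, 3, 640, 640)
results = model.predict(batch)

for r in results:
    print(r.boxes.xyxy.shape)  # detected boxes for each image in the batch
```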
-
Hi,
We have successfully created a batch version of the model using ONNX and TRT. We are trying this on an A10 GPU, and here is what we have observed: for a batch of 16 we get 96 ms inference time, and if w…
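When weighing numbers like these, the useful metric is per-image latency across batch sizes rather than per-call latency: by that measure, 96 ms for a batch of 16 is 6 ms per image. A rough ONNX Runtime timing sketch, assuming the export has a dynamic batch dimension; the model path, input shape, and provider list are placeholders:

```python
import time

import numpy as np
import onnxruntime as ort

# Placeholders: adjust to your exported model.
sess = ort.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
input_name = sess.get_inputs()[0].name

for batch in (1, 4, 16):
    x = np.random.rand(batch, 3, 640, 640).astype(np.float32)
    sess.run(None, {input_name: x})  # warm-up run, excluded from timing
    t0 = time.perf_counter()
    for _ in range(20):
        sess.run(None, {input_name: x})
    ms = (time.perf_counter() - t0) / 20 * 1000
    print(f"batch={batch}: {ms:.1f} ms per call, {ms / batch:.2f} ms per image")
```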
-
- CPU architecture: x86_64
- GPU: NVIDIA H100
- Libraries
  - TensorRT-LLM: v0.11.0
  - TensorRT: 10.1.0
  - Modelopt: 0.13.1
  - CUDA: 12.3
- NVIDIA driver version: 535.129.03
Hello, I'm e…
-
### Description
The current implementation of the Inference API is to send each request individually as it is received. There are adjustable limits on how many requests can be sent concurrently.…
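The request, presumably, is to coalesce those individual sends into batches. To make the idea concrete, here is a hypothetical client-side micro-batcher; none of these names come from any existing API, and `send_batch` is a stand-in for the real batched call:

```python
import asyncio

MAX_BATCH = 16
MAX_WAIT_S = 0.01  # flush a partial batch after 10 ms

async def send_batch(items):
    # Stand-in for a real batched API call.
    await asyncio.sleep(0.05)
    return [f"result:{item}" for item in items]

async def batcher(queue: asyncio.Queue):
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]
        deadline = loop.time() + MAX_WAIT_S
        # Keep collecting until the batch is full or the time window closes.
        while len(batch) < MAX_BATCH:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        results = await send_batch([item for item, _ in batch])
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)

async def submit(queue, item):
    # Each caller gets a future resolved when its batch comes back.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((item, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    worker = asyncio.create_task(batcher(queue))
    results = await asyncio.gather(*(submit(queue, i) for i in range(40)))
    worker.cancel()
    print(len(results), "results")

asyncio.run(main())
```

The two knobs, batch size and maximum wait, trade throughput against the latency added to the first request in each batch.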
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
…
-
So this is a strange one. I am stumped.
In a way, this is sort of like #416, but I confirmed that if Batch==1, the problem does not occur (see below).
My inference loop looks like this:
```
…
```
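The loop itself is cut off above, so purely as a hypothetical stand-in, here is a generic batched loop of the kind such reports usually involve (`model.onnx`, the input shape, and the output layout are all assumptions). The classic failure mode when only Batch==1 works is post-processing that squeezes or hard-codes the leading batch axis instead of indexing it:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")  # placeholder model
input_name = sess.get_inputs()[0].name

images = np.random.rand(32, 3, 640, 640).astype(np.float32)
BATCH = 8

for start in range(0, len(images), BATCH):
    chunk = images[start:start + BATCH]
    outputs = sess.run(None, {input_name: chunk})[0]
    # Index each sample explicitly: calling .squeeze() here is what
    # silently works at batch 1 and breaks at batch > 1.
    for i in range(outputs.shape[0]):
        per_image = outputs[i]
```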
-
Is it possible to run inference in batches instead of one by one?
If so, please suggest an approach.
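Generally yes: regardless of framework, the pattern is to group inputs into fixed-size chunks and run one forward pass per chunk. A minimal PyTorch sketch with a stand-in linear model in place of the real one:

```python
import torch

def batched(items, size):
    """Yield successive fixed-size chunks of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

model = torch.nn.Linear(128, 10)  # stand-in for your real model
model.eval()

inputs = [torch.rand(128) for _ in range(1000)]
outputs = []
with torch.no_grad():
    for chunk in batched(inputs, 64):
        x = torch.stack(chunk)    # (64, 128): one batch, one forward pass
        outputs.extend(model(x))  # instead of 64 separate calls
```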