-
Thanks for your excellent work.
I see that an ONNX model (for example, ViT converted to ONNX) has a lot of potential if it can run inference on batched inputs, since batching reduces overall time and boosts throughput.
N…
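For context, a minimal sketch of what batched inference against a ViT ONNX model could look like with onnxruntime, assuming the model was exported with a dynamic batch axis; the file name `vit.onnx` and the input name `pixel_values` are assumptions for illustration, not details from this thread:

```python
import numpy as np
import onnxruntime as ort

# Assumes the export used dynamic_axes={"pixel_values": {0: "batch"}},
# so the first dimension accepts any batch size.
session = ort.InferenceSession("vit.onnx", providers=["CPUExecutionProvider"])

# Stack N preprocessed images into a single (N, 3, 224, 224) tensor.
batch = np.stack([np.random.rand(3, 224, 224).astype(np.float32) for _ in range(8)])

# One session.run call scores the whole batch, amortizing per-call overhead.
(logits,) = session.run(None, {"pixel_values": batch})
print(logits.shape)  # (8, num_classes)
```

If the exported graph has a fixed batch dimension, it would need to be re-exported with a dynamic axis before this works.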
-
It would be nice to have batch inference support similar to [`mlx_parallm`](https://github.com/willccbb/mlx_parallm); I'm happy to try adding it soon. @Blaizzy can you assign this to me?
-
- [x] Use `llama_decode` instead of deprecated `llama_eval` in `Llama` class
- [ ] Implement batched inference support for `generate` and `create_completion` methods in `Llama` class (see the `llama_decode` sketch after this list)
- [ ] Add suppo…
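For reference, a sketch of how multiple sequences might be fed through one `llama_decode` call via the low-level `llama_cpp` bindings. This is illustrative only: `decode_two_prompts` is a hypothetical helper, and the exact `llama_batch` field layout has shifted across llama.cpp versions.

```python
import llama_cpp

def decode_two_prompts(ctx, tokens_a, tokens_b):
    """Hypothetical helper: decode two prompts in a single llama_decode call."""
    n_total = len(tokens_a) + len(tokens_b)
    batch = llama_cpp.llama_batch_init(n_total, 0, 1)  # 1 seq id per token
    i = 0
    for seq_id, tokens in ((0, tokens_a), (1, tokens_b)):
        for pos, tok in enumerate(tokens):
            batch.token[i] = tok
            batch.pos[i] = pos
            batch.n_seq_id[i] = 1
            batch.seq_id[i][0] = seq_id               # tag token with its sequence
            batch.logits[i] = pos == len(tokens) - 1  # logits only for last token
            i += 1
    batch.n_tokens = n_total
    llama_cpp.llama_decode(ctx, batch)  # one call evaluates both sequences
    # Per-sequence logits can then be read with llama_get_logits_ith(...)
    llama_cpp.llama_batch_free(batch)
```

Unlike the deprecated `llama_eval`, which processed one contiguous sequence per call, `llama_batch` carries a sequence id per token, which is what makes multi-sequence (batched) decoding possible.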
-
Thanks for the conversion code for phi3-vision.
I'm building an app that serves concurrent requests and needs continuous batching. Can I run inference on phi3-vision with a batch size larger than 1 (I mean in onnx mode…
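For illustration, a simplified dynamic micro-batching loop (a much-reduced cousin of true continuous batching, which interleaves work at the token level): requests that arrive within a short window are grouped and run through the model together. `run_model` is a hypothetical stand-in for one batched forward pass, e.g. a single onnxruntime `session.run` call; none of these names come from the phi3-vision code.

```python
import queue
import threading
import time

requests = queue.Queue()  # holds (input, callback) pairs

def run_model(batch_inputs):
    # Placeholder: stack inputs and execute one batched forward pass.
    return [f"result for {x}" for x in batch_inputs]

def batching_loop(max_batch=8, wait_s=0.01):
    while True:
        batch = [requests.get()]              # block until one request arrives
        try:
            while len(batch) < max_batch:     # drain whatever arrived meanwhile
                batch.append(requests.get(timeout=wait_s))
        except queue.Empty:
            pass
        outputs = run_model([inp for inp, _ in batch])
        for (_, callback), out in zip(batch, outputs):
            callback(out)                     # hand each result back to its caller

threading.Thread(target=batching_loop, daemon=True).start()
requests.put(("image-1", print))  # example: enqueue a request, print its result
time.sleep(0.5)                   # give the daemon thread time to process it
```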
-
### System Info
x86-64
4 A10
0.9.0
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported tas…
-
Hi there! Great work!
Is it possible to run a batched inference?
Thanks!
-
I wonder whether batch inference is supported. I read the eval code, and it seems it only evaluates one video at a time.
-
**Motivation:** Batching multiple inference requests together can speed up inference. Batching can even be leveraged in single-input settings for speedups via, e.g., staged speculative decoding.
*…
-
I would like to perform batch inference. Can you please point me to some resources or provide support for it? Thanks a lot!
-
So we're having trouble running inference efficiently at scale, and of course we're processing the audio parts one by one, as is the default for inference, but is there any support for batch inference to speed th…