-
**Is your feature request related to a problem? Please describe.**
Large input volumes have to be processed via a sliding-window algorithm; otherwise OOMs can happen quickly. There are two constraini…
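For context, here is a minimal sketch of the sliding-window processing pattern being requested; the `window_size` and `stride` parameters and the per-window work are illustrative placeholders, not taken from the original request:
```python
from typing import Iterator, Sequence

def sliding_windows(items: Sequence[int], window_size: int, stride: int) -> Iterator[Sequence[int]]:
    """Yield overlapping windows so only one window is resident at a time."""
    if len(items) <= window_size:
        yield items
        return
    # A tail shorter than window_size is omitted here for brevity.
    for start in range(0, len(items) - window_size + 1, stride):
        yield items[start:start + window_size]

# Peak memory stays proportional to window_size rather than the full input,
# since each window is consumed and released before the next is materialized.
total = 0
for window in sliding_windows(range(10_000), window_size=1024, stride=512):
    total += sum(window)  # stand-in for the real per-window work
print(total)
```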
-
This is a vision model based on phi-2. I believe OpenVINO should be able to support it and perform efficient inference on personal computers; it would be very useful.
[TinyGPT-V: Efficient Multi…
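As a rough feasibility sketch: the phi-2 language backbone already runs through optimum-intel's OpenVINO export, so only the vision tower would need additional conversion work. This is an assumption about how support might be staged, not a confirmed plan:
```python
# Sketch under an assumption: TinyGPT-V's language backbone is phi-2, and that
# backbone alone already converts via optimum-intel. Full multimodal support
# would additionally require exporting the vision encoder.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # convert on the fly

inputs = tokenizer("OpenVINO says", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```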
-
Ovis1.6-Llama3.2-3B-GPTQ-Int4: how can inference be run on the CPU?
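Not an authoritative answer, but a sketch of what a plain-transformers CPU load would look like. The caveat matters here: GPTQ int4 kernels generally target CUDA, so a CPU-only run may fail at load time or require falling back to a non-quantized checkpoint; the model id follows the question, everything else is an assumption:
```python
import torch
from transformers import AutoModelForCausalLM

# Assumption: the Ovis repo exposes a causal-LM entry point via trust_remote_code.
# GPTQ int4 backends are typically CUDA-only, so this may error on CPU-only
# hosts; a non-quantized Ovis checkpoint is the safer CPU path in that case.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis1.6-Llama3.2-3B-GPTQ-Int4",  # model id as given in the question
    torch_dtype=torch.float32,                # CPU-friendly dtype
    device_map="cpu",
    trust_remote_code=True,
)
model.eval()
```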
-
### Feature request
Optimize Transformers' image_processors to decrease image processing time and reduce inference latency for vision models and VLMs.
### Motivation
The Transformers library relie…
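One lever that already exists in Transformers is the torch/torchvision-backed "fast" image processor variants, selected with `use_fast=True`; a small sketch (the model id is only an example):
```python
from transformers import AutoImageProcessor

# `use_fast=True` selects the torch/torchvision-backed "Fast" processor class
# where one exists, which typically cuts preprocessing time versus the
# PIL/NumPy implementation.
processor = AutoImageProcessor.from_pretrained(
    "facebook/detr-resnet-50",  # example model id
    use_fast=True,
)
print(type(processor).__name__)  # e.g. DetrImageProcessorFast when available
```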
-
@dusty-nv thanks for NanoLLM for CUDA 12.6 - works well!!
However, when I invoke it with:
```
sudo jetson-containers run $(autotag nano_llm) \
python3 -m nano_llm.agents.video_query --api=…
-
# OPEA Inference Microservices Integration for LangChain
This RFC proposes the integration of OPEA inference microservices (from GenAIComps) into LangChain [extensible to other frameworks], enabli…
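To make the integration surface concrete, here is a minimal sketch of a LangChain wrapper around an OPEA inference microservice. The endpoint URL, path, and payload shape are assumptions for illustration; the actual GenAIComps service contract should come from the RFC itself:
```python
from typing import Any, List, Optional

import requests
from langchain_core.language_models.llms import LLM

class OPEAMicroserviceLLM(LLM):
    """Hypothetical LangChain wrapper for an OPEA inference microservice."""

    endpoint: str = "http://localhost:9000/v1/chat/completions"  # assumed URL

    @property
    def _llm_type(self) -> str:
        return "opea-microservice"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        resp = requests.post(
            self.endpoint,
            json={"messages": [{"role": "user", "content": prompt}]},
            timeout=60,
        )
        resp.raise_for_status()
        # Assumed OpenAI-style response shape.
        return resp.json()["choices"][0]["message"]["content"]

llm = OPEAMicroserviceLLM()
# print(llm.invoke("Hello"))  # requires a running OPEA service
```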
-
Notice: In order to resolve issues more efficiently, please raise the issue following the template and fill in the details.
## 🐛 Bug
When I run demo.py, the error is:
```
Tracebac…
-
Dear DeepLabCut Team,
Thank you for your tremendous work on this invaluable open-source tool. I truly appreciate your efforts and dedication.
I have a suggestion regarding the use of pre-trained…
-
### 🚀 The feature, motivation and pitch
Gemma-2 and the new Ministral models use alternating sliding-window and full-attention layers to reduce the size of the KV cache.
The KV cache is a huge inferen…
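A back-of-the-envelope illustration of why the alternating pattern helps, using assumed shapes and a 4K window rather than the exact Gemma-2/Ministral configs:
```python
# Rough KV-cache sizing: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes.
# All numbers below are assumed for illustration, not real model configs.
def kv_cache_gib(num_layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    return 2 * num_layers * kv_heads * head_dim * seq_len * bytes_per_elem / 2**30

layers, kv_heads, head_dim = 32, 8, 128
full = kv_cache_gib(layers, kv_heads, head_dim, seq_len=128_000)

# With half the layers using a 4K sliding window, those layers cap their
# cache at the window size instead of the full sequence length.
mixed = (kv_cache_gib(layers // 2, kv_heads, head_dim, seq_len=128_000)
         + kv_cache_gib(layers // 2, kv_heads, head_dim, seq_len=4_096))
print(f"full attention everywhere: {full:.1f} GiB; alternating: {mixed:.1f} GiB")
# ~15.6 GiB vs ~8.1 GiB at 128K context with these assumed shapes.
```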
-
Right now, in experiments I have been running, there is a significant bottleneck in retrieving and saving results during parallel batch inference. This hinders throughput considerably, as each worke…
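One common way to stop result I/O from blocking workers is to push results onto a queue drained by a single dedicated writer; a minimal sketch, where the output file name and result shape are placeholders:
```python
import json
import queue
import threading

results: "queue.Queue[dict | None]" = queue.Queue(maxsize=1024)

def writer(path: str) -> None:
    # A single writer drains the queue so inference workers never block on disk.
    with open(path, "w") as f:
        while (item := results.get()) is not None:
            f.write(json.dumps(item) + "\n")

t = threading.Thread(target=writer, args=("results.jsonl",), daemon=True)
t.start()

# Workers (threads, processes, or async tasks) just enqueue and move on:
for i in range(10):
    results.put({"batch": i, "output": f"placeholder-{i}"})

results.put(None)  # sentinel: no more results
t.join()
```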