-
Hello! I ran into the following problem while using the layout detection module.
## Tutorial
https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html#_3
## Problem description
Running:
```python
from paddlex import create_model
model_n…
```
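For reference, a minimal sketch of how the linked tutorial drives this module; the model name `PP-DocLayout-L` and the image path are assumptions, since the original snippet is cut off before them:

```python
from paddlex import create_model

# Create the layout detection model; the model name is an assumption
# taken from the tutorial, not from the truncated report.
model = create_model(model_name="PP-DocLayout-L")

# Run prediction on a local image (hypothetical path).
output = model.predict("layout_test.jpg", batch_size=1)

# Each result can be printed or saved for inspection.
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")
```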
-
## Description
I am encountering performance bottlenecks while running multi-threaded inference on high-resolution images using TensorRT. The pipeline breaks the image into patches to manage…
-
For the Ascend torch_npu backend:
With the following configuration, the private conv format is disallowed, which reduces format conversions and speeds up the conv operator. It can also avoid the …
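The excerpt cuts off before the configuration itself; based on the torch_npu documentation, the setting being referred to is likely the following (an assumption, since the snippet is truncated):

```python
import torch
import torch_npu  # Ascend NPU backend for PyTorch

# Disallow private (internal) NPU storage formats so tensors stay in
# standard formats, avoiding repeated format conversions around conv ops.
torch.npu.config.allow_internal_format = False
```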
-
- [ ] September: Finish implementing GPT2 Inference w/ float32, clang.
- [x] Solidify aIR System
- [x] Make aIR type-safe
- [x] Optimize aIR (FastGraph); simplify GPT2 to run in < 1.0 s
- [x] E…
-
### 🐛 Describe the bug
I've encountered a performance issue where ExecuTorch's inference is significantly slower than ONNX, both on a Linux PC and on an Android phone. I believe this is a crit…
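A minimal timing harness for the ONNX side of such a comparison might look like this (the model path and input shape are placeholders; the ExecuTorch side is omitted since its runtime code is not shown in the excerpt):

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")  # hypothetical exported model
name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape

# Warm up, then time repeated runs for a stable per-inference latency.
for _ in range(5):
    sess.run(None, {name: x})
runs = 100
start = time.perf_counter()
for _ in range(runs):
    sess.run(None, {name: x})
print(f"mean latency: {(time.perf_counter() - start) / runs * 1e3:.2f} ms")
```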
-
### Description & Motivation
Given the landscape of inference providers, it seems like a good idea to have a way to interact with them using a relatively 'proven' library. Developing a ModelClient for…
-
Add functionality to `LLMBlock` within the pipeline to override the global OpenAI client variable. This enhancement will allow us to support running multiple OpenAI clients for different `LLMBlock` in…
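A sketch of what that override could look like, assuming a hypothetical `client` parameter that falls back to the pipeline's global client (the real `LLMBlock` signature and the endpoint URLs are not shown in the excerpt):

```python
from openai import OpenAI

# Global client used by default across the pipeline (hypothetical endpoint).
GLOBAL_CLIENT = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

class LLMBlock:
    def __init__(self, model_id: str, client: OpenAI | None = None):
        # Hypothetical override: fall back to the global client when no
        # block-specific client is supplied.
        self.client = client or GLOBAL_CLIENT
        self.model_id = model_id

    def generate(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

# Two blocks talking to different serving endpoints (hypothetical URLs).
block_a = LLMBlock("model-a")
block_b = LLMBlock(
    "model-b",
    client=OpenAI(base_url="http://other-host:8000/v1", api_key="EMPTY"),
)
```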
-
ONNX has evolved into much more than just a specification for exchanging models. Here's a breakdown of why:
- ONNX Runtime: a highly optimized inference engine that executes ONNX models. This activel…
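As a concrete illustration of the format-plus-runtime point, a small model can be exported from PyTorch and then executed by ONNX Runtime independently of the framework that produced it (a minimal sketch; the model and shapes are arbitrary):

```python
import torch
import onnxruntime as ort

# A tiny model exported to the ONNX interchange format...
model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)
torch.onnx.export(model, (x,), "linear.onnx")

# ...then executed by ONNX Runtime, with no PyTorch in the loop.
sess = ort.InferenceSession("linear.onnx")
out = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})
print(out[0])
```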
-
### What is the issue?
#### Description
I am using Ollama in a Docker setup with GPU support, configured to use all available GPUs on my system. However, when using the NemoTron model with a simp…
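For reproduction, the host-side call might look like the following, assuming the Python `ollama` client and a `nemotron` model tag (the actual prompt in the report is truncated):

```python
import ollama

# Simple chat request against the Dockerized Ollama server; the model tag
# "nemotron" and the prompt are assumptions, as the report is cut off.
response = ollama.chat(
    model="nemotron",
    messages=[{"role": "user", "content": "Hello, a quick sanity check."}],
)
print(response["message"]["content"])
```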
-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…