-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…
-
### 📚 The doc issue
There is a typo: ```A larger batch size means a higher throughput at the cost of lower latency.```
The correct version should be: ```A larger batch size means a higher throughput a…
-
FOR ASCEND TORCH_NPU BACKEND:
With the following configuration, the private conv format is disallowed, which reduces format conversions and optimizes the speed of the conv operator. It can also avoid the …
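A minimal sketch of what such a configuration could look like, assuming the `torch_npu` option for disabling internal/private formats; the exact attribute name is an assumption and should be verified against the torch_npu documentation:

```python
# Hedged sketch: disable the private (internal) storage format on Ascend NPU,
# so conv tensors stay in the public format and avoid format conversions.
# The attribute name below is an assumption -- check the torch_npu docs.
import torch
import torch_npu  # Ascend backend for PyTorch

torch.npu.config.allow_internal_format = False  # assumed option name
```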
-
### Feature Idea
Found this comment by @Exploder98 suggesting removing bfloat16, which increased my speed by 50%, by modifying
`supported_inference_dtypes = [torch.bfloat16, torch.float16, torch.floa…
-
**What would you like to be added/modified**:
Sedna is an edge-cloud synergy AI project incubated in KubeEdge SIG AI. Benefiting from the edge-cloud synergy capabilities provided by KubeEdge, Sed…
-
### Type
new chapter
### Chapter/Page
Something else
### Description
Training or running inference on models is fairly easy when we have a smaller number of parameters. But when the scale of…
-
I want to know:
1. the supported model list (a table that includes hardware, backend, model name, dtype, and optimization techniques)
2. does the project support serving with concurrency? That is, if many clients sent …
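One simple way to probe concurrent serving is to fire several client requests in parallel and check that each one gets a response. A minimal sketch; `send_request` is a stand-in for the real HTTP call to the inference service (the endpoint and payload shape are assumptions):

```python
# Hedged sketch: simulate many clients hitting an inference service at once.
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt: str) -> str:
    # Placeholder for the real call, e.g.
    # requests.post(SERVER_URL, json={"prompt": prompt}).json();
    # simulated here so the sketch is self-contained.
    return f"echo: {prompt}"

prompts = [f"query {i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so responses line up with prompts.
    results = list(pool.map(send_request, prompts))

print(len(results))  # one response per concurrent client
```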
-
![image](https://github.com/user-attachments/assets/7c11d357-35b5-4b69-8cfd-f3f4112fcd4c)
As shown in the picture, all outputs of the inference are "!".
I tried different approaches and found that I …
-
### Content Type
Article
### Article Description
- How to set up and configure containers for GPU-intensive tasks such as LLM inference or fine-tuning.
- Demo project as example and proof of…
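For the container-setup part, a minimal command-line sketch of exposing a GPU to a container; the image and model names are examples, not the article's actual stack:

```shell
# Hedged sketch: run an OpenAI-compatible LLM server in a GPU container.
# Requires the NVIDIA Container Toolkit; image/model below are examples.
docker run --gpus all --rm -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model facebook/opt-125m
```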
-
### OpenVINO Version
openvino : 2024.3.0
### Operating System
Windows System
### Device used for inference
iGPU
### OpenVINO installation
PyPi
### Programming Language
Python
### Hardware Ar…