inference-acceleration Search Results

Project-MONAI/tutorials #1865

Add an inference acceleration tutorial

Like what we did for training acceleration: https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_training_tutorial.ipynb

yiheng-wang-nv updated 23 hours ago

ggerganov/llama.cpp #9578

Feature Request: Add native int8 pure CUDA Core accelerate f…

### Prerequisites - [X] I am running the latest code. Mention the version if possible as well. - [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…

SakuraRK updated 4 weeks ago

xinjie-liu/AutoEncodingBayesianInverseGames.jl #1

compiler error

ERROR: MethodError: no method matching ParametricMCPs.ParametricMCP(::MCPGameSolver.var"#38#54"{…}, ::ParametricMCPs.SparseFunction{…}, ::ParametricMCPs.SparseFunction{…}, ::Vector{…}, ::Vector{…}, ::…

Hastws updated 35 minutes ago

doantienthongbku/AsConvSR-TorchLighting #1

About model inference acceleration

Hi! Very impressive project! My main goal is to export the model to intermediate format and test accelerability on many platforms. I am trying to accelerate the assembled convolution module for be…

OuterSpaceTraveller updated 7 months ago

keras-team/keras #18885

The issue of reduced half-precision speed when using ROCm de…

Hello, I am currently using the AMD Instinct MI50 GPU to train models. It has 26 Tflops of fp16 and 13 Tflops of fp32 compute power, but it lacks tensor cores. My experiments on PyTorch indicate th…

KegangWangCCNU updated 1 week ago

intel-analytics/ipex-llm #11984

NPU Acceleration Library model loading

Can we support NPU acceleration library, NPU inference model save/load in low bits? It takes about 48s to load the 7B model directly.

juan-OY updated 1 month ago

oneapi-src/oneDNN #2114

How to modify oneDNN to enable GEMM operation acceleration o…

My use case is inference acceleration on a CPU using TensorFlow Serving, and my hardware architecture is AArch64 (ARMv8). Currently, I've noticed that with oneDNN enabled, the performance bottleneck i…

nanzh-19 updated 3 weeks ago

comfyanonymous/ComfyUI #2194

torch modules using inference_mode cause problems when doing…

Hi! Thank you so much for such an awesome repository! The torch modules here are executed in `inference_mode` not `no_grad`, which causes some problems when doing some accelerations, such as torch.…

zhaozhixu updated 1 month ago

triton-inference-server/server #7719

ONNX CUDA session not working in python backend

**Bug Description** The ONNX CUDA session is not working in the Python backend. When attempting to run inference using the ONNX model with CUDAExecutionProvider, the session fails to initialize or ex…

jsoto-gladia updated 6 hours ago

PABannier/bark.cpp #195

Support fish.audio

Hi! I heard about a very promising model some while ago that you might be interested in. It's called fish.audio. Here's a youtube demo : https://www.youtube.com/watch?v=Ghc8cJdQyKQ Here's the…

thiswillbeyourgithub updated 1 week ago

1000+ results for inference-acceleration
