-
# Background
I experimented with a Rust-based exo implementation that used [UniFFI](https://mozilla.github.io/uniffi-rs/latest/) for foreign-language bindings so I could run it from a Swift iOS app…
-
## Goal-State/What/Result
Have separate Tokio runtimes for query/AI computation (inference, embeddings, etc.) and for acceleration refreshes. The goal is that all queries stay fast no matter what e…
-
## Description
I tried to reference the following document directly, tools/pytorch-quantization/pytorch_quantization/calib/histogram.py, and to use HistogramCalibrator.compute_amax() to calculate the max…
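For context, the percentile-style amax that `HistogramCalibrator.compute_amax()` produces can be sketched standalone. The helper below is a hypothetical NumPy re-implementation of the underlying idea (a histogram of absolute values plus a cumulative-fraction cutoff), not the library's actual code; in pytorch_quantization itself the usual pattern is roughly `calibrator.collect(x)` followed by `calibrator.compute_amax("percentile", percentile=99.99)`.

```python
import numpy as np

def percentile_amax(x, num_bins=2048, percentile=99.99):
    """Histogram-based amax: the smallest bin edge that covers `percentile`%
    of the absolute values (standalone sketch of the percentile method)."""
    ax = np.abs(np.asarray(x, dtype=np.float64).ravel())
    hist, edges = np.histogram(ax, bins=num_bins)
    cdf = np.cumsum(hist) / ax.size          # accumulated fraction of values per bin
    idx = int(np.searchsorted(cdf, percentile / 100.0))
    return float(edges[min(idx + 1, num_bins)])

vals = np.linspace(0.0, 100.0, 10_001)         # synthetic activation magnitudes
amax = percentile_amax(vals, percentile=99.0)  # clips the top 1% of magnitudes
```

With a uniform spread of values in [0, 100], a 99th-percentile amax lands just above 99, i.e. the largest 1% of magnitudes would be clipped at quantization time.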
-
Hello! Thank you very much for your excellent work, which enables the distributed running of large models on heterogeneous devices! I was wondering if this project supports Android devices. I am curre…
-
I am using AutoModelForSequenceClassification to do classification with a large model. Can I use this library, and how should I use it?
Additionally, if my output is only one token and I do batch inference, w…
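Classification with AutoModelForSequenceClassification is typically batched by padding the inputs to a common length and taking an argmax over the per-sequence logits, so each sequence in the batch yields exactly one label. The sketch below uses a tiny randomly-initialized BERT config so it runs without downloading a checkpoint; the config sizes and label count are placeholder assumptions, and for real use you would load your model with `from_pretrained`.

```python
import torch
from transformers import BertConfig, AutoModelForSequenceClassification

# Tiny random-weight model so the sketch is self-contained (placeholder sizes);
# in practice: AutoModelForSequenceClassification.from_pretrained("<your-checkpoint>")
cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                 num_attention_heads=2, intermediate_size=64, num_labels=3)
model = AutoModelForSequenceClassification.from_config(cfg)
model.eval()

# A batch of 4 sequences, 16 tokens each (normally produced by a tokenizer
# called with padding=True); attention_mask marks which positions are real tokens.
input_ids = torch.randint(0, cfg.vocab_size, (4, 16))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

preds = logits.argmax(dim=-1)  # one class id per sequence in the batch
```

Because the head emits one logit vector per sequence rather than per token, batching is just stacking: `logits` has shape `(batch_size, num_labels)` and `preds` has one entry per input text.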
-
**Describe the bug**
After resuming from sleep, inference on an NVIDIA GPU doesn't work until the system is restarted.
**Steps to reproduce**
Steps to reproduce the behavior:
1. Setup GPU Accelerat…
-
Hi, thanks for your great work! Your code is based on PyTorch Lightning. When I deployed the model on a single machine with multiple GPUs, it started several GLOBAL processes, which is necessary for …
-
Intel, AMD, Qualcomm, etc. are shipping powerful NPUs (40+ TOPS) for inference.
Is there any plan to include functionality in ML.NET to run inference on these models easily from C# offl…
-
Is the lambdalabs/sd-image-variations-diffusers model SD 1.4? Is it possible to use the SD 1.5 model and accelerate generation down to a few steps with Hyper-SD, or is there any other solution that can opti…
-
## 🚀 Feature
Add support for inference on Qualcomm NPU devices
## Motivation
Qualcomm released an AI SDK which includes the ability to run models on its Qualcomm® Hexagon™ NPU; adding this feature wo…