-
### Motivation.
TL;DR: There is high CPU overhead associated with each decode batch due to the processing and generation of input/output. Multi-step decoding will be able to amortize all these overh…
-
Hi,
I am trying to implement 2D multi-fidelity models for image interpolation. I found that someone has implemented 1D multi-fidelity models based on GPyTorch (#594). So I have made some effo…
-
### Describe the issue
During CUDA graph capture, ORT triggers cudaStreamSynchronize, which is not allowed during CUDA graph capture. The call stack looks like this:
```
libonnxruntime_providers_…
-
Quick summary of some light exploration I've done profiling numba+numbast versus raw CUDA C++ kernels, as motivated by #12; put together a minimal version of one of the tests:
```python
import num…
-
User "cmorzy" reported today that they're still seeing the error/crash when Darknet reaches iteration #1000. A copy of the dataset, .names, and .cfg is available.
The exact message they're seeing …
-
Hello everyone, I am very interested in the AWQ method, but I have run into a small problem: where is "awq_inference_engine" in "llm-awq/awq/quantize/qmodule.py"? Thank you for your ans…
-
### 🐛 Describe the bug
I used nsys to profile the code below and found a large gap before the kernels launch. Is there any way to reduce this gap?
In this example, `add` and `index_select` c…
-
The vLLM [fused moe kernel](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe.py) used for Mixtral uses the standard data parallel parallelization which works well wi…
-
Julia currently supports 1D sortperm, and PR #45211 adds sortperm support for multidimensional arrays, returning CartesianIndex.
PyTorch and NumPy both provide multidimensional argsort functions tha…
-
Hej folks,
I get the following error when I try to compile YOLO on Windows 10:
Severity Code Description Project File Line Suppression State
Error MSB3721 The command ""C:\Program Files\NVIDIA …