-
- [x] Add oneDNN binaries to Yggdrasil -- https://github.com/JuliaPackaging/Yggdrasil/pull/9345
- [ ] how expensive is it to construct the internal memory type for the operations?
- [ ] create a `on…
-
This can lead to false negatives because the threshold is overly relaxed.
```diff
diff --git a/tests/cpp/test_gpu_fused_reduction.cpp b/tests/cpp/test_gpu_fused_reduction.cpp
index e67875f4..b3923d6…
```
-
Hi! When an attention op gets fused into a single op by Myelin, the trex-tooltip does not say whether it is using FlashAttention / proper fusion (or whether it is using quantization under the hood, especi…
-
Since our DistributedOptimizer takes another optimizer and turns it into ZeRO-1, can we make it do a fused optimizer like this? It should take an optimizer and turn it into a fused ZeRO-1 in a generic…
-
### 🐛 Describe the bug
https://github.com/microsoft/DeepSpeed/issues/6673
Try installing deepspeed on torch 2.5.0-cuda,
then
running build_ext
```error
D:\my\env\python3.10.10\lib\site-packages…
```
-
Great work!
Could it support fused location?
-
The idea is to create a generic assembly, or a special type of assembly / component, that can do model aggregation. Several models with similar I/Os and a similar purpose could then be run in…
-
### Your GTNH Discord Username
Alchelio
### Your Pack Version
2.6.1
### Your Proposal
QVoidwalkers and NVoidwalkers should increase flight speed as well; currently you fall like a rock if you …
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
```
-
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/4e61a3ebdfe2b07b4d331cc46832eb4a6b49941c/src/solver.cpp#L264
Can we introduce some dedicated primitive (e.g. `Fused`) for fused convolutions and …