-
With either torch.compile or Triton, the forward/backward operations produce too many activations, which are probably bottlenecking training.
For some reason, I got about a 30% speedup at 1B scale, but it does not seem …
-
### Build ID
U2UUI34.40-24-6
### Additional
Please provide the device tree too.
This is for kernel tweaking, to get the most out of it, and for custom ROM development projects.
-
Repro C++ script:
```
TEST_F(NVFuserTest, Repro) {
  auto fusion = std::make_unique<Fusion>();
FusionGuard fg(fusion.get());
TensorView* tv0 = makeContigConcreteTensor({1, 10});
…
-
It seems like you've been suggesting that people not use it recently; I was wondering whether the issue has been identified?
-
**What is your question?**
In the examples provided, EVT demonstrates the capability to fuse different epilogue functions, optimizing their execution. I'm interested in knowing whether EVT can also i…
-
In mean+stddev, softmax, and layernorm, one ReduceOp builds on its parent ReduceOp.
tinygrad is making progress towards fusing these into a single kernel.
### Milestones
1. mean+stddev fusion…
-
## CVE-2020-12652 - Medium Severity Vulnerability
Vulnerable Library - linux-4.19.87
The Linux Kernel
Library home page: https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/?wsslib=linux
Fou…
-
### 🚀 The feature, motivation and pitch
Instead, perhaps try to refactor it as `ComputedBuffer`
Examples: Adamax
Already bad with `config.aggressive_fusion = True`:
![image](https://github.c…
-
If you have a DAG of binary operations, you can traverse it in some topological order and generate proper bitcodes for your GPU kernel. OmniSci does it on parsed SQL queries, and more specifically dif…
-
Do you have any details on how you fuse kernels together?
If I am not mistaken, Nvidia's project does it by hand.
Do you do it automatically? Are there any limitations?
ib00 updated 10 months ago