-
Check out `wjy/slice` and run `_bn && bin/nvfuser_tests --gtest_filter=AliasTest.SliceOfExpandedBroadcast`.
The bug is somewhere in https://github.com/NVIDIA/Fuser/blob/7a6f19cce1cf0167700047ca7eb58f5…
-
auto-reduced (treereduce-rust):
````rust
trait Xxx {}
trait Yyy: Xxx {}
trait Aaa {
type Y: Yyy;
}
trait Bbb {
type B: for: Yyy + Xxx {
type Z;
}
trait Aaa {
ty…
-
The current workaround is to use `-fno-sized-deallocation`; see https://github.com/scikit-learn/scikit-learn/pull/28506#discussion_r1512897297 for more details.
This can be reproduced locally with …
-
We are missing an optimization to build reduction ops when possible. Consider the following, which computes `b[0] | b[1]`:
```mlir
module {
hw.module @Foo(%a: i1, %b: i2) -> (c: i1) {
%0 = …
-
### Description
Presently, distributed layernorm is the slowest op in Llama3-TG, taking around 17% of the total device time. It only supports DRAM interleaved inputs since it was initially written for…
-
Strength reduction should still be profitable in these cases.
Suggested by @AndyAyersMS here: https://github.com/dotnet/runtime/pull/104243#discussion_r1664366632
-
### What client do you play on?
enUS
### Faction
Both
### Content Phase:
Generic
### Current Behaviour
Just BG reduction:
![image](https://user-images.githubusercontent.com/50233…
-
We should have special-purpose code for division and modular reduction by $n = 2^e + c$, where `e >= 2 * FLINT_BITS` and `c` fits in a `slong`, say.
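
To make the idea concrete, here is a minimal sketch of the folding trick such code could rely on: since $2^e \equiv -c \pmod{n}$, any $x = hi \cdot 2^e + lo$ satisfies $x \equiv lo - c \cdot hi \pmod{n}$. This is not FLINT code. The function name `reduce_mod_special` is hypothetical, Python big integers stand in for multi-limb values, and the sketch assumes $|c| < 2^{e-2}$, which holds whenever `c` fits in a 64-bit word and `e >= 128`.

```python
def reduce_mod_special(x: int, e: int, c: int) -> int:
    """Return x mod n for n = 2**e + c, using 2**e ≡ -c (mod n).

    Hypothetical helper for illustration only; assumes |c| < 2**(e - 2).
    """
    assert e >= 2 and abs(c) < (1 << (e - 2)), "sketch assumes |c| << 2**e"
    n = (1 << e) + c
    mask = (1 << e) - 1

    # Fold the high part into the low part:
    #   x = hi * 2**e + lo  ≡  lo - c * hi  (mod n).
    # Python's >> and & use floor semantics, so the identity also
    # holds for negative x. Each pass shrinks |x| while |x| >= 2**(e + 1).
    while x.bit_length() > e + 1:
        hi, lo = x >> e, x & mask
        x = lo - c * hi

    # Now |x| < 2**(e + 1); a few additions or subtractions of n finish.
    while x < 0:
        x += n
    while x >= n:
        x -= n
    return x


# Usage: the result should agree with the generic remainder.
value = 12345678901234567890 ** 10
assert reduce_mod_special(value, 128, 3) == value % ((1 << 128) + 3)
```

Each fold costs one multiplication by the single-word `c` plus shifts and a subtraction, rather than a general multi-limb division. A real implementation would likely also want the quotient for the division case, which this sketch does not track.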
-
Hello, I'm using d3-graphviz to render a diagram. I would like to create the graph with the [transitive reduction filter](https://graphviz.org/pdf/tred.1.pdf). Is that possible?
If yes, can I execut…
-
To facilitate data transfer between the server and client, user-defined lossy and lossless compression is employed.
The techniques discussed below can reduce the data volume on the client. The impa…