-
For the following IR
```
#map = affine_map (d0, d1)>
#map1 = affine_map (d0 + d1 + d2)>
func.func @gather_failure(%arg0: tensor, %arg2: tensor, %arg3 : index) -> tensor {
%c0 = arith.constant 0…
-
TL;DR:
The ONNX documentation for the `Transpose` operator leaves an ambiguity unresolved. The current example is
> For example, when perm=(1, 0, 2), given an input tensor of shape (1, 2, 3), the o…
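
The excerpt above is truncated, so the exact ambiguity the author has in mind is an assumption here. One plausible reading is that `perm=(1, 0, 2)` is its own inverse, so the example cannot distinguish "output axis `i` takes input axis `perm[i]`" (the NumPy convention) from "input axis `i` moves to output position `perm[i]`". The sketch below uses two hypothetical helpers, `shape_forward` and `shape_inverse`, to show that the two readings agree on the documented example and only diverge for a non-self-inverse permutation such as `(1, 2, 0)`.

```python
def shape_forward(shape, perm):
    """NumPy-style reading: output axis i takes input axis perm[i]."""
    return tuple(shape[p] for p in perm)


def shape_inverse(shape, perm):
    """Alternative reading: input axis i is moved to output position perm[i]."""
    out = [None] * len(shape)
    for i, p in enumerate(perm):
        out[p] = shape[i]
    return tuple(out)


shape = (1, 2, 3)

# The documented example: a swap is its own inverse, so both readings agree.
swap = (1, 0, 2)
print(shape_forward(shape, swap))   # (2, 1, 3)
print(shape_inverse(shape, swap))   # (2, 1, 3)

# A cyclic permutation is not self-inverse, so the two readings differ.
cycle = (1, 2, 0)
print(shape_forward(shape, cycle))  # (2, 3, 1)
print(shape_inverse(shape, cycle))  # (3, 1, 2)
```

An example built on a cyclic permutation would therefore pin down the convention, whereas the swap example does not.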
-
Failed case (IR after deep tile matmul):
```
#map = affine_map (d0 * 32)>
module {
func.func @main_entry(%arg0: tensor, %arg1: tensor, %arg2: tensor) -> tensor attributes {llvm.emit_c_interface}…
-
### 🚀 The feature, motivation and pitch
I'm working with COO sparse tensors and would like to get a permutation of any COO sparse tensor.
In the current version, `torch.permute` throws the following…
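
As an illustration of what the requested operation amounts to, here is a minimal pure-Python sketch, not PyTorch's implementation. It assumes a COO tensor is represented by a list of coordinate tuples, a parallel list of values, and a shape; the helper name `permute_coo` is hypothetical. Permuting the axes only reorders the components of each coordinate and of the shape, and the values are untouched.

```python
def permute_coo(indices, values, shape, perm):
    """Permute the axes of a COO tensor.

    indices: list of coordinate tuples, one per stored value
    values:  list of stored values, parallel to indices
    shape:   tuple with the dense shape
    perm:    new axis order; output axis i takes input axis perm[i]
    """
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the axes")
    new_shape = tuple(shape[p] for p in perm)
    # Reorder the components of every coordinate; values stay as they are.
    new_indices = [tuple(idx[p] for p in perm) for idx in indices]
    return new_indices, list(values), new_shape


indices = [(0, 1, 2), (1, 2, 3)]
values = [5.0, 7.0]
shape = (2, 3, 4)

new_indices, new_values, new_shape = permute_coo(indices, values, shape, (2, 0, 1))
print(new_indices)  # [(2, 0, 1), (3, 1, 2)]
print(new_values)   # [5.0, 7.0]
print(new_shape)    # (4, 2, 3)
```

With PyTorch's layout, where the indices are stored as an `(ndim, nnz)` tensor, the same idea corresponds to reordering the rows of the index tensor. Note that the result is generally no longer sorted by coordinate, so it would need to be coalesced again if sorted order is required.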
-
### Issue Description
I encountered an issue when adding the `LSTM_` class to `layers.py` while using the PX pruner. Specifically, the pruner fails to obtain the scores.
####…
-
I am trying to optimise a function, and I think it would be much faster if I could force the physical transpose into a separate preceding op rather than having it fused into the op.
[optimisation_barrier](https://gi…
-
We are trying to use all-reduce TP to slash the communication time. I noticed that you have implemented [Row split + all_reduce for MLP (not faster, disabled)](https://github.com/turboderp/exllamav2…
-
I am opening this issue to point out the problems with pattern matching in the current tensor module, and to propose an internal refactoring of how data is stored in `TensMul` objects.
The main pr…
-
I encountered a CUDA memory error when using torchsort.soft_rank during parallel training on the GPU. The error message is as follows:
File "/home/xxx/anaconda3/envs/DL2/lib/python3.10/site-packages/…
-
While working on PR https://github.com/pytorch/pytorch/pull/43068, I found a method on `LSTM` which breaks the Liskov substitution principle. The class `LSTM` inherits from `RNNBase` and redefines a m…