-
To improve performance for transpose kernels, we should load the transposed inputs into LDS directly and then read from LDS instead. We have a function like `preload_copy` which will do this preloading …
-
[https://people.csail.mit.edu/devadas/pubs/pldi24_fhelipe.pdf](https://people.csail.mit.edu/devadas/pubs/pldi24_fhelipe.pdf)
The paper builds on HECO to find an optimal data layout.
Paper impr…
-
In the memory layout work in https://github.com/pytorch/pytorch/issues/19092 we have a problem where strides underdetermine the layout of a tensor when size = 1 or 0. In particular, if I have a tensor wi…
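
To illustrate why strides carry no layout information for such dimensions, here is a minimal pure-Python sketch (not PyTorch code; `addressed_offsets` is a hypothetical helper introduced only for this example). It maps every logical index to its storage offset and shows that the stride of a size-1 dimension never affects the result, and that a size-0 dimension addresses nothing at all.

```python
from itertools import product


def addressed_offsets(sizes, strides):
    """Map each logical index of a strided tensor to its storage offset.

    Hypothetical helper for illustration; offsets are counted in elements.
    """
    return {
        idx: sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in sizes))
    }


sizes = (1, 3)
# The only valid index along dim 0 is 0, so its stride is multiplied by 0
# and can be anything without changing which storage elements are read.
row_major = addressed_offsets(sizes, (3, 1))
arbitrary = addressed_offsets(sizes, (1000, 1))
print(row_major == arbitrary)

# With a size-0 dimension there are no valid indices at all,
# so every choice of strides describes the same (empty) mapping.
print(addressed_offsets((0, 3), (3, 1)) == addressed_offsets((0, 3), (7, 42)))
```

Because the mappings are identical, any layout tag inferred from strides alone is ambiguous for these shapes.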
-
aten reference: [permute](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml#L2230).
Semantics here are limited if we're to maintain the view functionality of…
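
For context on what "view functionality" implies, here is a minimal sketch of permute as a pure metadata operation, assuming a simple sizes/strides tensor model. It is not the ATen implementation, and `permute_view` and `storage_offset` are hypothetical names used only here: the sizes and strides are reordered together and no data is moved.

```python
def permute_view(sizes, strides, dims):
    """Return (sizes, strides) of a permuted view; the storage is untouched.

    Hypothetical helper: `dims` must be a permutation of range(len(sizes)).
    """
    if sorted(dims) != list(range(len(sizes))):
        raise ValueError("dims must be a permutation of the tensor's axes")
    return tuple(sizes[d] for d in dims), tuple(strides[d] for d in dims)


def storage_offset(index, strides):
    """Storage offset (in elements) of a logical index."""
    return sum(i * s for i, s in zip(index, strides))


# A contiguous 2x3x4 tensor has strides (12, 4, 1).
sizes, strides = (2, 3, 4), (12, 4, 1)
p_sizes, p_strides = permute_view(sizes, strides, (2, 0, 1))
print(p_sizes, p_strides)

# Index (k, i, j) in the view addresses the same element as (i, j, k)
# in the original tensor.
print(storage_offset((3, 1, 2), p_strides) == storage_offset((1, 2, 3), strides))
```

Any backend that cannot express arbitrary strides would have to copy instead, which is where the view semantics become restrictive.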
-
If we are going to refactor to use the built-in vector method, then I would consider doing a full refactor of all the underlying classes in this system.
The system should support arbitrary tensors …
-
Hello,
I have used tensor-cell2cell, but I am currently using a dataset that doesn't have a clear context/condition to set, hence I have reverted to the previous cell2cell. I wanted to clarify whether ce…
-
# Requirement:
It's hard to debug rocMLIR issues right now because the trace variable is only effective during the final run, but not during the benchmarking/quick tuning stage, which is when we mi…
-
# 🐛 Bug
## To reproduce
Simply run this [MultiGPU example](https://github.com/cornellius-gp/gpytorch/tree/master/examples/02_Scalable_Exact_GPs/Simple_MultiGPU_GP_Regression.ipynb).
** Stac…
-
There's something wrong with `TORCH_STATIC` when building with MKL DNN.
Steps to reproduce:
1. Apply this diff to turn on static build:
```
diff --git a/caffe2/CMakeLists.txt b/caffe2/CMakeLis…
-
### Problem Description
There is no built-in method for converting a Qobj into a NumPy array such that the `shape` of the NumPy array reflects the `dims` of the Qobj. While fixing this is a one-liner …