-
Literature in particular for multiterm symmetries:
- Li, Li, Li, Riemann Tensor Polynomial Canonicalization by Graph Algebra Extension, https://arxiv.org/abs/1701.08487v1 has good pointers to the l…
-
We are trying to use all-reduce TP to cut the communication time. I noticed that you have implemented [Row split + all_reduce for MLP (not faster, disabled)](https://github.com/turboderp/exllamav2…
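For readers unfamiliar with the idea, here is a minimal sketch of what "row split + all_reduce" means for a single linear layer. It is a CPU-only simulation in plain Python, not the exllamav2 implementation: the helper names (`matvec`, `all_reduce_sum`, `row_split_matvec`) and the `world_size` parameter are hypothetical, and the "all-reduce" is just an element-wise sum over per-rank partial outputs.

```python
# Simulated row-split tensor parallelism for one linear layer (no GPUs involved).
# All names here are illustrative assumptions, not part of any library.

def matvec(x, w):
    """Compute y = x @ w for a vector x (length k) and a matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def all_reduce_sum(partials):
    """Stand-in for an all-reduce: element-wise sum of every rank's partial output."""
    return [sum(vals) for vals in zip(*partials)]

def row_split_matvec(x, w, world_size):
    """Shard w along its input (row) dimension, compute partial outputs, then reduce."""
    k = len(x)
    assert k % world_size == 0, "input dimension must divide evenly across ranks"
    shard = k // world_size
    partials = []
    for rank in range(world_size):
        lo, hi = rank * shard, (rank + 1) * shard
        # Each rank only needs its slice of the input and its slice of the weights.
        partials.append(matvec(x[lo:hi], w[lo:hi]))
    return all_reduce_sum(partials)

x = [1.0, 2.0, 3.0, 4.0]
w = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, -1.0]]

full = matvec(x, w)                            # single-device reference
split = row_split_matvec(x, w, world_size=2)   # two simulated ranks
print(full, split)
```

Each rank's partial output has the full output width, so the reduction moves one output-sized vector per rank; whether that is cheaper than the alternative depends on the layer shapes and the interconnect.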
-
**Describe the Bug**
**Minimal Steps/Code to Reproduce the Bug**
```
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which sup…
-
See https://github.com/Theano/Theano/issues/2158#issuecomment-58371370
Here is code that it should have detected as broadcastable:
```
import theano
from theano.tensor.shared_randomstreams import Ra…
-
# Requirement:
It's hard to debug rocMLIR issues right now because the trace variable takes effect only during the final run, not during the benchmarking/quick tuning stage, which is when we mi…
-
#538 introduces an `interleaved` option to `MultitaskMultivariateNormal`. The point of this is to allow `MultitaskMultivariateNormal` to have a covariance matrix that is either `K_{task} \kron K_{data…
-
# 🐛 Bug
I ran the code from the example that uses multiple GPUs. https://github.com/cornellius-gp/gpytorch/blob/master/examples/02_Scalable_Exact_GPs/Simple_MultiGPU_GP_Regression.ipynb
I used 1…
-
I encountered a CUDA memory error when using torchsort.soft_rank during parallel training on the GPU. The error message is as follows:
File "/home/xxx/anaconda3/envs/DL2/lib/python3.10/site-packages/…
-
At the MLPDS meeting, someone brought up that in multicomponent, the order of components currently matters because the learned representations are concatenated. Could we add an option to make the archit…
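To make the order dependence concrete, here is a small sketch in plain Python. The two aggregation helpers (`concat_aggregate`, `sum_aggregate`) are hypothetical names, and element-wise summation is only one possible order-independent alternative, assumed here for illustration rather than taken from the proposal.

```python
# Compare concatenation with an order-independent aggregation of
# per-component representations. Names are illustrative assumptions.

def concat_aggregate(component_vecs):
    """Concatenate per-component vectors in the order given."""
    out = []
    for vec in component_vecs:
        out.extend(vec)
    return out

def sum_aggregate(component_vecs):
    """Element-wise sum of per-component vectors (requires equal lengths)."""
    return [sum(vals) for vals in zip(*component_vecs)]

a = [1.0, 2.0]  # learned representation of component A (made-up numbers)
b = [3.0, 5.0]  # learned representation of component B (made-up numbers)

concat_ab = concat_aggregate([a, b])
concat_ba = concat_aggregate([b, a])
sum_ab = sum_aggregate([a, b])
sum_ba = sum_aggregate([b, a])

print(concat_ab != concat_ba)  # swapping components changes the concatenated vector
print(sum_ab == sum_ba)        # the summed vector is unchanged by the swap
```

A symmetric aggregation such as a sum or mean loses the information about which component is which, so it only fits tasks where the components really are interchangeable.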
-
I'm trying to compile the latest commit d0c0dedf529dbc7af2d886ec3594b032f86ae6c4 of tiledarray for the CUDA platform (V100) and get a compiler error:
[ 95%] Building CXX object src/CMakeFiles/tiledarra…