-
For `volo_d1_224`, I was trying to figure out where the memory footprint goes.
* One known issue is that Python holds references to the inputs of the backward pass because of how AOT Autograd works.
* …
-
- [x] Build target 3Q cyclic SWAP unitary
- [x] Use the Hamiltonian to check whether building the target cyclic SWAP unitary is feasible
- [x] Sweep g, phi terms
- [x] Advanced optimization sweep (ML?), with time-de…
-
I've tried loading two mods that use Game.xml files, one for S2/v4 and one for CD/v3, making sure both of them have the TargetVersion tag; however, both mods crash after loading the file. These mods lo…
-
### 🐛 Describe the bug
Accuracy failure - Dtype mismatch
### Minified repro
~~~
from math import inf
import torch
from torch import tensor, device
import torch.fx as fx
import torch._dyn…
-
### 🐛 Describe the bug
`x + GroupNorm()(x)` stacked enough times seems to result in NaN gradients being returned by autograd.
This affects stable-diffusion and breaks CLIP guidance. I believe this explai…
-
Repro
~~~
import torch
import torchdynamo
from torch import tensor, device
import torch.fx as fx
from torchdynamo.testing import rand_strided
from math import inf
from torchdynamo.debug_util…
-
https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py
This is a good reference for breaking/decomposing torch ops.
-
Some decomp projects use fixed-address declarations for OS memory mapped I/O, like this:
```c
u32 __OSBusClock : 0x800000F8;
```
I think this is a nonstandard syntax extension that MWCC supports.
…
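For comparison, here is a minimal sketch of how the same fixed address could be spelled in standard C, using a cast to a pointer instead of the `: address` syntax. The names `MMIO_U32`, `OS_BUS_CLOCK`, `read_reg`, and `write_reg` are made up for illustration and do not come from any decomp project. The `volatile` qualifier is an assumption (it is typical for memory-mapped I/O, but the declaration above does not show it). This is not claimed to produce matching codegen under MWCC.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical helper macro: treat an integer address as a
   volatile 32-bit object (volatile is an assumption, see above). */
#define MMIO_U32(addr) (*(volatile u32 *)(uintptr_t)(addr))

/* Standard-C spelling of `u32 __OSBusClock : 0x800000F8;`.
   OS_BUS_CLOCK is an illustrative name; the access is only
   meaningful on the target hardware, not on a host machine. */
#define OS_BUS_CLOCK MMIO_U32(0x800000F8u)

/* Compile-time check that u32 really is four bytes wide. */
static_assert(sizeof(u32) == 4, "u32 must be 32 bits wide");

/* The same access pattern applied to an arbitrary address, so it
   can be exercised on a host against ordinary memory. */
static u32 read_reg(uintptr_t addr) { return MMIO_U32(addr); }
static void write_reg(uintptr_t addr, u32 value) { MMIO_U32(addr) = value; }
```

On the target, `OS_BUS_CLOCK` can then be read or assigned like a variable. On a host, `read_reg` and `write_reg` can be pointed at the address of a local `u32` to check the access pattern without touching address `0x800000F8`.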
-
`benchmarks/huggingface.py --training -dcuda --accuracy --training --inductor --only=XLNetLMHeadModel`
Error
~~~
RuntimeError: Overloaded torch operator invoked from Python failed to many any s…
-
~~~
import torch
import torchdynamo
from torch import tensor, device
import torch.fx as fx
from torchdynamo.testing import rand_strided
from math import inf
from torchdynamo.debug_utils imp…