-
I tested the LyCORIS/LoCon preset for SDXL and it works fine, but if the "DoRA Weight Decompose" box is checked (with or without extra algorithms), training will not continue. I tried the default preset with 'G…
-
The `sdpa_ex` implementation of `torch.nn.functional.scaled_dot_product_attention` reports every output tensor proxy in the trace as being on `cuda`, but at runtime some outputs are on `cpu`.
Repro
```python
i…
```
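The original repro is truncated above. As a stand-in, a minimal device check under the same conditions (plain PyTorch, a CUDA device assumed; this does not use `sdpa_ex` itself) looks like this:

```python
import torch
import torch.nn.functional as F

# Minimal stand-in for the truncated repro: run SDPA on CUDA inputs and
# verify where the output actually lives at runtime.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

out = F.scaled_dot_product_attention(q, k, v)
print(out.device)  # the report says the trace claims cuda, yet some outputs land on cpu
```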
-
### Is there an existing issue for this problem?
- [X] I have searched the existing issues
### Operating system
Linux
### GPU vendor
Nvidia (CUDA)
### GPU model
_No response_
#…
-
### 🐛 Describe the bug
In the blog post https://pytorch.org/blog/flexattention/, it says `For example, for a sequence length of 1 million, the BlockMask would only use 60MB of additional memory`.
Howev…
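One way to check the claim empirically is to measure device memory around mask construction. A sketch, assuming PyTorch ≥ 2.5 with CUDA; a shorter sequence length is used here because eager `create_block_mask` evaluates a dense `Q_LEN × KV_LEN` grid before compressing it:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

seq_len = 65_536  # kept modest: eager construction materializes the dense grid first

before = torch.cuda.memory_allocated()
block_mask = create_block_mask(
    causal, B=None, H=None, Q_LEN=seq_len, KV_LEN=seq_len, device="cuda"
)
after = torch.cuda.memory_allocated()
print(f"BlockMask retains {(after - before) / 2**20:.1f} MiB on device")
```

Note that `memory_allocated()` only reports what the finished BlockMask retains; the transient dense grid shows up in `torch.cuda.max_memory_allocated()` instead.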
-
I want to fine-tune a model using unsloth. Everything works fine on Colab, but on my system I got the following:
```
{
"name": "NotImplementedError",
"message": "No operator found for `memory_efficie…
```
-
```python
def memory_efficient_attention(
query: torch.Tensor,
key: torch.Tensor,
value: torch.Tensor,
attn_bias: Optional[Union[torch.Tensor, AttentionBias]] = None,
    p: floa…
```
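A usage sketch for this signature (shapes follow the documented `[batch, seq_len, num_heads, head_dim]` convention; the values themselves are illustrative):

```python
import torch
import xformers.ops as xops

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# attn_bias=None gives full attention; p is the dropout probability applied
# to the attention weights during training.
out = xops.memory_efficient_attention(q, k, v, p=0.1)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```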
-
I got an error when starting gnfactor training.
My environment setup is below:
GPU: NVIDIA L40S
CUDA: 11.7 (in a Docker environment)
Ubuntu: 22.04
It seems that xformers is incompatible with torc…
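Since xformers wheels are built against specific torch/CUDA versions, a quick compatibility check is to print all three (a sketch; `python -m xformers.info` gives a fuller report in recent releases):

```python
import torch
print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)

import xformers  # fails loudly here if the binary does not match the torch build
print("xformers:", xformers.__version__)
```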
-
I have installed xformers==0.0.13, but when executing the build script I still get the error "RuntimeError: No such operator xformers::efficient_attention_forward_generic - did you forget to build xfo…
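That error comes from PyTorch's operator registry, so the lookup can be checked directly. A sketch, assuming `import xformers` itself succeeds; the operator name is taken verbatim from the error message:

```python
import torch
import xformers  # importing xformers registers its compiled ops, if they were built

# Raises RuntimeError("No such operator xformers::...") when the C++/CUDA
# extension was never compiled, e.g. a source install without a CUDA toolchain.
print(torch.ops.xformers.efficient_attention_forward_generic)
```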
-
Hi,
I have the issue that my sequence lengths vary strongly, sometimes with outliers an order of magnitude longer than the average. In default PyTorch one can only pass a pa…
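xformers handles this case without padding: concatenate all sequences along the token axis and describe their boundaries with a block-diagonal attention bias. A sketch, assuming a CUDA GPU; the lengths are illustrative:

```python
import torch
from xformers.ops import fmha, memory_efficient_attention

seqlens = [13, 257, 4096]  # strongly varying lengths, no padding required
total = sum(seqlens)

# All sequences packed into one "batch" of total tokens: [1, total, heads, head_dim].
q = torch.randn(1, total, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, total, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, total, 8, 64, device="cuda", dtype=torch.float16)

# The bias restricts attention to tokens within the same sequence.
attn_bias = fmha.BlockDiagonalMask.from_seqlens(seqlens)
out = memory_efficient_attention(q, k, v, attn_bias=attn_bias)  # [1, total, 8, 64]
```

Because no tokens are padded, memory and compute scale with the actual token count rather than with `batch × max_len`.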
-
Hi, thanks for any suggestions.
The largest resolution I can use for training is 512 × 512, at a cost of ~76 GB of memory.
I set enable_xformers_memory_efficient_attention to True, but nothing ch…
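For reference, a minimal sketch of turning this on through the diffusers API (the checkpoint name is only an example); the call raises if xformers is missing or incompatible, which rules out a silent no-op:

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint only; any diffusers pipeline with a UNet works the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()
pipe.unet.enable_gradient_checkpointing()  # stacks with xformers for further savings
```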