-
### Describe the bug
Is it possible to get the `attention_mask` argument back in the Flux attention processor?
```
hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.…
```
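For reference, PyTorch's SDPA already accepts a mask through its `attn_mask` parameter; a minimal sketch (not the diffusers implementation) of threading an attention mask into the call, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 16, 64
query = torch.randn(batch, heads, seq, dim)
key = torch.randn(batch, heads, seq, dim)
value = torch.randn(batch, heads, seq, dim)

# Hypothetical boolean padding mask: True = attend, False = ignore.
# Shape (batch, 1, seq, seq) broadcasts across the head dimension.
attention_mask = torch.ones(batch, 1, seq, seq, dtype=torch.bool)

hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0
)
```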
-
I don't know why, but whenever I set `use_dora = True` it always gives me this error when I train:
```
RuntimeError Traceback (most recent call last)
Cell In[26], line 1
----> 1 tr…
```
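For context, a minimal sketch of how `use_dora` is typically enabled, assuming the PEFT library; the base model and target modules below are placeholders, not the original poster's setup:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["c_attn"],  # placeholder: GPT-2's attention projection
    use_dora=True,  # DoRA decomposes the LoRA update into magnitude + direction
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```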
-
### 🐛 Describe the bug
When running `F.scaled_dot_product_attention` on CPU with an input matrix that contains NaNs, the output under PyTorch 2.4 is a NaN matrix, but under PyTorch 2.5 it is a zeros …
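A minimal repro sketch of the reported behavior (shapes and values are illustrative): feed NaN inputs into SDPA on CPU and check whether the NaNs propagate:

```python
import torch
import torch.nn.functional as F

# All-NaN inputs of shape (batch, heads, seq, head_dim).
q = torch.full((1, 1, 4, 8), float("nan"))
k = torch.full((1, 1, 4, 8), float("nan"))
v = torch.full((1, 1, 4, 8), float("nan"))

out = F.scaled_dot_product_attention(q, k, v)
# Reported: True (NaNs propagate) on 2.4, False (zeros) on 2.5.
print(torch.__version__, out.isnan().any().item())
```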
-
Inspired by [this paper](https://arxiv.org/abs/2405.14862), we're exploring ways to bootstrap a bidirectional-context LLM from a decoder-only causal LLM (e.g., Llama-3). This is very easy to do in hu…
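The core change can be illustrated with plain SDPA, independent of any particular model implementation: converting a decoder-only model to bidirectional context amounts to disabling the causal mask in its attention layers. A sketch:

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq, head_dim)

# Decoder-only: each position attends only to itself and earlier positions.
causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Bidirectional: every position attends to the full sequence.
bidirectional_out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
```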
-
### Feature request
I want to add the ability to use GGUF BERT models in transformers.
Currently the library does not support this architecture. When I try to load one, I get an error: `TypeError: Ar…`
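For reference, transformers already exposes a GGUF loading path for some architectures (e.g., Llama) via the `gguf_file` argument of `from_pretrained`; a sketch of the desired BERT usage, where the repo and filename are hypothetical:

```python
from transformers import AutoModel, AutoTokenizer

repo_id = "some-user/bert-base-gguf"  # hypothetical repo
gguf_file = "bert-base-q4_k_m.gguf"   # hypothetical file

# This is the call that currently fails for BERT-architecture GGUFs.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModel.from_pretrained(repo_id, gguf_file=gguf_file)
```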
-
Hi everyone, I'm new to Mamba. I see that Mamba2 uses a causal 1D conv, so it can only attend to previous tokens when predicting the current token. But if I want Mamba2 to be able to …
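To illustrate the constraint (this is not Mamba2's own code): a causal 1D conv pads only on the left, so each output position depends only on past inputs, while symmetric padding lets a position see future tokens:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 10)  # (batch, channels, seq_len)
w = torch.randn(16, 16, 4)  # kernel_size = 4

# Causal: left-pad by kernel_size - 1, so output at t sees only inputs <= t.
causal_out = F.conv1d(F.pad(x, (3, 0)), w)

# Non-causal: symmetric padding lets output at t see future inputs.
noncausal_out = F.conv1d(F.pad(x, (1, 2)), w)
```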
-
Attempting to generate with Mistral Small causes this error:
```
---------------------------------------------------------------------------
RuntimeError Traceback (most r…
```
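A minimal sketch of the kind of call that triggers this; the checkpoint name is an assumption, since the original report truncates before showing any code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-Instruct-2409"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```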
-
```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_da…
```
-
Hey, while running the 4-bit quantized model from https://huggingface.co/ThetaCursed/Ovis1.6-Gemma2-9B-bnb-4bit, I am getting the following error:
```
{
    "name": "RuntimeError",
    "message": "self an…
```
-
My code:
```python
for i, item in enumerate(text):
    item += " [SEP] "
    item += sentences[i]
    text[i] = item
# {
#     "ID": "__cover_VB_1",
#     "term": "cov…
```