-
### Describe the bug
Is it possible to get the `attention_mask` argument back in the Flux attention processor?
```
hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.…
```
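For reference, PyTorch's SDPA already accepts a mask through its `attn_mask` parameter; a minimal sketch (not the diffusers implementation) of threading an attention mask into the call, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 16, 64
query = torch.randn(batch, heads, seq, dim)
key = torch.randn(batch, heads, seq, dim)
value = torch.randn(batch, heads, seq, dim)

# Hypothetical boolean padding mask: True = attend, False = ignore.
# Shape (batch, 1, seq, seq) broadcasts across the head dimension.
attention_mask = torch.ones(batch, 1, seq, seq, dtype=torch.bool)

hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0
)
```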
-
I don't know why, but whenever I set `use_dora = True` it always gives me this error when I train:
```
RuntimeError Traceback (most recent call last)
Cell In[26], line 1
----> 1 tr…
```
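For context, a minimal sketch of how `use_dora` is typically enabled, assuming the PEFT library; the base model and target modules below are placeholders, not the original poster's setup:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["c_attn"],  # placeholder: GPT-2's attention projection
    use_dora=True,  # DoRA decomposes the LoRA update into magnitude + direction
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```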
-
### 🐛 Describe the bug
When running `F.scaled_dot_product_attention` on CPU with an input matrix that contains NaNs, the output under PyTorch 2.4 is a NaN matrix, but under PyTorch 2.5 it is a zeros …
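A minimal repro sketch of the reported behavior (shapes and values are illustrative): feed NaN inputs into SDPA on CPU and check whether the NaNs propagate:

```python
import torch
import torch.nn.functional as F

# All-NaN inputs of shape (batch, heads, seq, head_dim).
q = torch.full((1, 1, 4, 8), float("nan"))
k = torch.full((1, 1, 4, 8), float("nan"))
v = torch.full((1, 1, 4, 8), float("nan"))

out = F.scaled_dot_product_attention(q, k, v)
# Reported: True (NaNs propagate) on 2.4, False (zeros) on 2.5.
print(torch.__version__, out.isnan().any().item())
```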
-
Inspired by [this paper](https://arxiv.org/abs/2405.14862), we're exploring ways to bootstrap a bidirectional-context LLM from a decoder-only causal LLM (e.g., Llama-3). This is very easy to do in hu…
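The core change can be illustrated with plain SDPA, independent of any particular model implementation: converting a decoder-only model to bidirectional context amounts to disabling the causal mask in its attention layers. A sketch:

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq, head_dim)

# Decoder-only: each position attends only to itself and earlier positions.
causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Bidirectional: every position attends to the full sequence.
bidirectional_out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
```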
-
### Feature request
I want to add the ability to use GGUF BERT models in transformers.
Currently the library does not support this architecture. When I try to load one, I get an error: `TypeError: Ar…`
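For reference, transformers already exposes a GGUF loading path for some architectures (e.g., Llama) via the `gguf_file` argument of `from_pretrained`; a sketch of the desired BERT usage, where the repo and filename are hypothetical:

```python
from transformers import AutoModel, AutoTokenizer

repo_id = "some-user/bert-base-gguf"  # hypothetical repo
gguf_file = "bert-base-q4_k_m.gguf"   # hypothetical file

# This is the call that currently fails for BERT-architecture GGUFs.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModel.from_pretrained(repo_id, gguf_file=gguf_file)
```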
-
Hi everyone, I'm new to Mamba. I see that Mamba2 uses a causal 1D conv, so it can only attend to previous tokens when predicting the current token. But if I want Mamba2 to be able to …
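To illustrate the constraint (this is not Mamba2's own code): a causal 1D conv pads only on the left, so each output position depends only on past inputs, while symmetric padding lets a position see future tokens:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 10)  # (batch, channels, seq_len)
w = torch.randn(16, 16, 4)  # kernel_size = 4

# Causal: left-pad by kernel_size - 1, so output at t sees only inputs <= t.
causal_out = F.conv1d(F.pad(x, (3, 0)), w)

# Non-causal: symmetric padding lets output at t see future inputs.
noncausal_out = F.conv1d(F.pad(x, (1, 2)), w)
```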
-
Attempting to generate with Mistral Small causes this error:
```
---------------------------------------------------------------------------
RuntimeError Traceback (most r…
```
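A minimal sketch of the kind of call that triggers this; the checkpoint name is an assumption, since the original report truncates before showing any code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-Instruct-2409"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```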
-
```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_da…
```
-
Hey, while running the 4-bit quantized model from https://huggingface.co/ThetaCursed/Ovis1.6-Gemma2-9B-bnb-4bit, I am getting the following error:
```
{
    "name": "RuntimeError",
    "message": "self an…
```
-
My code:
```python
for i, item in enumerate(text):
    item += " [SEP] "
    item += sentences[i]
    text[i] = item
# {
#     "ID": "__cover_VB_1",
#     "term": "cov…
```