-
### Describe the bug
```py
hidden_states = F.scaled_dot_product_attention(
    query, key, value, dropout_p=0.0, scale=attn.scale, is_causal=False
)
```
### Reproduction
…
-
I want to quantize a model from [open-flamingo](https://github.com/mlfoundations/open_flamingo) or https://github.com/open-mmlab/Multimodal-GPT (open-flamingo v1) before LoRA training,
https://github…
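What I have in mind is roughly the sketch below. It assumes the checkpoint can be loaded through `transformers` with `bitsandbytes` 4-bit quantization and then wrapped with `peft` LoRA; the model id and `target_modules` are placeholders, not verified against the open-flamingo code.
```py
# Hypothetical sketch: 4-bit quantization with bitsandbytes, then LoRA adapters via peft.
# The model id and target_modules below are placeholders, not verified for open-flamingo.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-flamingo-checkpoint",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Cast norms/embeddings and enable gradient checkpointing for k-bit training.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder: depends on the model's module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```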
-
1. I use the transformer model in models/official/nlp/transformer/transformer.py to train a seq2seq model, replacing the built-in Keras attention with Performer fast-attention (the TensorFlow version), fo…
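For context, Performer fast-attention replaces the softmax attention matrix with a random-feature approximation so attention runs in linear time in the sequence length. Below is a minimal PyTorch sketch of that idea (positive random features, non-causal); it is only an illustration of the mechanism, not the TensorFlow `fast_attention` code actually used here.
```py
# Minimal sketch of Performer-style FAVOR+ attention (non-causal), written in PyTorch
# purely to illustrate the idea; the real implementation is the Performer repo's
# TensorFlow fast_attention module.
import torch

def positive_random_features(x, projection):
    # x: (batch, seq, dim); projection: (dim, num_features)
    # phi(x) = exp(x @ w - ||x||^2 / 2) / sqrt(m), the positive feature map from the paper.
    m = projection.shape[1]
    x = x / x.shape[-1] ** 0.25           # fold the 1/sqrt(d) softmax scaling into q and k
    proj = x @ projection                  # (batch, seq, num_features)
    sq_norm = (x ** 2).sum(dim=-1, keepdim=True) / 2
    return torch.exp(proj - sq_norm) / m ** 0.5

def performer_attention(q, k, v, num_features=256):
    dim = q.shape[-1]
    projection = torch.randn(dim, num_features, device=q.device, dtype=q.dtype)
    q_prime = positive_random_features(q, projection)        # (b, n, m)
    k_prime = positive_random_features(k, projection)        # (b, n, m)
    kv = torch.einsum("bnm,bnd->bmd", k_prime, v)             # (b, m, d): aggregate keys/values once
    normalizer = q_prime @ k_prime.sum(dim=1).unsqueeze(-1)   # (b, n, 1)
    return torch.einsum("bnm,bmd->bnd", q_prime, kv) / normalizer.clamp(min=1e-6)

# Example: linear-time attention over a longer sequence.
q = torch.randn(2, 1024, 64)
k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
out = performer_attention(q, k, v)   # (2, 1024, 64)
```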
-
-
In the latest commit, https://huggingface.co/mosaicml/mpt-7b/commit/67cf22a4e6809edb7308dd0a2ae2c1ffb86f4984, BigDL throws the error below when generating text.
INFO 2024-02-20 06:41:05,962 proxy 172.17.0.2 …
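A possible workaround (my assumption, not something from the BigDL docs) is to pin the checkpoint to a revision from before that commit when loading it:
```py
# Hypothetical workaround: pin mpt-7b to a revision before the breaking commit when loading.
# The revision value below is a placeholder for whichever commit last worked; I have not
# verified which one that is.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mosaicml/mpt-7b"
revision = "<last-known-good-commit>"  # placeholder, not a real commit hash

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision, trust_remote_code=True)
```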
-
## Context
We have an MPT MoD prefix-lm trained on llm-foundry and then exported to HuggingFace (via your scripts).
For some fine-tuning experiments with the HF model, I tried to set up dropout.
…
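A minimal sketch of what I mean by setting up dropout, assuming the exported config exposes `attn_config["attn_pdrop"]`, `resid_pdrop`, and `emb_pdrop` (names taken from the mosaicml/mpt-7b remote code, so an assumption for this particular export):
```py
# Sketch of enabling dropout on the exported HF MPT checkpoint; the config field names
# (attn_config["attn_pdrop"], resid_pdrop, emb_pdrop) are assumptions based on the
# mosaicml/mpt-7b remote code and may differ for this export.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("path/to/exported-mpt", trust_remote_code=True)
config.attn_config["attn_pdrop"] = 0.1
config.resid_pdrop = 0.1
config.emb_pdrop = 0.1

model = AutoModelForCausalLM.from_pretrained(
    "path/to/exported-mpt", config=config, trust_remote_code=True
)
model.train()  # dropout is only active in training mode
```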
-
### System Info
Transformers version: 4.41.2
Platform: Ubuntu 22.04.4 LTS
Python: 3.10.14
### Who can help?
@younesbelkada @ArthurZucker
### Information
- [ ] The official example s…
-
- [x] Usage in the [decoder](https://github.com/emma-mens/transformers/blob/main/src/transformers/models/opt/modeling_opt.py#L316) layer and the corresponding `past_key_values` [usage](https://github.…
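For reference, the pattern in question is the standard incremental-decoding use of `past_key_values`: cache each layer's key/value states from one forward pass and feed only the new token on the next. A minimal sketch with an OPT checkpoint (the model id is just an example):
```py
# Minimal sketch of past_key_values reuse during incremental decoding with OPT;
# the checkpoint name is just an example.
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = OPTForCausalLM.from_pretrained("facebook/opt-125m").eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    # First pass: full prompt, caching the per-layer key/value states.
    out = model(**inputs, use_cache=True)
    past_key_values = out.past_key_values
    next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)

    # Subsequent passes: feed only the new token plus the cache.
    for _ in range(5):
        out = model(input_ids=next_token, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)
```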
-
Epoch [1/3]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File :21, in _fwd_kernel(Q, K, V,…
-
I am having a go at running inference and evaluation for this model and am running into a TypeError in `GPTLMHeadModel`:
```
In [1]: import torch
...: from transformers import AutoTokenizer
…