-
Hello,
could we please have 13b and 7b models with the updated architecture that includes grouped query attention? A lot of people are running these models on machines with low memory and this woul…
-
This error is really strange... I followed [the readme](https://github.com/ashkamath/mdetr/blob/main/.github/clevr.md) for training MDETR on CLEVR.
First, I ran the following command:
```
pyth…
-
Many modern architectures use either GQA or MQA rather than MHA, but `dot_product_attention` allows only MHA, because it requires `query`, `key` and `value` to have the same number of heads:
https://gi…
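
To illustrate the head-count constraint described above, here is a minimal NumPy sketch of grouped-query attention. It is not the library's `dot_product_attention`; the function name `gqa_attention` and the `[length, heads, head_dim]` layout are assumptions made for this example. Each key/value head is shared by a group of consecutive query heads, so MHA is the case where both head counts are equal and MQA is the case with a single key/value head.

```python
import numpy as np


def gqa_attention(query, key, value):
    """Hypothetical grouped-query attention sketch (no masking, no dropout).

    query:      [q_len,  num_q_heads,  head_dim]
    key, value: [kv_len, num_kv_heads, head_dim]
    """
    num_q_heads = query.shape[1]
    num_kv_heads = key.shape[1]
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads

    # Repeat every key/value head so that `group_size` consecutive
    # query heads attend to the same key/value head.
    key = np.repeat(key, group_size, axis=1)
    value = np.repeat(value, group_size, axis=1)

    # Scaled dot-product scores, shape [heads, q_len, kv_len].
    scores = np.einsum("qhd,khd->hqk", query, key) / np.sqrt(query.shape[-1])

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, back to [q_len, heads, head_dim].
    return np.einsum("hqk,khd->qhd", weights, value)
```

With 8 query heads and 2 key/value heads, query heads 0 to 3 use key/value head 0 and query heads 4 to 7 use key/value head 1. Repeating the heads keeps the sketch short; an implementation that cares about memory would reshape the query into groups instead of materializing the repeated keys and values.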
-
Could you please tell me the model's accuracy on the GQA task? I only reached 45%.
-
### Feature request
It would be nice if, when I choose a different key_value_heads (key_value_heads < attention_heads) in the model's config, the attn weights were automatically computed by mean pooling. …
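
As a rough sketch of the mean pooling mentioned in this request, the NumPy snippet below averages the key (or value) projection heads within each group to produce fewer key/value heads. The function name `mean_pool_kv_heads` and the weight layout `[num_heads * head_dim, hidden_size]` are assumptions for illustration; real checkpoints may store the projection transposed or fused with other matrices.

```python
import numpy as np


def mean_pool_kv_heads(weight, num_heads, num_kv_heads):
    """Hypothetical helper: pool a key or value projection from
    `num_heads` heads down to `num_kv_heads` heads by averaging.

    weight: [num_heads * head_dim, hidden_size] (assumed layout)
    returns: [num_kv_heads * head_dim, hidden_size]
    """
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    out_dim, hidden_size = weight.shape
    if out_dim % num_heads != 0:
        raise ValueError("weight rows must be divisible by num_heads")
    head_dim = out_dim // num_heads
    group_size = num_heads // num_kv_heads

    # Split the rows into [kv_head, head_within_group, head_dim] and
    # average over the heads that belong to the same group.
    grouped = weight.reshape(num_kv_heads, group_size, head_dim, hidden_size)
    pooled = grouped.mean(axis=1)
    return pooled.reshape(num_kv_heads * head_dim, hidden_size)
```

For example, pooling 4 heads down to 2 averages heads 0 and 1 into the first key/value head and heads 2 and 3 into the second. A model converted this way usually needs some further training to recover quality.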
-
GQA dataset test
-
## ❓ Questions and Help
In your configs, I saw that there is a difference between VG and GQA, but I cannot find support for the GQA dataset. Any ideas about GQA support?
-
I generated the `submit_predict.json` and submitted it to the GQA evaluation server. However, I got an accuracy of 0 in the test phase, while the result in the dev phase makes sense. Is it possible that I predict al…
-
Hello,
I was attempting to evaluate the model on the GQA dataset by following the instructions provided in the [Getting Started guide](https://github.com/SHI-Labs/VCoder/blob/main/docs/Getting_Star…
-
Dear author, if I want to add a GQA dataset for training, what exactly do I need to do?