-
Looking at the code, it seems there are no learned weight matrices for the query, key, and value projections in the self-attention implementation. Is this the correct implementation?
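For reference, a textbook self-attention layer does learn separate projection weights for query, key, and value. A minimal PyTorch sketch (my own illustration, not the code in question):

```python
import torch
from torch import nn

class SelfAttention(nn.Module):
    """Minimal single-head self-attention with learned Q/K/V projections."""
    def __init__(self, dim: int):
        super().__init__()
        # These three linear layers are the query/key/value weights;
        # dropping them means attending over the raw inputs directly.
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return torch.softmax(scores, dim=-1) @ v
```

If the code applies attention directly to its inputs without such projections, that is a deliberate simplification rather than the standard formulation.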
-
### 🐛 Describe the bug
```python
import torch
from torch import nn, Tensor
from torch.export import export_for_inference, Dim
from torch.nn.attention.flex_attention import flex_attention
class…
```
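The snippet above is truncated, so here is a separate minimal eager-mode `flex_attention` call as a starting point; the shapes and the causal `score_mod` are my own example, not from the original repro:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# (batch, heads, seq_len, head_dim); arbitrary example shapes
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)

def causal(score, b, h, q_idx, kv_idx):
    # Mask out future positions by sending their scores to -inf.
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

out = flex_attention(q, k, v, score_mod=causal)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```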
-
Hi,
I'm trying to run inference on an AWQ-quantized model, and I constantly get this error when trying to generate text.
I'm using Qwen2.5-72B-Instruct-AWQ.
Some code to give context:
sel…
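Since the snippet is cut off, here is a hedged baseline of how an AWQ checkpoint like this is usually loaded and queried with transformers (assumes autoawq is installed; the prompt is made up), which may help isolate whether the error comes from the model or the surrounding code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-72B-Instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# AWQ checkpoints ship their quantization config, so no extra
# quantization arguments are needed at load time.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```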
-
Hi, I wonder how these figures were obtained.
SVD on a self-attention map would produce U, S, and V.T.
How exactly did you turn those into the figures?
![capture](https://github.com/google/prompt-to-prompt/asset…
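My own guess at the kind of computation behind such figures (a sketch, not the repository's code): run SVD on the attention map and reshape the leading right-singular vectors back onto the spatial grid for visualization:

```python
import torch

# Toy stand-in for a (queries x keys) self-attention map over a 16x16 grid
h = w = 16
attn = torch.rand(h * w, h * w)
attn = attn / attn.sum(dim=-1, keepdim=True)  # rows sum to 1, like softmax output

U, S, Vh = torch.linalg.svd(attn)

# Each right-singular vector (row of Vh) reshapes to the image grid and can
# be rendered as a heatmap; the paper's figures may be such visualizations.
top_components = Vh[:3].reshape(3, h, w)
print(S[:5])                 # leading singular values
print(top_components.shape)  # torch.Size([3, 16, 16])
```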
-
Is there any way to fix this, perhaps with a specific version of vLLM?
(LLMRayActor pid=1005) WARNING 11-15 15:19:28 gemma2.py:351] Some weights are not initialized from checkpoints: {'layers.18.mlp.gate_up_proj.…
-
Hi,
First, thank you very much for your work; it brings a huge improvement to the DETR family.
Your paper is really well written and explained.
Thank you also for publishing your code & models, i…
-
Hi, I tried compiling the UNet (torch.float16) of StableDiffusionXLPipeline on an Inferentia2.8xlarge instance, and it failed.
When the UNet's latent size is (64, 64), it did not fai…
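For anyone reproducing this, the usual Neuron compilation path is torch_neuronx.trace; a hedged sketch with a stand-in module (the Conv2d below is hypothetical, the real case is the SDXL UNet in float16):

```python
import torch
import torch_neuronx

# Hypothetical stand-in for the UNet; same (64, 64) latent spatial size
model = torch.nn.Sequential(torch.nn.Conv2d(4, 4, 3, padding=1)).eval()
example = torch.randn(1, 4, 64, 64)

# Compilation for Inferentia happens here; failures usually surface at
# trace time, so shrinking the module/inputs is a useful way to bisect.
neuron_model = torch_neuronx.trace(model, example)
print(neuron_model(example).shape)
```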
-
I just realized I get the warning below with Salesforce/blip-image-captioning-large; I think I already ran results for it, but they are probably random in that case; maybe someone could check the result…
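One quick way to check whether the weights actually loaded is to caption a known image and see if the output is coherent; a minimal sketch following the standard BLIP usage (the image URL is just an example):

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

model_id = "Salesforce/blip-image-captioning-large"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

# Any test image works; gibberish output here would suggest the weights
# really were left randomly initialized.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```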
-
```
Traceback (most recent call last):
  File "/home/yy/MSSR-main/MSSR-main/run_model.py", line 39, in <module>
    run_result = run_recbole(model=args.model, dataset=args.dataset, config_file_list=config_file_…
```
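For context, run_recbole is RecBole's quick-start entry point; a minimal working invocation looks like the sketch below (model and dataset names are illustrative):

```python
from recbole.quick_start import run_recbole

# ml-100k is downloaded automatically on first use; config_file_list
# plays the same role as in run_model.py above.
result = run_recbole(model="BPR", dataset="ml-100k", config_file_list=[])
print(result)
```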
-
In paper:
```
We propose a novel lightweight relation extractor, EGTR, which exploits the self-attention of DETR decoder, as depicted in Fig. 3. Since the self-attention weights in Eq. (1) contain N ×…
```
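Not EGTR's actual code, but a sketch of the idea being quoted: per-head self-attention weights over the N object queries can be pulled out of a decoder-style layer and reused as pairwise relation features (module and sizes below are illustrative):

```python
import torch
from torch import nn

N, d, heads = 100, 256, 8       # N object queries, DETR-style
queries = torch.randn(1, N, d)

attn = nn.MultiheadAttention(d, heads, batch_first=True)
# average_attn_weights=False keeps one N x N map per head; these are the
# self-attention weights a relation extractor could consume directly.
_, attn_weights = attn(queries, queries, queries,
                       need_weights=True, average_attn_weights=False)
print(attn_weights.shape)  # torch.Size([1, 8, 100, 100])
```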