-
Hey, I made a small change in generate_v2.py to run a loop over the whole test set. I am getting an error, I guess because of caching. I have pasted my code and the error message I am getting below…
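The diff and the actual traceback are cut off above, so purely as an illustration of the pattern being described, here is a minimal sketch of looping generation over a test set without carrying any cached key/value state across examples; `model`, `tokenizer`, and `test_set` are hypothetical stand-ins, not the real generate_v2.py objects.
```python
import torch

@torch.no_grad()
def run_on_test_set(model, tokenizer, test_set, max_new_tokens=64):
    # `model`, `tokenizer`, and `test_set` are placeholders for whatever
    # generate_v2.py actually uses; this only shows the per-example loop.
    outputs = []
    for example in test_set:
        inputs = tokenizer(example["prompt"], return_tensors="pt").to(model.device)
        # generate() builds a fresh KV cache for each call; avoid reusing
        # past_key_values or other cached state from a previous example.
        generated = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True)
        outputs.append(tokenizer.decode(generated[0], skip_special_tokens=True))
    return outputs
```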
-
I am trying to reproduce the Baseline CLIP results for the Single-object GQA setting, but I am getting a much lower mAP of 0.18, which does not match the paper's numbers. I am using the pooled output of CLIP'…
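For reference, the pooled CLIP image embedding can be pulled out with the plain transformers API roughly as below; the checkpoint name and image path are assumptions, and the sketch is only meant to show the difference between the unprojected pooled output and the projected image features, which is a common source of mAP gaps when reproducing CLIP baselines.
```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-base-patch32"  # assumed checkpoint; use whatever the paper specifies
model = CLIPModel.from_pretrained(model_name).eval()
processor = CLIPProcessor.from_pretrained(model_name)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    vision_out = model.vision_model(**inputs)
    pooled = vision_out.pooler_output               # CLS token after post-layernorm, no projection
    projected = model.get_image_features(**inputs)  # pooled output passed through visual_projection
```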
-
With the new release of version 3.2.0, using ONNX has become much easier, but initial local tests led to various errors, meaning that it was not possible to use ONNX Runtime via Sentence Transform…
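For context, the ONNX backend usage introduced in 3.2.0 looks roughly like the snippet below, assuming the ONNX extras are installed (`pip install "sentence-transformers[onnx]"`); the model name is only an example.
```python
from sentence_transformers import SentenceTransformer

# The model name is just an example; the backend argument is what selects
# ONNX Runtime instead of the default PyTorch backend.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="onnx")

embeddings = model.encode(["This sentence runs through ONNX Runtime."])
print(embeddings.shape)
```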
-
## Summary
For the full Llama 3B model bringup, we want to test the main standalone blocks before running full model e2e. One of those blocks is the attention module.
## Details
For initial Llama…
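Not the actual bringup harness, but a minimal sketch of what checking an attention block standalone can look like before going e2e: compute a plain-PyTorch reference with scaled dot-product attention and compare it against the block under test. The dimensions below are placeholders, not the real Llama 3B config, and the explicit softmax path stands in for the device-under-test output.
```python
import torch
import torch.nn.functional as F

# Placeholder dimensions, not the real Llama 3B config.
batch, seq_len, n_heads, head_dim = 1, 128, 8, 64

q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)

# Host reference: causal scaled dot-product attention.
ref = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# The block under test would produce `out`; here the same math is recomputed
# explicitly just to show the comparison pattern used for block-level checks.
scale = head_dim ** -0.5
scores = (q @ k.transpose(-2, -1)) * scale
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float("-inf"))
out = torch.softmax(scores, dim=-1) @ v

torch.testing.assert_close(out, ref, rtol=1e-4, atol=1e-4)
```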
-
I would like to reproduce this with T5, but after swapping in the model I get an error about missing decoder_input_ids, as shown below:
```
Traceback (most recent call last):
File "/mnt/bn/songhengrui-nas/Scented-EAE/main.py", line 85, in
main()
File "/mnt/bn/songhengr…
-
Hi, I set `F.scaled_dot_product_attention = sageattn` in modeling_llama.py and ran the inference code,
and I see it runs `sageattn_qk_int8_pv_fp16_cuda` in `sageattention/core.py`.
The results are:
…
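For anyone reproducing this, the monkey-patch being described is roughly the following (only the patch itself; the surrounding inference script is omitted). It assumes the sageattention package is installed with its CUDA kernels built.
```python
import torch.nn.functional as F
from sageattention import sageattn  # assumes sageattention is installed with CUDA support

# Replace PyTorch's SDPA globally so that modeling_llama.py, which calls
# F.scaled_dot_product_attention, routes through SageAttention's INT8-QK /
# FP16-PV kernels instead. The patch must run before the model's forward pass.
F.scaled_dot_product_attention = sageattn

# ... load the Llama model and run inference as usual after this point ...
```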
-
(allegro) D:\PyShit\Allegro>python single_inference.py ^
More? --user_prompt "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats…
-
### OpenVINO Version
2024.3
### Operating System
Ubuntu 20.04 (LTS)
### Device used for inference
NPU
### Framework
PyTorch
### Model used
torch.nn.MultiheadAttention
### Issue description
…
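The description is cut off above, so purely for context, a minimal reproduction of what the template fields describe (a PyTorch `torch.nn.MultiheadAttention` converted with OpenVINO 2024.3 and compiled for NPU) might look like the sketch below; the wrapper module, shapes, and dimensions are assumptions, not the reporter's actual model.
```python
import torch
import openvino as ov


class MHAWrapper(torch.nn.Module):
    """Thin wrapper so convert_model traces a single-input self-attention forward."""

    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        self.mha = torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.mha(x, x, x, need_weights=False)
        return out


example = torch.randn(1, 32, 256)
ov_model = ov.convert_model(MHAWrapper().eval(), example_input=example)

core = ov.Core()
compiled = core.compile_model(ov_model, "NPU")  # "CPU" can serve as a reference for comparison
result = compiled([example.numpy()])[0]
print(result.shape)
```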
-
I have a 3060 Ti with 8 GB of VRAM.
When I run:
Loading personal and system profiles took 953ms.
(base) PS C:\Windows\system32> e:
(base) PS E:\> cd MagicQuill
(base) PS E:\MagicQuill> conda activate…
-
This issue is not in response to a performance regression.
The method of performing cross-attention QKV computations introduced in #4942 could be improved. Because this issue relates to cross-atten…
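As background for the discussion, and not the actual implementation from #4942: cross-attention differs from self-attention in that only the queries come from the decoder hidden states, while keys and values are projected from the encoder output, so a single fused QKV projection over one input does not apply directly; at most K and V can be fused, because they share the encoder output. A generic sketch of that split:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttention(nn.Module):
    """Generic cross-attention: Q from decoder states, K/V from encoder output."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        # K and V share the same input (the encoder output), so they can be
        # fused into one matmul, but Q cannot be fused with them.
        self.kv_proj = nn.Linear(hidden_size, 2 * hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, decoder_states, encoder_out):
        b, tq, h = decoder_states.shape
        tk = encoder_out.shape[1]
        q = self.q_proj(decoder_states).view(b, tq, self.num_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(encoder_out).chunk(2, dim=-1)
        k = k.view(b, tk, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, tk, self.num_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.o_proj(out.transpose(1, 2).reshape(b, tq, h))
```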