-
I saw the issue with chatglm2-6b.
It runs successfully with numactl -m 0 -C 0-23.
It fails with numactl -m 0 -C 0-31, 0-47, or 0-55.
It can be reproduced with INT8_ASYM or 4BIT_…
-
### 🚀 The feature, motivation and pitch
Currently, providing an attention mask argument to [jagged_scaled_dot_product_attention](https://github.com/pytorch/pytorch/blob/main/torch/nested/_internal/…
-
### What is the URL of the page with the issue?
https://pkg.go.dev/github.com/docker/buildx#section-readme
### What is your user agent?
All user agents
> [!NOTE]
> - This issue affects al…
-
https://showarp.github.io/2024/03/12/Transformer%E5%8E%9F%E7%90%86%E8%A7%A3%E9%87%8A-%E5%85%B6%E4%B8%80/
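Transformer Source Code Analysis. Preface: On a whim today, I would argue that the AI boom from 2022 to the present traces entirely back to a 2017 paper on machine translation, Attention is all you nee…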
-
**Why is this needed**:
Small data variations ("noise") appear amplified and get more visual attention than they should compared to larger important signals. For example, these red circled items a…
-
### Additional Information
I'm running over 30k batch jobs at scale, all referencing the same S3 source (one 900 KB object) mounted on local disk inside a Docker container.
The jobs are producing ``…
-
### VA Notify: Important Scheduling Information
As we approach the end of the year, please note that VA Notify will ***not*** be starting any new notification intakes after November 15th. Our focus w…
-
Why are there no attention masks in DIT and U-Net?
DIT directly removes the attention masks, including both self and cross attention.
In U-Net, the mask is applied by multiplying the keys (k) and …
-
Whenever context is passed to a block, as in:
h, res_samples = downsample_block(hidden_states=h, temb=emb, context=context)
the forward function uses "del context",
so it is not really impleme…
-
**What happened?**
An error occurred when viewing the project list.
![16b4947265ec30733eb2f5cbfa2338c](https://github.com/user-attachments/assets/96510697-975a-43c5-95dc-7d359e93deef)
**What did you expect to happen?**
**Ho…