-
**Describe the bug**
While trying the DPO trainer example I hit a bug related to batch size and sharding. Maybe the shard axes are not set properly, or it could be a JAX error. The system used is a v3-32 (4 hosts).
…
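For reference, a minimal sketch (not the DPO trainer's actual config) of how shard axes are typically declared with `jax.sharding`; the axis names `dp`/`mp` and the 4x8 device layout are assumptions for a v3-32 slice, and a mismatch between such names and the trainer's partition rules, or a batch that does not divide the data-parallel axis, is the usual cause of this kind of error:

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Hypothetical layout for a v3-32 slice: 4 hosts x 8 local devices = 32 devices.
# Axis names "dp"/"mp" are placeholders; they must match whatever names the
# trainer's partition rules reference, otherwise arrays end up replicated or
# resharding fails with a shape/divisibility error.
devices = np.array(jax.devices()).reshape(4, 8)
mesh = Mesh(devices, axis_names=("dp", "mp"))

# The global batch has to divide evenly over the data-parallel axis.
global_batch = 32
assert global_batch % mesh.shape["dp"] == 0, "batch not divisible by dp axis"

# Batch dimension sharded over "dp", feature dimension replicated.
batch_sharding = NamedSharding(mesh, P("dp", None))
print(dict(mesh.shape), batch_sharding)
```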
-
### Report of performance regression
I found that the attention (flashattn.py) computation time increased by 1.7x after upgrading vLLM from 0.6.0 to 0.6.3.
| | v0.6.0 | v0.6.3 |
| :----: | :----: | :----: |
…
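This is not vLLM's internal flashattn.py path, but a standalone sketch for timing the PyTorch SDPA kernel in both environments, which can help tell whether the regression is in the attention kernel itself or in the surrounding code; the shapes and dtype below are illustrative only:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes; adjust to match the model's actual head layout.
B, H, S, D = 8, 32, 1024, 128
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def bench(fn, iters=50):
    # Warm up, then time with CUDA events so only kernel time is measured.
    for _ in range(5):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

ms = bench(lambda: F.scaled_dot_product_attention(q, k, v, is_causal=True))
print(f"SDPA: {ms:.3f} ms/iter")
```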
-
### Printer model
MK4
### Firmware version
6.1.3
### Upgrades and modifications
_No response_
### Printing from...
PrusaConnect
### Describe the bug
The API's /printer, /job, /v1/job all repo…
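A small sketch for polling the endpoints named above and comparing what each one reports; the hostname and API key are placeholders, the paths assume PrusaLink's usual `/api` prefix and `X-Api-Key` header, and all of this should be checked against the printer's actual firmware:

```python
import json
import requests  # third-party: pip install requests

PRINTER = "http://prusa-mk4.local"        # placeholder hostname
HEADERS = {"X-Api-Key": "YOUR_API_KEY"}   # placeholder key; digest auth is another option

# Endpoints named in the report; paths may differ depending on firmware version.
for path in ("/api/printer", "/api/job", "/api/v1/job"):
    resp = requests.get(PRINTER + path, headers=HEADERS, timeout=5)
    print(path, resp.status_code)
    print(json.dumps(resp.json(), indent=2)[:400])  # truncated for readability
```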
-
As title
-
### System Info
On main
### Who can help?
@zucchini-nlp @gante
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially su…
-
This is printed when I call `functional.scaled_dot_product_attention`:
> [W914 13:25:36.000000000 sdp_utils.cpp:555] Warning: 1Torch was not compiled with flash attention. (function operator ())
…
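The warning means the flash backend is not available in that build, so SDPA silently falls back to another kernel. A sketch (using the stock PyTorch backend flags, not tied to any particular model; the tensor shapes and CUDA device are illustrative) for checking which backends the current build actually supports, and for forcing the flash backend so the failure surfaces as an error instead of a warning:

```python
import torch
import torch.nn.functional as F

print(torch.__version__, torch.version.cuda)
print("flash enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math enabled:         ", torch.backends.cuda.math_sdp_enabled())

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
# Restrict SDPA to the flash backend; this raises instead of warning if the
# build or the input shapes/dtypes do not support it.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, q, q)
```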
-
I used to run this pipeline fine, but after a few recent updates, coming back to this exact workflow I'm hitting new issues. Can anyone help? Thanks.
# ComfyUI Error Report
## Err…
-
In the linear_focus_attention part, why isn't an operation like phi_qs = (F.relu(qs) + 1e-6) / (self.norm_scale.abs() + 1e-6) also applied to the v values? Equation (15) in the paper applies the Phi function to Q_s, K_s, and V_s.
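For reference, a minimal sketch of linear attention with the feature map from the question applied to Q and K only; the shapes and the `norm_scale` parameter are assumptions based on the snippet, not the repository's actual code. It does not explain the authors' choice, but it shows where V sits in the computation: it is only aggregated by the kernel weights, which is why many linear-attention implementations leave it un-mapped.

```python
import torch
import torch.nn.functional as F

def phi(x, norm_scale):
    # Non-negative kernel feature map, same form as in the question.
    return (F.relu(x) + 1e-6) / (norm_scale.abs() + 1e-6)

# Illustrative shapes: (batch, heads, tokens, dim)
B, H, N, D = 2, 4, 16, 32
qs, ks, vs = (torch.randn(B, H, N, D) for _ in range(3))
norm_scale = torch.nn.Parameter(torch.ones(D))

phi_q, phi_k = phi(qs, norm_scale), phi(ks, norm_scale)

# Linear attention: build the K-V summary first, then project it with phi(Q).
# V enters only as the values being aggregated.
kv = torch.einsum("bhnd,bhne->bhde", phi_k, vs)
z = 1.0 / (torch.einsum("bhnd,bhd->bhn", phi_q, phi_k.sum(dim=2)) + 1e-6)
out = torch.einsum("bhnd,bhde,bhn->bhne", phi_q, kv, z)
print(out.shape)  # (B, H, N, D)
```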
-
Hi, thanks for your contribution to this project.
My question is: how can I obtain the attention map for the predicted image?
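Without knowing this repository's model code, one generic way to pull attention maps out of a PyTorch model is a forward hook on the attention module; everything below (the module, names, and shapes) is a placeholder to adapt to the actual model:

```python
import torch
import torch.nn as nn

attn_maps = {}

def save_attention(name):
    # Forward hook that stores the module's returned attention weights.
    def hook(module, inputs, output):
        # nn.MultiheadAttention returns (attn_output, attn_weights) when
        # need_weights=True; other implementations expose weights differently.
        if isinstance(output, tuple) and len(output) > 1 and output[1] is not None:
            attn_maps[name] = output[1].detach().cpu()
    return hook

# Placeholder module; in practice, pick the real attention modules from
# model.named_modules() and register a hook on each one of interest.
model = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
model.register_forward_hook(save_attention("attn"))

x = torch.randn(1, 16, 64)
model(x, x, x, need_weights=True)
print(attn_maps["attn"].shape)  # (batch, tgt_len, src_len), averaged over heads
```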
-
When running with `attention = True`, the last batch has wrong latitude values.
Min and max latitude values for that particular batch:
```
batch 10 = 135.0 - 146.25
batch 11 = 148.5…