-
Thank you for providing the code.
I have a question about Fig.5 in your paper.
In my understanding, the value of attention is obtained for each pair of (s,e,t) based on Eq.(3).
In Fig.5, you se…
-
I was wondering if there are any plans to support gradient flow back to biases that are added to the score function. For instance, if I add a scalar to the score: `score + b`, where `b` comes from a learnabl…
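As a minimal sketch of what is being asked for, the snippet below adds a learnable bias to attention scores and checks that autograd reaches it. This is a hypothetical illustration, not the library's actual API; the bias here is per-key (shape `(seq,)`) rather than a single scalar, because a scalar added uniformly to every score is cancelled by the softmax and would receive a zero gradient.

```python
import torch

# Hypothetical sketch: a learnable per-key bias b added to attention scores.
# All names (q, k, v, score, b) are illustrative, not from a specific API.
torch.manual_seed(0)
seq, dim = 5, 8
q = torch.randn(1, seq, dim)
k = torch.randn(1, seq, dim)
v = torch.randn(1, seq, dim)

b = torch.nn.Parameter(torch.zeros(seq))          # learnable bias over keys
score = q @ k.transpose(-2, -1) / dim ** 0.5 + b  # bias joins the autograd graph
attn = torch.softmax(score, dim=-1)
out = attn @ v

out.sum().backward()
print(b.grad is not None)  # True: gradient flowed back to the bias
```

Because `b` participates in the score computation before the softmax, `loss.backward()` populates `b.grad` like any other parameter.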
-
Hey there,
I'm interested in Char2Wav; thanks for your code.
Would you update it to include an attention mechanism?
-
## Summary
For the full Llama 3B model bringup, we want to test the main standalone blocks before running the full model e2e. One of those blocks is the attention module.
## Details
For initial Llama…
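One common way to test an attention block in isolation is to compare it against a known-good reference. The sketch below checks a hand-rolled scaled dot-product attention against PyTorch's built-in implementation; the shapes are illustrative and do not reflect the actual Llama 3B configuration.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a standalone attention-block check: compare a custom
# implementation against torch's reference kernel. Shapes are illustrative.
def my_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
# (batch, heads, seq, head_dim) -- not the real Llama 3B dimensions
q, k, v = (torch.randn(1, 4, 16, 32) for _ in range(3))

out = my_attention(q, k, v)
expected = F.scaled_dot_product_attention(q, k, v)
print(torch.allclose(out, expected, atol=1e-5))  # True
```

Once the block matches the reference numerically, it can be swapped into the full-model e2e run with more confidence.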
-
In the tutorial, I find that the "attention" mechanism is not a true attention, since the calculated attention weights have no relationship to the encoder output vectors.
The implementation in the orig…
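For contrast, here is a minimal sketch of attention where the weights genuinely depend on the encoder outputs: each score is computed from the decoder query and an encoder state (additive, Bahdanau-style scoring). All weight names and dimensions are illustrative, not taken from the tutorial's code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(query, encoder_outputs, Wq, Wk, v):
    # score_i = v . tanh(Wq q + Wk h_i): each score uses encoder state h_i,
    # so the attention weights depend on the encoder outputs.
    scores = np.array([v @ np.tanh(Wq @ query + Wk @ h) for h in encoder_outputs])
    weights = softmax(scores)
    context = weights @ encoder_outputs  # weighted sum of encoder states
    return weights, context

rng = np.random.default_rng(0)
d = 4
H = rng.standard_normal((3, d))   # 3 encoder output vectors (illustrative)
q = rng.standard_normal(d)        # decoder query
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
v = rng.standard_normal(d)

weights, context = additive_attention(q, H, Wq, Wk, v)
print(round(weights.sum(), 6))  # 1.0: a proper distribution over encoder states
```

If the weights are produced without reading the encoder states at all, the mechanism degenerates into a fixed mixing pattern, which is the issue being raised here.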
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
First and foremost, I would like to express my appreciation for the outstanding work you have done in this field. Your insights have had a significant impact on my research, and I greatly admire your …
-
### Feature request
![image](https://github.com/user-attachments/assets/cb57d5dd-6502-4b31-a948-bb46e535fea5)
The LogitProcessor __call__ method currently has access to the input_ids and the logit…
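To make the current interface concrete, below is a hedged sketch of a processor in the usual `LogitsProcessor` style, where `__call__` receives only `input_ids` and the scores; the class and argument names are illustrative and may not match the library's exact signatures.

```python
import torch

# Hedged sketch of the LogitsProcessor-style interface: __call__ only sees
# input_ids and the per-token scores. Names here are illustrative.
class BiasLogitProcessor:
    def __init__(self, token_id: int, bias: float):
        self.token_id = token_id
        self.bias = bias

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        scores = scores.clone()
        scores[:, self.token_id] += self.bias  # adjust one token's logit
        return scores

proc = BiasLogitProcessor(token_id=3, bias=5.0)
ids = torch.tensor([[0, 1, 2]])      # generated so far
logits = torch.zeros(1, 10)          # vocab size 10, illustrative
out = proc(ids, logits)
print(out[0, 3].item())  # 5.0
```

The feature request is to expose more state than this pair of arguments; anything beyond `input_ids` and the logits has to be smuggled in through the constructor today.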
-
See project proposal [here](https://andre-martins.github.io/pages/project-examples-for-deep-structured-learning-fall-2019.html).