fkodom / dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
MIT License · 50 stars · 9 forks
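
Several of the issues below (the attention calculation question in #5, the ZeroDivisionError in #7) concern how dilated attention partitions the sequence. For context, here is a minimal single-branch sketch of the LongNet mechanism in plain PyTorch. It is not this package's implementation (the repository exposes classes such as MultiheadDilatedAttention, per issue #3); the function name and arguments are illustrative only.

```python
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, segment_length, dilation_rate):
    """One (segment_length, dilation_rate) branch of LongNet-style dilated
    attention: split the sequence into segments, keep every r-th token within
    each segment, attend inside the sparsified segments, and scatter the
    outputs back. A conceptual sketch only; LongNet additionally offsets the
    sub-sampling per head and mixes several (segment, dilation) branches.
    """
    b, n, d = q.shape
    s, r = segment_length, dilation_rate
    # A sequence length not divisible by the segment length is a plausible
    # trigger for division errors like the one reported in issue #7.
    assert n % s == 0, "seq_len must be divisible by segment_length"

    def sparsify(x):
        # (batch, seq, dim) -> (batch, n_segments, tokens_per_segment, dim)
        return x.reshape(b, n // s, s, d)[:, :, ::r, :]

    qs, ks, vs = sparsify(q), sparsify(k), sparsify(v)
    # Attention within each sparsified segment; leading dims broadcast as batch.
    attn = F.scaled_dot_product_attention(qs, ks, vs)

    # Scatter the attended tokens back to their original positions;
    # positions skipped by the dilation stay zero in this branch.
    out = torch.zeros_like(q)
    out.view(b, n // s, s, d)[:, :, ::r, :] = attn
    return out

x = torch.randn(2, 8192, 64)  # (batch, seq_len, dim)
y = dilated_attention(x, x, x, segment_length=2048, dilation_rate=4)
```

Because each branch attends only within segments over dilated tokens, cost scales linearly in sequence length rather than quadratically, which is the core claim of the LongNet paper linked above.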
Issues (newest first)
#7 · ZeroDivisionError: integer division or modulo by zero · younesselbrag · opened 7 months ago · 1 comment
#6 · Backward pass · Coluding · opened 12 months ago · 3 comments
#5 · Q: Attention Calculation · mohamedelbahnasawi · opened 1 year ago · 5 comments
#4 · Training on yet-another-retnet script · Akbarable · opened 1 year ago · 3 comments
#3 · Benchmarking the MultiheadDilatedAttention Class · MHarris021 · closed 1 year ago · 2 comments
#2 · Running Time and Other Questions · MHarris021 · closed 1 year ago · 10 comments
#1 · Training · Akbarable · closed 1 year ago · 4 comments