fkodom / dilated-attention-pytorch

(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)

MIT License

50 stars, 9 forks

issues

ZeroDivisionError: integer division or modulo by zero

#7 younesselbrag opened 7 months ago
1
Backward pass

#6 Coluding opened 12 months ago
3
Q: Attention Calculation

#5 mohamedelbahnasawi opened 1 year ago
5
Training on yet-another-retnet script

#4 Akbarable opened 1 year ago
3
Benchmarking the MultiheadDilatedAttention Class

#3 MHarris021 closed 1 year ago
2
Running Time and Other Questions

#2 MHarris021 closed 1 year ago
10
Training

#1 Akbarable closed 1 year ago
4