Jamie-Stirling RetNet issues

Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"

MIT License

1.14k stars, 98 forks

issues

Is RetNet equivalent to ordinary GPT when the decay is set to 1?

#37 xuanyaoming opened 5 months ago
3
Dimensions of forward_recurrent

#36 Qiu30 closed 7 months ago
5
a question about xpos and D of decay mat

#35 DavideHe opened 7 months ago
2
Confusion about "the chunkwise recurrent representation of retention"

#34 CHENHUI-X opened 8 months ago
0
Can this mechanism be applied to point cloud data?

#33 madjid-dx opened 8 months ago
0
NO LM HEAD

#32 shnuhw closed 8 months ago
2
Fix dimension mismatch when hidden size is odd

#31 ilunye closed 8 months ago
0
How to predict using this net?

#30 GodPCWANG opened 9 months ago
0
Faster implementation of MultiScaleRetention, adds dependency on einops

#29 draguve closed 9 months ago
1
The complex theta should cancel out

#28 albertbuchard opened 9 months ago
0
/src/retnet.py GPU

#27 Qiu30 closed 9 months ago
2
Fix math problem in gamma calculation

#26 Jun-depo closed 6 months ago
1
Assistance on training a new retention network model?

#25 risedangel opened 9 months ago
0
Passing Attention Masks

#24 leffff opened 9 months ago
3
Update retention.py

#23 leffff closed 9 months ago
1
Q, k and D device difference

#22 leffff closed 9 months ago
1
Proposed improvement/collaboration: removing the O(T^2) training cost

#21 jackd closed 9 months ago
2
Fixed typo

#20 EgoVeroConsisto opened 10 months ago
0
Can RetNet be applied to point cloud tasks?

#19 huiyang0613 opened 10 months ago
0
Changelog of official implementation

#18 donglixp opened 11 months ago
2
what about cross-attention

#17 aki819 opened 11 months ago
0
Error when the model is running on GPU

#16 SSamDav opened 11 months ago
1
Update src/complex/retention.py

#15 MichaelFu1998-create closed 11 months ago
1
demo example / number of parameter control vs original code

#14 thegodone opened 11 months ago
4
Chunkwise real

#13 Jamie-Stirling closed 11 months ago
0
Can you make this repo available to package installers (pip)?

#12 gaasher opened 11 months ago
0
RetNet Officially Released

#11 tiendung closed 11 months ago
1
Chunkwise retention giving different output

#10 Jamie-Stirling closed 11 months ago
4
Minor docs fix

#9 Regenhardt closed 11 months ago
0
_get_D function very slow for long sequence

#7 ZuowenWang0000 closed 11 months ago
1
Real-valued implementation using xPos

#6 Jamie-Stirling closed 11 months ago
0
Real-valued implementation using xPos

#5 Jamie-Stirling closed 11 months ago
0
Training is slow and some errors (perhaps)

#4 Zth9730 closed 11 months ago
6
Initial effort to add chunkwise retention paradigm

#3 Aaryanverma closed 11 months ago
1
Some Questions about Attention Mask

#2 tang-ed closed 11 months ago
3
About the complex

#1 KohakuBlueleaf closed 11 months ago
4