issues
fkodom
/
yet-another-retnet
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)
MIT License
100
stars
15
forks
An initialization issue
#27
leor-c
opened
6 months ago
1
How do I make a PR?
#26
yunusskeete
opened
7 months ago
0
Bug fix: decay mask for bf16, bf32
#25
fkodom
closed
12 months ago
0
Some issues regarding _build_decay_mask.
#24
Doraemonzzz
closed
12 months ago
3
How to train with long sequences using chunkwise feature of RetNet?
#23
calliope-pro
closed
11 months ago
8
A more efficient computation of the state in the chunkwise formulation
#22
leor-c
closed
1 year ago
0
Slightly more efficient / cleaner implementation of the chunkwise relative pos. enc.
#21
leor-c
closed
1 year ago
1
Performance Tuning
#20
fkodom
closed
1 year ago
0
CPU benchmark support
#19
fkodom
closed
1 year ago
0
Benchmark_inference
#18
erlebach
closed
1 year ago
3
Running benchmark_inference on the CPU
#17
erlebach
closed
1 year ago
1
float32 -> 32-true
#16
fkodom
closed
1 year ago
0
Bug Fix: float32 -> 32-true
#15
fkodom
closed
1 year ago
0
No [tool.poetry] section in pyproject.toml
#14
erlebach
closed
1 year ago
3
Invalid precision when running train_project_gutenberg
#13
erlebach
closed
1 year ago
2
Change in how input projections are implemented; seems to converge faster
#12
draguve
opened
1 year ago
1
Fixed issue where the dimensions of the group norm seem to be incorrect
#11
draguve
closed
1 year ago
2
ModelCheckpoint() argument after ** must be a mapping, not ModelCheckpoint
#10
aifartist
opened
1 year ago
0
Have you ever tried RetNet for vision tasks?
#9
cnyvfang
closed
1 year ago
4
How's this RetNet useful when throughput is actually lower?
#8
achen46
closed
1 year ago
2
bug fix: change F.relu to F.silu
#7
Dongyeongkim
closed
1 year ago
0
About activation function
#6
Dongyeongkim
closed
1 year ago
2
Example training script
#5
fkodom
closed
1 year ago
0
Chunkwise Formulation
#4
fkodom
closed
1 year ago
0
Changelog of official implementation
#3
donglixp
closed
1 year ago
1
Update README.md
#2
Amshaker
closed
1 year ago
0
Throughput measurements of parallel and recurrence methods
#1
Amshaker
closed
1 year ago
3