fkodom / yet-another-retnet

A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)

MIT License

100 stars, 15 forks

issues

An initialization issue

#27 leor-c opened 6 months ago
1
How do I make a PR?

#26 yunusskeete opened 7 months ago
0
Bug fix: decay mask for bf16, bf32

#25 fkodom closed 12 months ago
0
Some issues regarding _build_decay_mask.

#24 Doraemonzzz closed 12 months ago
3
How to train with long sequences using chunkwise feature of RetNet?

#23 calliope-pro closed 11 months ago
8
a more efficient computation of the state in the chunkwise formulation

#22 leor-c closed 1 year ago
0
Slightly more efficient / cleaner implementation of the chunkwise relative pos. enc.

#21 leor-c closed 1 year ago
1
Performance Tuning

#20 fkodom closed 1 year ago
0
CPU benchmark support

#19 fkodom closed 1 year ago
0
Benchmark_inference

#18 erlebach closed 1 year ago
3
Running benchmark_inference on the CPU

#17 erlebach closed 1 year ago
1
float32 -> 32-true

#16 fkodom closed 1 year ago
0
Bug Fix: float32 -> 32-true

#15 fkodom closed 1 year ago
0
No [tool.poetry] section in pyproject.toml

#14 erlebach closed 1 year ago
3
Invalid precision when running train_project_gutenberg

#13 erlebach closed 1 year ago
2
Change in how input projections are implemented; seems to converge faster

#12 draguve opened 1 year ago
1
Fixed issue where the dimensions of the GroupNorm seem to be incorrect

#11 draguve closed 1 year ago
2
ModelCheckpoint() argument after ** must be a mapping, not ModelCheckpoint

#10 aifartist opened 1 year ago
0
Have you ever tried Retnet for vision tasks?

#9 cnyvfang closed 1 year ago
4
How's this RetNet useful when throughput is actually lower?

#8 achen46 closed 1 year ago
2
bug fix: change F.relu to F.silu

#7 Dongyeongkim closed 1 year ago
0
About activation function

#6 Dongyeongkim closed 1 year ago
2
Example training script

#5 fkodom closed 1 year ago
0
Chunkwise Formulation

#4 fkodom closed 1 year ago
0
Changelog of official implementation

#3 donglixp closed 1 year ago
1
Update README.md

#2 Amshaker closed 1 year ago
0
Throughput measurements of parallel and recurrence methods

#1 Amshaker closed 1 year ago
3