-
Hi! Congrats on the clean RetNet code. I'm experimenting with the torchscale codebase and happened to find your repo with a link to a checkpoint on HF.
I noticed that it is now gone; do you have plans t…
-
Curious where to begin researching unlimited context length? Any direction is appreciated.
-
Thank you for sharing the implementation of this attractive work!
When training DiJiang with long inputs (>5000), the outputs were NaN. This was due to an overflow, as D2 was defined as -n powers of…
-
Thank you for the great implementation!
torch.compile can be enabled by adding the "@torch.compile" decorator just before every forward() function in modeling_retnet.py.
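As a minimal sketch of that decorator placement (a toy module, not modeling_retnet.py; `backend="eager"` is an assumption here so the example runs without a compiler toolchain, and would be dropped in real training code):

```python
# Toy sketch of decorating forward() with torch.compile (assumes PyTorch >= 2.0).
# backend="eager" skips code generation so the example runs anywhere torch is
# installed; omit it to use the default inductor backend.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    @torch.compile(backend="eager")
    def forward(self, x):
        return torch.relu(self.linear(x))

net = TinyNet()
out = net(torch.randn(2, 4))
print(tuple(out.shape))  # (2, 4)
```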
-
In retnet-3b/config.json, following the experimental settings of the paper
(https://arxiv.org/pdf/2307.08621.pdf), decoder_ffn_embed_dim and decoder_value_embed_dim should be set to twice the size of decode…
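For illustration only (the numbers below are hypothetical; the field names follow the torchscale-style config used in this repo), the intended relationship would look like:

```json
{
  "decoder_embed_dim": 2560,
  "decoder_ffn_embed_dim": 5120,
  "decoder_value_embed_dim": 5120
}
```

i.e. both the FFN and value dimensions are 2x decoder_embed_dim.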
-
Thank you for your great work!
I've noticed that your decoder_retention_heads is set to 3 by default, and the mask is also expanded to three dimensions to match. Have you experimented with the per…
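A minimal sketch of what a per-head decay mask looks like (pure Python for clarity; the gamma schedule 1 - 2^(-5-h) is the one given in the RetNet paper, the rest is illustrative):

```python
# Build a [num_heads, seq_len, seq_len] causal decay mask, one decay rate
# gamma per head, as described in the RetNet paper.
def decay_mask(num_heads, seq_len):
    gammas = [1 - 2 ** (-5 - h) for h in range(num_heads)]  # paper's schedule
    mask = []
    for g in gammas:
        # Lower-triangular: position i attends to j <= i with weight g^(i-j).
        head = [[g ** (i - j) if i >= j else 0.0 for j in range(seq_len)]
                for i in range(seq_len)]
        mask.append(head)
    return mask

m = decay_mask(3, 4)
print(len(m), len(m[0]), len(m[0][0]))  # 3 4 4
```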
-
In `train_model.py`, checkpoints saved during the same epoch will likely overwrite one another. It looks like line 137 can change this behavior. ` filename="epoch_{epoch}_validatio…
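A sketch of the underlying fix, assuming the filename template is the collision point (the template below is illustrative; with Lightning's ModelCheckpoint the equivalent is including `{step}` in the `filename` argument):

```python
# Including the global step in the checkpoint filename makes two checkpoints
# from the same epoch get distinct names (names here are illustrative).
template = "epoch_{epoch}_step_{step}_validation_{val_loss:.3f}.ckpt"

a = template.format(epoch=3, step=1200, val_loss=0.4812)
b = template.format(epoch=3, step=1400, val_loss=0.4812)
print(a)  # epoch_3_step_1200_validation_0.481.ckpt
assert a != b  # same epoch, different step -> no overwrite
```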
-
- [x] Fails when Vignettes=TRUE
- [x] update roxygen2
- [x] remove warnings
- [x] remove commented code in geom ret
- [x] convert names to snakecase
- [x] In read_beast_retnet, move helper functi…
-
The [primary codepath](https://github.com/JoshVarty/pytorch-retinanet/issues/1) starts a number of threads that load images from disk in minibatches.
The minibatch loader codepath is much smaller,…
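As a toy sketch of that pattern (not the repo's actual loader): a producer thread fills a bounded queue with minibatches while the main thread consumes them.

```python
# Background-thread minibatch loading via a bounded queue.
import queue
import threading

def producer(batches, out_q):
    for batch in batches:
        out_q.put(batch)   # blocks when the queue is full (backpressure)
    out_q.put(None)        # sentinel: no more batches

batches = [[i, i + 1] for i in range(0, 6, 2)]  # fake "minibatches"
q = queue.Queue(maxsize=2)
t = threading.Thread(target=producer, args=(batches, q), daemon=True)
t.start()

consumed = []
while (batch := q.get()) is not None:
    consumed.append(batch)
t.join()
print(consumed)  # [[0, 1], [2, 3], [4, 5]]
```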
-
Hi there. I want to understand how to use RetNet to train a model with a longer context. It is not clear from the available documentation how to train the model for a large context. There is no para…
pkpro updated 11 months ago