-
Hi,
is there a pre-trained checkpoint available for RetNet? It is not mentioned in the [README](https://github.com/microsoft/unilm/blob/master/retnet/README.md).
Thanks!
macsz updated
5 months ago
-
### Model description
RetNet / Retentive Networks is a new model *architecture* released by Microsoft; the research paper is [here](https://arxiv.org/pdf/2307.08621.pdf). As of now, there is *one* model…
-
Hello,
I have followed the training configuration described here (https://github.com/microsoft/torchscale/issues/52) with the retnet_medium architecture. I have some questions that I would appreciate …
-
Hello Frank!
I love what you have created, and I am having a great time working through your implementation of the paper. It appears you have nailed the dilated attention calculatio…
-
I'm a little confused about what RetNet does in practice. In the formula `Retention(X) = (Q @ K.T * D) @ V`, if the *decay* is 1, the mathematical derivation proving the equivalence between …
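To make the formula in this question concrete, here is a minimal NumPy sketch comparing the parallel form `(Q @ K.T * D) @ V` with a recurrent form that keeps a running state. It is an illustration under simplifying assumptions, not the official implementation: it uses a single head and leaves out the rotation of `Q`/`K`, scaling, and normalization. The names `decay_mask`, `retention_parallel`, and `retention_recurrent` are hypothetical. With a decay of 1, `D` reduces to a plain causal (lower-triangular) mask and the recurrent state becomes an undecayed running sum.

```python
import numpy as np

def decay_mask(seq_len, gamma):
    """D[n, m] = gamma**(n - m) for n >= m, and 0 for n < m (causal)."""
    idx = np.arange(seq_len)
    diff = idx[:, None] - idx[None, :]  # n - m
    # Clamp the exponent so the masked-out upper triangle never overflows.
    return np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)

def retention_parallel(Q, K, V, gamma):
    """Parallel form: (Q K^T * D) V, with * taken element-wise."""
    D = decay_mask(Q.shape[0], gamma)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n, out_n = Q_n S_n."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((Q.shape[0], d_v))
    for n in range(Q.shape[0]):
        S = gamma * S + np.outer(K[n], V[n])
        out[n] = Q[n] @ S
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))

# The two forms agree for a decay of 1 as well as for a decay below 1.
for gamma in (1.0, 0.9):
    same = np.allclose(retention_parallel(Q, K, V, gamma),
                       retention_recurrent(Q, K, V, gamma))
    print(gamma, same)
```

In this simplified setting the equivalence between the two forms does not depend on the decay being below 1; the decay only changes how strongly older positions are weighted.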
-
Thanks for the well-written package! The official RetNet implementation has had several updates; see https://github.com/microsoft/unilm/blob/master/retnet/README.md#changelog .
-
Without modifying much of the config file, the loss suddenly goes to NaN during training, and I have no clue what is going on.
MODEL:
  TYPE: retinanet
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLAS…
-
For the CSPNeXt model, I only found checkpoints for the tiny and s variants, but not for the m, l and x variants. Could you please take a look?
-
Hello,
I followed the blog post https://zenn.dev/selllous/articles/retnet_tutorial shared in #52 in order to train RetNet, and it seems to work well for small models (< 3B).
But I am unable to …
-
As opposed to the other architectures in this package, RetNet doesn't have support for padding as far as I'm aware. I was thinking the best place to introduce it was along with the positional mask. He…
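One way to picture the idea of introducing padding alongside the positional mask is to fold a per-token validity mask into the decay mask. The NumPy sketch below is only an illustration under stated assumptions: it is not the package's API, the names `decay_mask_with_padding` and `retention` are hypothetical, and it uses a single head without the rotation of `Q`/`K` or any normalization of the mask.

```python
import numpy as np

def decay_mask_with_padding(valid, gamma):
    """Causal decay mask with padded positions zeroed out.

    valid: bool array of shape (seq_len,), True for real tokens.
    """
    seq_len = valid.shape[0]
    idx = np.arange(seq_len)
    diff = idx[:, None] - idx[None, :]  # n - m
    D = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    # Zero the columns of padded keys and the rows of padded queries.
    return D * valid[None, :] * valid[:, None]

def retention(Q, K, V, D):
    """Parallel retention with a precomputed mask D."""
    return (Q @ K.T * D) @ V

rng = np.random.default_rng(0)
gamma, pad, n_real, dim = 0.9, 2, 4, 3

# Real tokens, then the same tokens left-padded with arbitrary values.
Q, K, V = (rng.standard_normal((n_real, dim)) for _ in range(3))
Qp, Kp, Vp = (np.vstack([rng.standard_normal((pad, dim)), X]) for X in (Q, K, V))

valid = np.array([False] * pad + [True] * n_real)
out_padded = retention(Qp, Kp, Vp, decay_mask_with_padding(valid, gamma))
out_plain = retention(Q, K, V, decay_mask_with_padding(np.ones(n_real, dtype=bool), gamma))

# Real positions are unaffected by the padding; padded rows come out as zeros.
print(np.allclose(out_padded[pad:], out_plain))
```

Because the decay depends only on the distance `n - m`, masking out the padded keys leaves the weights between real tokens unchanged in this sketch. Whether that still holds once the mask is normalized, or in the recurrent and chunkwise forms, would need to be checked separately.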