retnet Search Results - Githubissues

microsoft/unilm #1474

Checkpoint for RetNet

Hi, is there a pre-trained checkpoint available for RetNet? It is not mentioned in the [README](https://github.com/microsoft/unilm/blob/master/retnet/README.md). Thanks!

macsz updated 5 months ago

huggingface/transformers #25243

Model description: RetNet / Retentive Networks is a new model *archetype* released by Microsoft; the research paper is [here](https://arxiv.org/pdf/2307.08621.pdf). As of now, there is *one* model…

yoinked-h updated 4 months ago

microsoft/torchscale #64

retnet training config

Hello, I have followed the training configuration introduced here (https://github.com/microsoft/torchscale/issues/52) with retnet_medium architecture. I have some questions that I would appreciate …

hanlinxuy updated 10 months ago

fkodom/dilated-attention-pytorch #4

Training on yet-another-retnet script

Hello Frank! I love what you have created, and am having a great time going through and parsing through your implementation of the paper. It appears you have nailed the dilated attention calculatio…

Akbarable updated 1 year ago

Jamie-Stirling/RetNet #37

Is Retnet equivalent to ordinary GPT when the decay is set t…

I'm a little confused about what retnet does in practice. Because in the formula `Retention(X) = (Q @ K.T * D) @ V`, if the *decay* is 1, the mathematical derivation of proving the equivalence between …

xuanyaoming updated 5 months ago
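The question in #37 can be checked numerically. Below is a minimal pure-Python sketch of the quoted formula, assuming the decay matrix from the RetNet paper, D[n][m] = γ^(n−m) for m ≤ n and 0 otherwise, and the matching recurrence S_n = γ·S_{n−1} + k_nᵀv_n, o_n = q_n·S_n. It deliberately omits the rotation applied to Q and K, the normalization, and the gating of the full model, and all function names are hypothetical. Under these assumptions, setting γ = 1 turns D into a plain lower-triangular causal mask, so retention becomes unnormalized causal *linear* attention. That is not the same as GPT's softmax attention, as the last two printed lines show.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def decay_mask(n: int, gamma: float) -> Matrix:
    # D[i][j] = gamma ** (i - j) when j <= i (causal), otherwise 0.
    return [[gamma ** (i - j) if j <= i else 0.0 for j in range(n)] for i in range(n)]


def parallel_retention(q: Matrix, k: Matrix, v: Matrix, gamma: float) -> Matrix:
    # (Q @ K.T * D) @ V, where * is the elementwise (Hadamard) product.
    scores = matmul(q, transpose(k))
    d = decay_mask(len(q), gamma)
    masked = [[s * m for s, m in zip(s_row, d_row)] for s_row, d_row in zip(scores, d)]
    return matmul(masked, v)


def recurrent_retention(q: Matrix, k: Matrix, v: Matrix, gamma: float) -> Matrix:
    # S_n = gamma * S_{n-1} + outer(k_n, v_n);  o_n = q_n @ S_n
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = [[gamma * state[a][b] + k_n[a] * v_n[b] for b in range(d_v)]
                 for a in range(d_k)]
        out.append([sum(q_n[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return out


def causal_softmax_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # GPT-style causal attention (1/sqrt(d) scaling omitted for brevity).
    out = []
    for n, q_n in enumerate(q):
        scores = [sum(x * y for x, y in zip(q_n, k[m])) for m in range(n + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        out.append([sum(weights[m] / z * v[m][b] for m in range(n + 1))
                    for b in range(len(v[0]))])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[1.0, 2.0], [0.5, 0.0], [0.0, 1.0]]
    V = [[1.0], [2.0], [3.0]]
    print(parallel_retention(Q, K, V, 0.9))
    print(recurrent_retention(Q, K, V, 0.9))   # same values as the parallel form
    print(parallel_retention(Q, K, V, 1.0))    # gamma = 1: causal mask, no softmax
    print(causal_softmax_attention(Q, K, V))   # differs from the line above
```

The parallel and recurrent forms agree for any γ because both expand to o_n = Σ_{m≤n} γ^(n−m)(q_n·k_m)v_m. The only thing γ = 1 removes is the decay, not the lack of softmax normalization.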

useCallback/retnets #3

Changelog of official implementation

Thanks for the well-written package! The RetNet's official implementation had several updates at https://github.com/microsoft/unilm/blob/master/retnet/README.md#changelog .

donglixp updated 1 year ago

facebookresearch/Detectron #81

retinanet loss go to NaN

Without modifying much of the config file, the loss suddenly goes to NaN during training; no clue what's going on. MODEL: TYPE: retinanet CONV_BODY: FPN.add_fpn_ResNet50_conv5_body NUM_CLAS…

coldgemini updated 5 years ago

open-mmlab/mmdetection #10686

About the checkpoint problem of CSPNeXt in RetNet

In the CSPNeXt model, I only found tiny and s model checkpoints, but not m, l and x model checkpoints. Please help me take a look

cunning2017 updated 1 year ago

microsoft/torchscale #83

Training RetNet on A100 GPUs

Hello, I followed the blog post https://zenn.dev/selllous/articles/retnet_tutorial shared in #52 in order to train RetNet, and it seems to work well for small models (< 3B). But I am unable to …

Antoine-Bergerault updated 9 months ago

microsoft/torchscale #85

Introducing padding_mask to RetNet

As opposed to the other architectures in this package, RetNet doesn't have support for padding as far as I'm aware. I was thinking the best place to introduce it was along with the positional mask. He…

xtwigs updated 9 months ago
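One way the proposal in #85 could look is sketched below. This is a minimal pure-Python sketch, not torchscale's API. It assumes that "positional mask" refers to RetNet's decay matrix D (γ^(i−j) for j ≤ i, else 0), and it uses a hypothetical `is_pad` list of booleans. The idea is to zero every column of D that belongs to a padded key, so no query can retrieve a padded position. With left padding, the pads sit before the real tokens, so the causal mask alone does not hide them. Because the decay depends only on the relative distance i − j, the real-token outputs of the padded run then match the unpadded run, at least in this simplified setting without the model's rotation and normalization steps.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def padded_decay_mask(is_pad: List[bool], gamma: float) -> Matrix:
    # Hypothetical convention: is_pad[j] is True when position j is padding.
    # Keep the causal decay gamma ** (i - j), but zero every column j that is a pad.
    n = len(is_pad)
    return [[gamma ** (i - j) if j <= i and not is_pad[j] else 0.0 for j in range(n)]
            for i in range(n)]


def masked_retention(q: Matrix, k: Matrix, v: Matrix,
                     is_pad: List[bool], gamma: float) -> Matrix:
    # (Q @ K.T * D_pad) @ V, with * as the elementwise product.
    scores = [[sum(x * y for x, y in zip(q_i, k_j)) for k_j in k] for q_i in q]
    d = padded_decay_mask(is_pad, gamma)
    masked = [[s * m for s, m in zip(s_row, d_row)] for s_row, d_row in zip(scores, d)]
    return matmul(masked, v)


if __name__ == "__main__":
    # Two real tokens, then the same tokens left-padded with one junk position.
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 2.0], [0.5, 0.0]]
    V = [[1.0], [2.0]]
    pad = [9.0, 9.0]
    plain = masked_retention(Q, K, V, [False, False], 0.9)
    padded = masked_retention([pad] + Q, [pad] + K, [[9.0]] + V,
                              [True, False, False], 0.9)
    print(plain)
    print(padded[1:])  # matches `plain`: the pad column is zeroed
```

Multiplying the padding mask into D keeps the change local to where the decay mask is built, which is presumably why the issue suggests introducing it "along with the positional mask".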

160 results for retnet