-
Thanks for the well-written package! The official RetNet implementation has received several updates since; see the changelog at https://github.com/microsoft/unilm/blob/master/retnet/README.md#changelog
-
# URL
- https://arxiv.org/abs/2307.08621
# Affiliations
- Yutao Sun, N/A
- Li Dong, N/A
- Shaohan Huang, N/A
- Shuming Ma, N/A
- Yuqing Xia, N/A
- Jilong Xue, N/A
- Jianyong Wang, N/A
…
-
I'd like to use SoftmaxWithLoss instead of SoftmaxFocalLoss.
`gated_prob, cls_focal_loss = model.net.SoftmaxWithLoss(
[cls_lvl_logits, 'retnet_cls_labels_' + suffix],
['retnet_prob_{}'.fo…
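For readers comparing the two losses named in this issue, here is a minimal pure-Python sketch of the difference between a plain softmax loss and its focal variant. It is not the operators called in the snippet above; the function names and the `gamma` and `alpha` defaults are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, label):
    """Plain softmax loss: -log p[label]."""
    return -math.log(softmax(logits)[label])

def softmax_focal_loss(logits, label, gamma=2.0, alpha=1.0):
    """Focal variant: scales the softmax loss by alpha * (1 - p)^gamma,
    which down-weights examples the model already classifies well.
    gamma and alpha are illustrative defaults, not values from the issue."""
    p = softmax(logits)[label]
    return -alpha * (1.0 - p) ** gamma * math.log(p)

logits = [2.0, 0.5, -1.0]          # hypothetical class scores
ce = softmax_cross_entropy(logits, 0)
fl = softmax_focal_loss(logits, 0, gamma=2.0)
fl_gamma0 = softmax_focal_loss(logits, 0, gamma=0.0)
```

With `gamma=0` and `alpha=1` the focal loss reduces to the plain softmax loss, so under these assumptions switching between the two changes only how strongly easy examples are down-weighted.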
-
Hi, thanks for sharing your code! I'm also implementing an encoder-decoder model with a similar structure to yours.
However, I'm confused about why the encoder_output isn't used in the RetNet deco…
-
Could you explain RetNet? It seems to have a lot of potential.
-
Thanks for the great work!
I am just wondering what would happen if we used a higher D. I ask because the RetNet configs that you can obtain from torchscale (and also mine) typically have `D=128` …
-
Mamba model support has been merged into llama.cpp. Would it be possible to support it in Ollama as well?
Merged PR: https://github.com/ggerganov/llama.cpp/pull/5328
…
-
Great work, thank you! I am encountering the following issue: When I follow your retnet_machine_translation.ipynb to train retnet on Ubuntu with CUDA, I achieve the same quality as you reported. Howev…