-
![image](https://github.com/microsoft/unilm/assets/70521515/45853cbb-d2eb-4f0c-b541-900e93408680)
![image](https://github.com/microsoft/unilm/assets/70521515/f8db8f08-e776-4b70-aabd-e30beede72bd)
…
-
Hello, I have reviewed some of the RetNet code and it does not seem to use an attention mask. Don't you need to mask out the pad ID? Or do pad tokens have no impact on the earlier part of the sequence?
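
One way to reason about this question is a toy sketch of the parallel form of retention. It is a simplified stand-in, not the repository's implementation: it assumes a single head, a fixed decay `gamma`, and no normalization, rotation, or group norm, and the `retention` function and the pad vectors are hypothetical. Under those assumptions, position `n` only sums over positions `m <= n`, so pad tokens appended on the right cannot change earlier outputs, while pad tokens on the left can.

```python
def retention(q, k, v, gamma=0.9):
    """Toy parallel-form retention for one head, without normalization.

    q, k, v are lists of equal-length float vectors, one per position.
    Position n only sums over m <= n, weighted by gamma ** (n - m).
    """
    out = []
    for n in range(len(q)):
        acc = [0.0] * len(v[0])
        for m in range(n + 1):  # causal: later positions never contribute
            score = sum(a * b for a, b in zip(q[n], k[m])) * gamma ** (n - m)
            acc = [x + score * y for x, y in zip(acc, v[m])]
        out.append(acc)
    return out


real = [[1.0, 0.5], [0.2, -0.3], [0.7, 0.1]]
pad = [[9.0, 9.0], [9.0, 9.0]]  # hypothetical stand-in vectors for pad tokens

unpadded = retention(real, real, real)
right_padded = retention(real + pad, real + pad, real + pad)
left_padded = retention(pad + real, pad + real, pad + real)

# Outputs at the real positions, with and without padding.
same_with_right_pad = right_padded[:3] == unpadded
same_with_left_pad = left_padded[2:] == unpadded
print(same_with_right_pad, same_with_left_pad)
```

Even when right padding leaves the real positions untouched, the outputs at the pad positions themselves would presumably still need to be excluded from the loss.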
-
`kv = prev_kv * (1 - 1 / scale).view(self.num_heads, 1, 1) + kv / scale.view(self.num_heads, 1, 1)` line 108 in the multiscale_retention.py
should be
`kv = prev_kv * (prev_scale.sqrt() * decay / sca…
-
Hello!
I was reading up on [LongNet](https://arxiv.org/abs/2307.02486) when I wanted to have a glance at the code. It directed me to this repository, which does not seem to have any reference of `L…
-
```
[2023-08-08 03:29:00,661] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
`text_config_dict` is provided which will be used to initialize `CLIPTextC…
-
In Section 6.6 of the paper:
> Firstly, while our method exhibits robustness to random video shapes, one common drawback of Transformers-based methods [25, 53, 23] is their high memory requireme…
-
From the roadmap, "KerasNLP is focused on modular and reusable building blocks". Having tried to implement some causal generative models, I've found this not to be the case. The low-level blocks are gr…
jackd updated
11 months ago
-
Hi Mark, please allow me to give you the praise that you deserve before asking for support while I try to join the TRUE internet, the RetNet :-)
I stumbled upon your work about a week ago, realizin…
-
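Could the developers add more flexibility for transfer learning in the next release? Right now there are only a few parameters, which feels far too limiting when doing few-shot transfer learning (〒▽〒)
* Notes:
1. The pretrain\_weights path is the link to the open-source Faster R-CNN model trained on the COCO dataset; for the full list of model links, see [MODEL_ZOO](https://github.com/PaddlePaddle/PaddleDetectio…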
-
I am trying to understand the source code of RetinaNet, but I can't find the source code of its loss function.
### def add_fpn_retinanet_losses(…