-
In the multiscale retention's recurrent forward, it looks like the incremental state is not being updated (not returned) [1].
[1]https://github.com/microsoft/torchscale/blob/258eda33083f6361e7305f2a5…
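For reference, a minimal sketch of what writing the recurrent state back typically looks like, assuming a single head and a plain `incremental_state` dict; the names and shapes here are illustrative, not torchscale's actual code:

```python
import torch

def recurrent_retention_step(q_n, k_n, v_n, decay, incremental_state):
    # Hypothetical single-head step for illustration, not torchscale's code.
    # q_n, k_n, v_n: (d,) tensors for the current position; decay: scalar gamma.
    prev = incremental_state.get("prev_key_value")        # S_{n-1}, or None on the first step
    kv = k_n.unsqueeze(-1) @ v_n.unsqueeze(-2)             # K_n^T V_n, shape (d, d)
    state = kv if prev is None else decay * prev + kv      # S_n = gamma * S_{n-1} + K_n^T V_n
    incremental_state["prev_key_value"] = state            # write the updated state back
    return q_n.unsqueeze(-2) @ state                       # O_n = Q_n S_n, shape (1, d)
```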
-
Thanks for the well-written package! RetNet's official implementation has had several updates; see https://github.com/microsoft/unilm/blob/master/retnet/README.md#changelog .
-
As the title says, there was only one result.
-
In the code, when `is_first_step` is `True`, `activate_recurrent` is set to `False` here:
https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py#L362
I was wonderin…
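For context, a rough sketch of the control flow being asked about, as I read it (illustrative names, not the actual torchscale code): on the first step the whole prompt is available and can be processed with the parallel form; the recurrent form is only activated on subsequent single-token steps.

```python
def choose_retention_mode(incremental_state, is_first_step):
    # First call: the full prompt is processed at once -> parallel form.
    # Later calls: one new token at a time -> recurrent form.
    activate_recurrent = incremental_state is not None and not is_first_step
    return "recurrent" if activate_recurrent else "parallel"
```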
-
https://github.com/microsoft/torchscale/blob/main/examples/fairseq/README.md#example-bert-pretraining
-
You may want to know about https://github.com/microsoft/torchscale/commit/bf65397b26469ac9c24d83a9b779b285c1ec640b
-
I wonder why we need twice the dimension for $\mathbf{W}_V$.
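For reference, a minimal sketch of the projection shapes in question, assuming the paper's choice of a value/output dimension that is twice the embedding dimension (names are illustrative, not torchscale's actual attributes):

```python
import torch.nn as nn

embed_dim = 512
value_dim = 2 * embed_dim  # RetNet doubles the value dimension, so W_V maps d -> 2d

q_proj = nn.Linear(embed_dim, embed_dim, bias=False)    # W_Q: d -> d
k_proj = nn.Linear(embed_dim, embed_dim, bias=False)    # W_K: d -> d
v_proj = nn.Linear(embed_dim, value_dim, bias=False)    # W_V: d -> 2d (the doubling in question)
out_proj = nn.Linear(value_dim, embed_dim, bias=False)  # project back to d after retention
```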
-
Hi,
I have installed LongNet on my Ubuntu machine with a 4090 (is that enough to run LongNet?),
but when I run `python example.py`, there is an error...
I have tried `pip install torchscale`, but it did not help; same error…
-
The implementation of the chunkwise retention paradigm on the [chunkwise-real](/Jamie-Stirling/RetNet/tree/chunkwise-real) branch gives different outputs from the other two paradigms.
It appears there ma…
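For anyone debugging this, here is a self-contained sketch of the three retention paradigms agreeing on the same output (single head, no scaling or group norm, names not tied to the repo's API), which can serve as a reference check:

```python
import torch

torch.manual_seed(0)
T, d, B = 12, 4, 4          # sequence length, head dim, chunk size (T divisible by B)
gamma = 0.9                 # decay of a single retention head
Q, K, V = (torch.randn(T, d, dtype=torch.float64) for _ in range(3))

# Parallel form: O = (Q K^T * D) V with D[n, m] = gamma^(n-m) for n >= m, else 0.
idx = torch.arange(T, dtype=torch.float64)
D = (gamma ** (idx[:, None] - idx[None, :])) * (idx[:, None] >= idx[None, :])
O_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n, O_n = Q_n S_n.
S = torch.zeros(d, d, dtype=torch.float64)
O_recurrent = torch.zeros_like(V)
for n in range(T):
    S = gamma * S + K[n][:, None] @ V[n][None, :]
    O_recurrent[n] = Q[n] @ S

# Chunkwise form: parallel within each chunk, recurrent state R across chunks.
j = torch.arange(B, dtype=torch.float64)
D_c = (gamma ** (j[:, None] - j[None, :])) * (j[:, None] >= j[None, :])
xi = gamma ** (j + 1)          # decay applied to the carried-over state per position
zeta = gamma ** (B - 1 - j)    # decay applied when folding a chunk into R
R = torch.zeros(d, d, dtype=torch.float64)
O_chunkwise = torch.zeros_like(V)
for i in range(0, T, B):
    q, k, v = Q[i:i+B], K[i:i+B], V[i:i+B]
    O_chunkwise[i:i+B] = (q @ k.T * D_c) @ v + (q @ R) * xi[:, None]
    R = k.T @ (v * zeta[:, None]) + (gamma ** B) * R

print(torch.allclose(O_parallel, O_recurrent))  # True
print(torch.allclose(O_parallel, O_chunkwise))  # True
```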
-
Hello :smile:
The BEiT-3 paper mentions that vision-language experts are employed in the top three Multiway Transformer layers. However, taking a look at the MultiwayNetwork implementation, I find …
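For reference, a rough sketch of the Multiway idea under discussion: tokens are routed to one of two expert modules by a modality split position. This is an assumption-laden illustration, not the actual torchscale MultiwayNetwork code.

```python
import copy
import torch
import torch.nn as nn

class TwoExpertMultiway(nn.Module):
    # Illustrative: tokens before `split_position` go to expert A (e.g. vision),
    # the rest go to expert B (e.g. language).
    def __init__(self, module):
        super().__init__()
        self.expert_a = module
        self.expert_b = copy.deepcopy(module)

    def forward(self, x, split_position):
        # x: (batch, seq_len, dim)
        if split_position <= 0:
            return self.expert_b(x)
        if split_position >= x.size(1):
            return self.expert_a(x)
        out_a = self.expert_a(x[:, :split_position])
        out_b = self.expert_b(x[:, split_position:])
        return torch.cat([out_a, out_b], dim=1)

# Usage: wrap an FFN so the first 4 (vision) tokens and the remaining (text) tokens
# go through separate experts.
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
layer = TwoExpertMultiway(ffn)
y = layer(torch.randn(2, 10, 64), split_position=4)
```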