-
In the multiscale retention's recurrent forward, it looks like the incremental state is not being updated (not returned) [1].
[1]https://github.com/microsoft/torchscale/blob/258eda33083f6361e7305f2a5…
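For reference, a minimal sketch of what writing the recurrent state back typically looks like, assuming a single head and a plain `incremental_state` dict; the names and shapes here are illustrative, not torchscale's actual code:

```python
import torch

def recurrent_retention_step(q_n, k_n, v_n, decay, incremental_state):
    # Hypothetical single-head step for illustration, not torchscale's code.
    # q_n, k_n, v_n: (d,) tensors for the current position; decay: scalar gamma.
    prev = incremental_state.get("prev_key_value")        # S_{n-1}, or None on the first step
    kv = k_n.unsqueeze(-1) @ v_n.unsqueeze(-2)             # K_n^T V_n, shape (d, d)
    state = kv if prev is None else decay * prev + kv      # S_n = gamma * S_{n-1} + K_n^T V_n
    incremental_state["prev_key_value"] = state            # write the updated state back
    return q_n.unsqueeze(-2) @ state                       # O_n = Q_n S_n, shape (1, d)
```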
-
Thanks for the well-written package! RetNet's official implementation has had several updates; see https://github.com/microsoft/unilm/blob/master/retnet/README.md#changelog .
-
As the title says, there was only one result.
-
In the code, when `is_first_step` is `True`, `activate_recurrent` is set to `False` here:
https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py#L362
I was wonderin…
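For context, a rough sketch of the control flow being asked about, as I read it (illustrative names, not the actual torchscale code): on the first step the whole prompt is available and can be processed with the parallel form; the recurrent form is only activated on subsequent single-token steps.

```python
def choose_retention_mode(incremental_state, is_first_step):
    # First call: the full prompt is processed at once -> parallel form.
    # Later calls: one new token at a time -> recurrent form.
    activate_recurrent = incremental_state is not None and not is_first_step
    return "recurrent" if activate_recurrent else "parallel"
```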
-
https://github.com/microsoft/torchscale/blob/main/examples/fairseq/README.md#example-bert-pretraining
-
You may want to know about https://github.com/microsoft/torchscale/commit/bf65397b26469ac9c24d83a9b779b285c1ec640b
-
I wonder why we need twice the dimension for $\mathbf{W}_V$.
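For reference, a minimal sketch of the projection shapes in question, assuming the paper's choice of a value/output dimension that is twice the embedding dimension (names are illustrative, not torchscale's actual attributes):

```python
import torch.nn as nn

embed_dim = 512
value_dim = 2 * embed_dim  # RetNet doubles the value dimension, so W_V maps d -> 2d

q_proj = nn.Linear(embed_dim, embed_dim, bias=False)    # W_Q: d -> d
k_proj = nn.Linear(embed_dim, embed_dim, bias=False)    # W_K: d -> d
v_proj = nn.Linear(embed_dim, value_dim, bias=False)    # W_V: d -> 2d (the doubling in question)
out_proj = nn.Linear(value_dim, embed_dim, bias=False)  # project back to d after retention
```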
-
Hi,
I have installed LongNet on my Ubuntu machine with a 4090 (is that enough to run LongNet?),
but when I run `python example.py`, there is an error...
I have tried `pip install torchscale`, but it did not help; same error…
-
The implementation of the chunkwise retention paradigm on the [chunkwise-real](/Jamie-Stirling/RetNet/tree/chunkwise-real) branch gives different outputs from the other two paradigms.
It appears there ma…
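For anyone debugging this, here is a self-contained sketch of the three retention paradigms agreeing on the same output (single head, no scaling or group norm, names not tied to the repo's API), which can serve as a reference check:

```python
import torch

torch.manual_seed(0)
T, d, B = 12, 4, 4          # sequence length, head dim, chunk size (T divisible by B)
gamma = 0.9                 # decay of a single retention head
Q, K, V = (torch.randn(T, d, dtype=torch.float64) for _ in range(3))

# Parallel form: O = (Q K^T * D) V with D[n, m] = gamma^(n-m) for n >= m, else 0.
idx = torch.arange(T, dtype=torch.float64)
D = (gamma ** (idx[:, None] - idx[None, :])) * (idx[:, None] >= idx[None, :])
O_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n, O_n = Q_n S_n.
S = torch.zeros(d, d, dtype=torch.float64)
O_recurrent = torch.zeros_like(V)
for n in range(T):
    S = gamma * S + K[n][:, None] @ V[n][None, :]
    O_recurrent[n] = Q[n] @ S

# Chunkwise form: parallel within each chunk, recurrent state R across chunks.
j = torch.arange(B, dtype=torch.float64)
D_c = (gamma ** (j[:, None] - j[None, :])) * (j[:, None] >= j[None, :])
xi = gamma ** (j + 1)          # decay applied to the carried-over state per position
zeta = gamma ** (B - 1 - j)    # decay applied when folding a chunk into R
R = torch.zeros(d, d, dtype=torch.float64)
O_chunkwise = torch.zeros_like(V)
for i in range(0, T, B):
    q, k, v = Q[i:i+B], K[i:i+B], V[i:i+B]
    O_chunkwise[i:i+B] = (q @ k.T * D_c) @ v + (q @ R) * xi[:, None]
    R = k.T @ (v * zeta[:, None]) + (gamma ** B) * R

print(torch.allclose(O_parallel, O_recurrent))  # True
print(torch.allclose(O_parallel, O_chunkwise))  # True
```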
-
Hello :smile:
The BEiT-3 paper mentions that vision-language experts are employed in the top three Multiway Transformer layers. However, taking a look at the MultiwayNetwork implementation, I find …
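For reference, a rough sketch of the Multiway idea under discussion: tokens are routed to one of two expert modules by a modality split position. This is an assumption-laden illustration, not the actual torchscale MultiwayNetwork code.

```python
import copy
import torch
import torch.nn as nn

class TwoExpertMultiway(nn.Module):
    # Illustrative: tokens before `split_position` go to expert A (e.g. vision),
    # the rest go to expert B (e.g. language).
    def __init__(self, module):
        super().__init__()
        self.expert_a = module
        self.expert_b = copy.deepcopy(module)

    def forward(self, x, split_position):
        # x: (batch, seq_len, dim)
        if split_position <= 0:
            return self.expert_b(x)
        if split_position >= x.size(1):
            return self.expert_a(x)
        out_a = self.expert_a(x[:, :split_position])
        out_b = self.expert_b(x[:, split_position:])
        return torch.cat([out_a, out_b], dim=1)

# Usage: wrap an FFN so the first 4 (vision) tokens and the remaining (text) tokens
# go through separate experts.
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
layer = TwoExpertMultiway(ffn)
y = layer(torch.randn(2, 10, 64), split_position=4)
```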