-
Hi,
Thank you for your great work!
We are interested in the capabilities of RetNet. However, when we look through this repository, we can't find the code corresponding to the paper's experiments. For example…
-
[xPos](https://arxiv.org/pdf/2212.10554.pdf) is an improved version of the original RoPE from the [RoFormer](https://arxiv.org/pdf/2104.09864.pdf) paper (i.e. a modification of ggml_rope with !is_neox…
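For readers comparing the two schemes: xPos keeps RoPE's rotation but additionally scales queries and keys by position-dependent factors that cancel into a pure relative-distance decay in the attention product. A rough NumPy sketch of the idea (illustrative only; the function names and the `scale_base` value here are my assumptions, not ggml's API):

```python
import numpy as np

def xpos_scale(dim, positions, scale_base=512.0):
    # Per-dimension base scale zeta_i in (0, 1], as in the xPos paper;
    # queries are multiplied by zeta**(n / scale_base), keys by its inverse.
    i = np.arange(0, dim, 2)
    zeta = (i + 0.4 * dim) / (1.4 * dim)              # shape (dim // 2,)
    return zeta[None, :] ** (positions[:, None] / scale_base)

def apply_rope(x, positions, theta=10000.0, scale=None):
    # Interleaved-pair RoPE rotation (the non-neox layout), with optional
    # xPos scaling folded into cos/sin.
    dim = x.shape[-1]
    freqs = theta ** (-np.arange(0, dim, 2) / dim)
    angles = positions[:, None] * freqs               # shape (T, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    if scale is not None:
        cos, sin = cos * scale, sin * scale
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because a query at position n carries zeta**(n/B) and a key at position m carries zeta**(-m/B), their dot product picks up zeta**((n-m)/B), a decay depending only on relative distance.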
-
### Thank you for reproducing RetNet!
#### However, when I actually run the code, I find that training is slow: 5-6 times slower on the same task compared to a transformer (the transformer uses half…
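For what it's worth, when the parallel form is the bottleneck at long sequence lengths, the paper's chunkwise recurrent form is the usual remedy: process each chunk in parallel and carry a small recurrent state across chunks. A minimal single-head NumPy sketch (my own illustration, not this repo's implementation):

```python
import numpy as np

def chunkwise_retention(q, k, v, gamma, chunk_size=4):
    """Single-head retention, processed chunk_size tokens at a time.

    chunk_size=1 degenerates to the recurrent form and chunk_size=T to the
    parallel form, so every setting must produce the same output.
    """
    T, d = q.shape
    state = np.zeros((d, v.shape[1]))   # running sum of gamma-decayed k^T v
    outs = []
    for s in range(0, T, chunk_size):
        qc, kc, vc = q[s:s + chunk_size], k[s:s + chunk_size], v[s:s + chunk_size]
        B = qc.shape[0]
        n = np.arange(B)
        # Intra-chunk: causal decay mask D[n, m] = gamma^(n - m) for n >= m.
        D = np.where(n[:, None] >= n[None, :],
                     gamma ** (n[:, None] - n[None, :]), 0.0)
        inner = (qc @ kc.T * D) @ vc
        # Cross-chunk: contribution of all earlier chunks through the state.
        cross = (gamma ** (n + 1))[:, None] * (qc @ state)
        outs.append(inner + cross)
        # Decay the old state by gamma^B and fold in this chunk's keys/values.
        state = gamma ** B * state + (kc * (gamma ** (B - 1 - n))[:, None]).T @ vc
    return np.concatenate(outs)
```

This replaces the full O(T^2) decay mask with O(B^2) work per chunk plus a constant-size state, which is how the paper keeps long-sequence training tractable.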
-
Is it possible to train a RetNet using Transformers?
-
For example:
prompts = ["My dog is cute.", "My cat is very cute.", "Both my cat and dog are very cute."]
tokenizer.pad_token_id = tokenizer.eos_token_id
inputs = tokenizer(prompts, return_tensors…
-
I tried to run your examples on this page: https://github.com/syncdoth/RetNet
Running the first example, I got:
File "RetNet/retnet/modeling_retnet.py", line 114, in recurrent_retention
cu…
-
It seems the recurrent and parallel forward results are quite inconsistent for multiscale retention in the RetNet code. After debugging for a while, these three lines look suspicious.
…
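As a sanity check independent of the repo: here is a minimal single-head retention in NumPy in both forms. They are mathematically equivalent, so any divergence beyond floating-point noise indicates a bug in one of the two code paths:

```python
import numpy as np

def parallel_retention(q, k, v, gamma):
    # out_n = sum_{m <= n} gamma^(n - m) (q_n . k_m) v_m, via a causal decay mask.
    T = q.shape[0]
    n = np.arange(T)
    D = np.where(n[:, None] >= n[None, :],
                 gamma ** (n[:, None] - n[None, :]), 0.0)
    return (q @ k.T * D) @ v

def recurrent_retention(q, k, v, gamma):
    # The same computation token by token:
    # S_n = gamma * S_{n-1} + k_n^T v_n,   out_n = q_n S_n.
    state = np.zeros((q.shape[1], v.shape[1]))
    outs = []
    for qn, kn, vn in zip(q, k, v):
        state = gamma * state + np.outer(kn, vn)
        outs.append(qn @ state)
    return np.stack(outs)
```

Comparing a real implementation's two modes against a reference like this helps isolate whether the mismatch comes from the decay mask, the state update, or something layered on top (normalization, rotary scaling).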
-
My question is about the [RetNet paper](https://arxiv.org/pdf/2307.08621v3.pdf), which leads to the implementation here...
Why include the positional embedding updates directly in the multi-scale r…
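For context, the parallel retention in the paper folds the xPos-style rotation Θ directly into Q and K; writing its formulation from memory (so double-check against the PDF):

```latex
\mathrm{Retention}(X) = \big( (Q \odot \Theta)(K \odot \bar{\Theta})^{\top} \odot D \big) V,
\qquad \Theta_n = e^{in\theta}, \qquad
D_{nm} =
\begin{cases}
\gamma^{\,n-m}, & n \ge m \\
0, & n < m
\end{cases}
```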
-
Hey,
Does the implementation support Multiscale Retention in parallel mode? I did see that the number of heads is an input hyperparameter, but I am not able to tell whether MSR is completely implemented. T…
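For reference, multi-scale retention in the paper runs retention per head with a different fixed decay, gamma_h = 1 - 2^(-5 - h), and then concatenates the head outputs. A parallel-mode NumPy sketch of that structure (illustrative only; it omits the paper's group normalization and swish gate):

```python
import numpy as np

def head_decays(num_heads):
    # Fixed per-head decays from the RetNet paper: gamma_h = 1 - 2^(-5 - h),
    # giving each head a different effective memory length.
    return 1.0 - 2.0 ** (-5.0 - np.arange(num_heads))

def multiscale_retention_parallel(q, k, v, num_heads):
    # q, k, v: (T, d). Each head gets d // num_heads dims and its own decay mask.
    T, d = q.shape
    hd = d // num_heads
    n = np.arange(T)
    outs = []
    for h, gamma in enumerate(head_decays(num_heads)):
        sl = slice(h * hd, (h + 1) * hd)
        D = np.where(n[:, None] >= n[None, :],
                     gamma ** (n[:, None] - n[None, :]), 0.0)
        outs.append((q[:, sl] @ k[:, sl].T * D) @ v[:, sl])
    return np.concatenate(outs, axis=-1)
```

So the only per-head difference from single-head retention is the decay constant; the parallel mode just needs one decay mask per head.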
-
I would like to use this repo for my job, but I cannot do so until you add a license to it. Could you please add one soon?