-
In the MultiScaleRetention class, it is mentioned that 's_n_1s' has dimensions (batch_size, heads, head_size, head_size), while in SimpleRetention, 's_n_1' is defined as 's_n_1s[i]'. However, you mentione…
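To make the shape question concrete, here is a small sketch of the two layouts under discussion (illustrative only, not the repository's code): with a stacked (batch_size, heads, head_size, head_size) tensor, s_n_1s[i] indexes the batch dimension, whereas a head-indexed list gives the (batch_size, head_size, head_size) state that a single-head retention would consume.
```python
import torch

batch_size, heads, head_size = 2, 4, 16

# Layout as documented: one stacked tensor with a heads dimension.
s_n_1s_tensor = torch.zeros(batch_size, heads, head_size, head_size)
per_head = s_n_1s_tensor[:, 1]      # head 1 -> (batch_size, head_size, head_size)

# Layout under which s_n_1s[i] is directly a per-head state.
s_n_1s_list = [torch.zeros(batch_size, head_size, head_size) for _ in range(heads)]
s_n_1 = s_n_1s_list[1]              # also (batch_size, head_size, head_size)

print(per_head.shape, s_n_1.shape)
```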
Qiu30 updated 10 months ago
-
I got two errors:
(1)
from torchscale.architecture.config import EncoderDecoderConfig
from torchscale.architecture.encoder_decoder import EncoderDecoder
config = EncoderDecoderConfig(vocab_…
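For comparison, here is the construction I believe the README intends, using the modules imported above (the vocabulary size here is a placeholder):
```python
from torchscale.architecture.config import EncoderDecoderConfig
from torchscale.architecture.encoder_decoder import EncoderDecoder

# Placeholder vocabulary size; the remaining hyperparameters fall back to the config defaults.
config = EncoderDecoderConfig(vocab_size=64000)
encdec = EncoderDecoder(config)
print(encdec)
```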
-
The latest release 0.2.0 is from March (see https://pypi.org/project/torchscale/#history), predating the introduction of RetNet in this repo. As such, the README is misleading, since it is not possible …
-
Thank you for this amazing work.
I'm trying to include your work as a drop-in replacement for other SSMs such as Mamba and RWKV. Note that I train significantly smaller models (from 20M to 60M p…
-
2511 if has_torch_function_variadic(input, weight, bias):
2512     return handle_torch_function(
2513         layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=b…
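For reference, a minimal call that satisfies layer_norm's shape expectations (the dimensions below are placeholders); a mismatch between normalized_shape, the input's trailing dimensions, and the weight/bias shapes is a common cause of errors raised from this code path:
```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, 16)        # (batch, sequence, hidden)
w = torch.ones(16)               # weight must match normalized_shape
b = torch.zeros(16)              # bias must match normalized_shape
y = F.layer_norm(x, normalized_shape=(16,), weight=w, bias=b)
print(y.shape)                   # torch.Size([2, 5, 16])
```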
Qiu30 updated 12 months ago
-
Hey,
The parallel form of Retention returns a tuple of two values, but in your README, one of the examples shows the parallel retention's output as just one tensor. So I am confused …
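For illustration, a minimal sketch of what I mean (my own code, not the repository's): a parallel retention that returns both the output and a final state, where a README-style single-tensor example would correspond to taking only the first element of the tuple.
```python
import torch

def parallel_retention(q, k, v, gamma=0.9):
    # Parallel retention for one head: O = (Q K^T ⊙ D) V, with D the causal decay mask.
    T = q.shape[0]
    n = torch.arange(T)
    decay = (gamma ** (n[:, None] - n[None, :])) * (n[:, None] >= n[None, :])
    out = (q @ k.T * decay) @ v
    # Final state that a recurrent continuation could start from.
    state = ((gamma ** (T - 1 - n)).unsqueeze(1) * k).T @ v
    return out, state

q, k, v = torch.randn(3, 8, 4)
out, state = parallel_retention(q, k, v)    # tuple: unpack both values
out_only = parallel_retention(q, k, v)[0]   # treating the result as a single tensor
```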
-
I'm using the RetNet base config with the following TrainingArguments:
args = TrainingArguments(
    output_dir="/content/retnet-xsum",
    per_device_train_batch_size=1,
    per_device_eval_bat…
-
In the paper, the authors mention that the initialization follows DeepNet, but the code looks somewhat different. Why is there a mismatch?
```
def reset_parameters(self):
    nn.init.xavier_u…
```
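For context, DeepNet-style initialization scales selected projections by a depth-dependent gain β, while the snippet above appears to use Xavier initialization with fixed gains. A minimal sketch of the two variants (the gain values and the β formula below are my own assumptions, not the repository's exact constants):
```python
import torch.nn as nn

def deepnet_style_init(linear: nn.Linear, num_layers: int) -> None:
    # DeepNet prescribes Xavier init with gain beta for the value/output/FFN projections;
    # for a decoder-only stack, beta = (8 * num_layers) ** -0.25 (as I recall).
    beta = (8 * num_layers) ** -0.25
    nn.init.xavier_normal_(linear.weight, gain=beta)

def fixed_gain_init(linear: nn.Linear, gain: float = 2 ** -2.5) -> None:
    # Fixed-gain Xavier init, roughly what the truncated snippet above seems to do;
    # the default gain here is an illustrative assumption.
    nn.init.xavier_uniform_(linear.weight, gain=gain)

proj = nn.Linear(512, 512, bias=False)
deepnet_style_init(proj, num_layers=24)   # depth-dependent gain
fixed_gain_init(proj)                     # fixed gain, independent of depth
```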
-
Hi, in your RetNet paper (Table 4), the naive Transformer 1.3B model costs more GPU memory than the 2.7B model. Could you please explain why?
-
Hello authors,
I'm really happy to see this great work!
I have one question or request about the consistency of the outputs across the forward modes.
I have been comparing the three outputs using the below s…
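A minimal single-head sketch of such a comparison, assuming the standard RetNet formulation (illustrative code, not the repository's implementation): the parallel, recurrent, and chunkwise forms should agree up to numerical tolerance.
```python
import torch

torch.manual_seed(0)

T, d, gamma = 8, 4, 0.9           # sequence length, head dimension, decay
Q, K, V = torch.randn(3, T, d)    # one head, no batch dimension, for clarity

# Parallel form: O = (Q K^T ⊙ D) V, with D[n, m] = gamma^(n-m) for n >= m, else 0.
idx = torch.arange(T)
D = (gamma ** (idx[:, None] - idx[None, :])) * (idx[:, None] >= idx[None, :])
out_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n,  O_n = Q_n S_n.
S = torch.zeros(d, d)
out_recurrent = torch.zeros(T, d)
for t in range(T):
    S = gamma * S + K[t].unsqueeze(1) @ V[t].unsqueeze(0)
    out_recurrent[t] = Q[t] @ S

# Chunkwise form: parallel inside each chunk, recurrent state carried across chunks.
B = 4                             # chunk size (assumed to divide T here)
j = torch.arange(B)
D_chunk = (gamma ** (j[:, None] - j[None, :])) * (j[:, None] >= j[None, :])
S = torch.zeros(d, d)
out_chunkwise = torch.zeros(T, d)
for start in range(0, T, B):
    q, k, v = Q[start:start + B], K[start:start + B], V[start:start + B]
    inner = (q @ k.T * D_chunk) @ v                      # within-chunk contribution
    cross = (gamma ** (j + 1)).unsqueeze(1) * (q @ S)    # contribution of the carried state
    out_chunkwise[start:start + B] = inner + cross
    S = gamma ** B * S + ((gamma ** (B - 1 - j)).unsqueeze(1) * k).T @ v

print(torch.allclose(out_parallel, out_recurrent, atol=1e-5))   # expect True
print(torch.allclose(out_parallel, out_chunkwise, atol=1e-5))   # expect True
```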