retnet Search Results - Githubissues

160 results for retnet

microsoft/torchscale #81

Question about the normalization in attention

Dear authors, nice work! I have a few questions regarding the normalization in the RetNet implementation and would like to hear your thoughts on them: [Here](https://github.com/microso…

Cranial-XIX updated 9 months ago
2
fkodom/yet-another-retnet #8

How's this RetNet useful when throughput is actually lower?

Thank you for your work. I did some testing with your implementation, and it is robust and works pretty well! However, for non-auto-regressive applications, the throughput is pretty much worse tha…

achen46 updated 11 months ago
2
syncdoth/RetNet #8

Question about verifying the Inference Latency

Hi, Thank you for your great work! When I use your example code to compare the Inference Latency with Transformer-based LLM, the result is not as expected in the paper (15.6X). Could you please …

LiZeng001 updated 11 months ago
3
huggingface/diffusers #4209

Implement using RetNet instead of transformers

RetNet was recently introduced, and I think it should be implemented in diffusers. You probably already know about it, but it may be helpful for faster inference. ![Screenshot 2023-07-22 193832](https://gith…

tronizx updated 1 year ago
3
syncdoth/RetNet #17

Can't Resume Training from Checkpoint

Hey, I'm training the model using the Huggingface Trainer. If the trainer exits for any reason and I resume from a checkpoint, the model no longer learns. I'm using train.py as is and exe…

infosechoudini updated 11 months ago
1
fkodom/yet-another-retnet #24

Some issues regarding _build_decay_mask.

Thank you for your implementation, but I have encountered a bug when using the code. There is a major problem in the function `_build_decay_mask` where the last element of `decay_gammas` is set to 1. …

Doraemonzzz updated 10 months ago
3
microsoft/torchscale #57

Retnet parameter dimension

I wonder why we need twice the dimension for $\mathbf{W}_V$

allanj updated 1 year ago
2
microsoft/torchscale #72

RuntimeError: The size of tensor a (5) must match the size o…

``` python train.py \ /home/sc0111/ai/torchscale/wikitext-103/wikitextdone \ --num-workers 4 \ --arch retnet_base \ --task language_modeling \ --optimizer adam --adam-betas "(0.9, 0.98)" \ --ma…

codinglover0111 updated 10 months ago
3
microsoft/torchscale #55

Retnet training is slow

Hi, when I use RetNet's parallel mode to train, it's very slow. I observed the GPU memory usage and it's very small. What's going on? Thank you!

Zth9730 updated 1 year ago
2
Jamie-Stirling/RetNet #11

RetNet Officially Released

You may want to know: https://github.com/microsoft/torchscale/commit/bf65397b26469ac9c24d83a9b779b285c1ec640b

tiendung updated 1 year ago
1
