gradient-centralization Search Results

openai/gpt-3 #2

Improve your state of the art by using best activation funct…

You could increase GPT-3 accuracy by using Ranger, which combines state-of-the-art optimizers + gradient centralization https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer You seem to be usi…

LifeIsStrange updated 4 years ago

seraphlabs-ca/SentenceMIM-demo #6

Improve your state of the art by using best meta optimizer a…

You could increase SMIM accuracy by using Ranger, which combines state-of-the-art optimizers + gradient centralization https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer Orthogonally, you …

LifeIsStrange updated 4 years ago

GilesStrong/lumin #61

Add SOTA optimisers

There was a big kerfuffle in 2019 about some new optimisers: Rectified Adam ([Liu et al., 2019](https://arxiv.org/abs/1908.03265)), Lookahead ([Zhang, Lucas, Hinton, & Ba, 2019](https://arxiv.org/a…

GilesStrong updated 3 years ago

dirkneuhaeuser/preposition-sense-disambiguation #1

Could you improve the state of the art once again?

@dirkneuhaeuser Thanks for making the world a better place, your classifier is extremely helpful for natural language understanding. Unfortunately, 91% accuracy is still not really great for widespre…

LifeIsStrange updated 1 year ago

lessw2020/Ranger-Deep-Learning-Optimizer #13

Did you try fine-tuning transformer LMs with Ranger?

Recent transformer architectures are very popular in NLP: BERT, GPT-2, RoBERTa, XLNet. Did you try to fine-tune them on some NLP task? If so, what were the best Ranger hyper-parameters and learning rat…

avostryakov updated 1 year ago

google-research/bert #810

Experiment using RAdam optimizer

Instead of Adam: https://arxiv.org/pdf/1908.03265v1 Lookahead is worth trying too: https://arxiv.org/pdf/1907.08610v1.pdf Maybe it can be used on top of RAdam.

LifeIsStrange updated 4 years ago

yzhangcs/crfpar #1

What's next for the state of the art?

Firstly I would like to thank you for this fantastic work! I am not an expert, I am more of a user of dependency parsing than a researcher but I NEED (I try to build true semantic parsing) *accurat…

LifeIsStrange updated 4 years ago

Yonghongwei/Gradient-Centralization #8

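Question about semantic segmentation

Hi, @Yonghongwei In instance segmentation there is an FC layer for classification, so `Adam_GC` should be used. However, I am using it in a semantic segmentation model, which has no FC layer, so I should use `Adam_GCC`. After adding some attention modules to my semantic segmentation model, it now contains some `nn.Linear()` layers. Should I use `_GCC` or `_GC` now? Thanks for your answer!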

GewelsJI updated 4 years ago
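
The distinction raised in this question can be illustrated with a minimal sketch. It assumes, as the question implies, that the `_GCC` variants centralize only convolutional (4-D) weight gradients while the `_GC` variants also centralize fully connected (2-D) ones. The names `centralize_gradient` and `conv_only` are hypothetical, and plain nested lists stand in for tensors; this is not the repository's own code.

```python
def _ndim(x):
    """Count nesting depth of a nested list (stand-in for tensor.dim())."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        x = x[0]
    return depth


def _flatten(x):
    """Yield every scalar inside a nested list."""
    if isinstance(x, list):
        for item in x:
            yield from _flatten(item)
    else:
        yield x


def _shift(x, delta):
    """Subtract delta from every scalar, keeping the nesting intact."""
    if isinstance(x, list):
        return [_shift(item, delta) for item in x]
    return x - delta


def centralize_gradient(grad, conv_only=False):
    """Subtract, per output unit (first axis), the mean over all other axes.

    conv_only=True  mimics the assumed `_GCC` behaviour: only 4-D
                    (convolutional) weight gradients are centralized.
    conv_only=False mimics the assumed `_GC` behaviour: 2-D (fully
                    connected) weight gradients are centralized as well.
    1-D gradients (biases, norm parameters) are returned unchanged.
    """
    min_ndim = 4 if conv_only else 2
    if _ndim(grad) < min_ndim:
        return grad
    centralized = []
    for unit_grad in grad:
        values = list(_flatten(unit_grad))
        mean = sum(values) / len(values)
        centralized.append(_shift(unit_grad, mean))
    return centralized


# A 2-D gradient, shaped like the weight gradient of an nn.Linear layer.
linear_grad = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
gc_result = centralize_gradient(linear_grad)                   # rows become zero-mean
gcc_result = centralize_gradient(linear_grad, conv_only=True)  # left untouched
```

Under these assumptions, the added `nn.Linear()` layers would be affected only by the `_GC`-style behaviour; which choice trains better for a given model is an empirical question this sketch does not answer.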

WongKinYiu/yolov7 #40

SOTA claims vs leaderboards misalignment

@WongKinYiu @AlexeyAB Hi friendly pings > YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all k…

LifeIsStrange updated 1 year ago

JRC1995/DemonRangerOptimizer #2

Instability with HyperProp

I tried it on two tasks but got NaNs during training. Any suggestions?

hadaev8 updated 4 years ago

58 results for gradient-centralization
