-
I think the correct implementation should look like this:
```python
noise_scheduler = DDPMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, c…
```
-
I'm observing sensitivity with respect to LR restarts in a typical SGDR schedule with cosine annealing, as in Loshchilov & Hutter. RAdam still seems to be doing better than AdamW so far, but the jumps imply possibl…
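For reference, the schedule in question can be sketched in a few lines. This is a minimal stand-alone implementation of SGDR's cosine-annealed LR with warm restarts; the parameter values (`eta_max`, `t0`, `t_mult`) are illustrative, not anyone's actual config:

```python
import math

def sgdr_lr(step, eta_max=0.1, eta_min=0.0, t0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR).

    step counts epochs (or iterations). The i-th cycle lasts t0 * t_mult**i;
    at each restart the LR jumps back to eta_max, which is exactly the jump
    that training can be sensitive to.
    """
    t_i, t_cur = t0, step
    while t_cur >= t_i:      # locate which restart cycle `step` falls in
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

Plotting `sgdr_lr` over steps makes the restart discontinuities easy to line up against the loss spikes.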
-
Hi,
this is under the H2O Flow web interface, for deep learning grid search and single-model search. The "overwrite_with_best_model" option is sometimes missing, or appears as true/false switches.
If the button is clicked i…
-
Hello @ruotianluo and thanks for your code.
I've seen that some papers report results for the pure transformer applied to image captioning of around 1.285, and some 1.29. For example, in the paper **…
-
### Team Name:
zer0dynamics
### Project Description:
Gradient ascent in function space (GRAFS) [1] is an algorithm for optimal control synthesis that leverages functional expansions of cont…
-
According to this blog post: http://www.fast.ai/2018/07/02/adam-weight-decay/ and the article it mentions, https://arxiv.org/abs/1711.05101, Adam has problems when used with L2 regularization. If I understand…
-
**Aim**
Find out what layer norm actually does (i.e., its benefits and limitations) and why/how it's applied to transformers.
**Plan**
- [ ] [Understanding the Difficulty of Training Transformers](https:/…
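As a starting point for the aim above, the operation itself is small enough to write out. A minimal numeric sketch in plain Python (no framework; `gamma`/`beta` stand for the learned scale/shift):

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """LayerNorm over one feature vector: normalize to zero mean and unit
    variance across features, then apply a learned scale/shift.

    Unlike batch norm, the statistics come from a single example, so the op
    is independent of batch size and sequence position -- one reason it
    suits transformers. gamma/beta default to the identity here.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    y = [(xi - mean) / math.sqrt(var + eps) for xi in x]
    if gamma is not None:
        y = [g * yi + b for g, yi, b in zip(gamma, y, beta)]
    return y
```

The open question from the reading list is then where this sits relative to the residual stream (pre-LN vs. post-LN), not what the op computes.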
-
### Abstract
This proposal aims to transition the Course About, Catalog, and Index pages of the Open edX platform from the legacy architecture to an MFE. Historically, the catalog and about pages have been provided s…
-
The Theme for the Project: “An AI Solution for Communities”.
According to John McCarthy, the father of Artificial Intelligence (AI), it is "the science and engineering of making intelligent machine…
-
Using Dropout in child_model works well for preventing overfitting; however, it also causes the final performance of the model to vary significantly across training runs with the same hyper-params. It is too …
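The run-to-run variance is expected whenever the dropout masks are sampled from an unseeded RNG: each run sees a different mask sequence, so the "same hyper-params" do not mean the same training trajectory. A toy stand-in (purely illustrative, not the child_model code) shows that fixing the seed makes runs reproducible while different seeds diverge:

```python
import random

def train_with_dropout(seed, steps=1000, p=0.5):
    """Toy 'training run' whose outcome depends on sampled dropout masks.

    Accumulates a score through Bernoulli masks; stands in for a full run
    whose final metric depends on which units were dropped when.
    """
    rng = random.Random(seed)   # seeding the mask RNG pins the trajectory
    score = 0.0
    for _ in range(steps):
        mask = 1.0 if rng.random() > p else 0.0  # Bernoulli(1 - p) keep mask
        score += mask * 0.01
    return score
```

So to compare hyper-parameter settings fairly under dropout, either fix all RNG seeds or report mean and variance over several runs rather than a single number.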