Open jim-11 opened 2 years ago
What does the "tradeoff" parameter (= 0.5) mentioned in the paper mean? Is it the "weight decay" of the Adam optimizer?