AnselmC / double-descent

This repo aims to reproduce the results from the double-descent paper

Can you try it on MNIST with a five- or three-layer fully connected net (FCN)? #1

Open bwnjnOEI opened 4 years ago

bwnjnOEI commented 4 years ago

Hi, I'm also interested in the double-descent curve (DDC). I'm having trouble with the DDC because each paper uses its own setting, for example:

1. "Deep Double Descent: Where Bigger Models and More Data Hurt"
2. "Scaling description of generalization with number of parameters in deep learning"
3. "Reconciling modern machine-learning practice and the classical bias–variance trade-off"
4. "A jamming transition from under- to over-parametrization affects loss landscape and generalization"

I used the 'weight reuse' scheme from paper 3 (growing the net's width) on MNIST with a 5-layer FCN, but it doesn't show a DDC. Can we exchange empirical evidence?
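For reference, here is a minimal sketch of how I understand the weight-reuse initialization from paper 3: when growing the hidden layer from its old width to a new one, the first units are initialized from the trained smaller model and the newly added units get small random weights. This is only an illustration (the function name, the `nn.Sequential(Linear, ReLU, Linear)` layout, and the `scale` factor are my assumptions, not code from this repo):

```python
import torch
import torch.nn as nn

def widen_with_weight_reuse(small: nn.Sequential, new_width: int,
                            scale: float = 1e-2) -> nn.Sequential:
    """Build a wider one-hidden-layer net, reusing the trained weights of
    `small` (assumed to be Sequential(Linear, ReLU, Linear)) for its first
    units -- a sketch of the 'weight reuse' scheme in Belkin et al."""
    in_dim = small[0].in_features
    old_width = small[0].out_features
    out_dim = small[2].out_features
    assert new_width >= old_width

    wide = nn.Sequential(
        nn.Linear(in_dim, new_width),
        nn.ReLU(),
        nn.Linear(new_width, out_dim),
    )
    with torch.no_grad():
        # Shrink the default random init so the new units start small...
        wide[0].weight.mul_(scale)
        wide[0].bias.mul_(scale)
        wide[2].weight.mul_(scale)
        # ...then copy the trained parameters into the first `old_width` units.
        wide[0].weight[:old_width] = small[0].weight
        wide[0].bias[:old_width] = small[0].bias
        wide[2].weight[:, :old_width] = small[2].weight
        wide[2].bias.copy_(small[2].bias)
    return wide
```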

AnselmC commented 4 years ago

Hey, the purpose of this repository is to reproduce the original double-descent results from Belkin et al., so there is no implementation of an NN with more than a single hidden layer. However, even for the single hidden layer the results were/are hard to replicate, in large part because SGD is highly sensitive to initialization. Belkin et al. therefore average over several runs for their results.
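Concretely, that averaging amounts to something like the loop below (a sketch only: `build_model` and `train_and_eval` are placeholders, not functions from this repo):

```python
import statistics
import torch

def averaged_test_loss(width: int, n_runs: int = 5) -> float:
    """Train `n_runs` independently initialized models of the same width
    and report the mean test loss, smoothing out SGD's sensitivity to
    the random initialization."""
    losses = []
    for seed in range(n_runs):
        torch.manual_seed(seed)               # different init per run
        model = build_model(width)            # placeholder: construct the net
        losses.append(train_and_eval(model))  # placeholder: run SGD, return test loss
    return statistics.mean(losses)
```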

Were you able to reproduce the behavior on 2-, 3-, and 4-layer FCNs? I'd suggest starting there; a sketch of such a net follows below.
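If it helps, a depth-configurable FCN for that experiment could look like this (my own sketch, not part of this repo; dimensions are illustrative):

```python
import torch.nn as nn

def make_fcn(in_dim: int, width: int, out_dim: int, n_hidden: int) -> nn.Sequential:
    """Fully connected net with `n_hidden` hidden layers of equal width."""
    layers = [nn.Linear(in_dim, width), nn.ReLU()]
    for _ in range(n_hidden - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, out_dim))
    return nn.Sequential(*layers)

# e.g. a 3-hidden-layer FCN for MNIST:
# model = make_fcn(784, width=64, out_dim=10, n_hidden=3)
```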