VITA-Group / TENAS

[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang
MIT License

Running NTK/LRC multiple times gives very inconsistent results #8

Closed Priahi closed 3 years ago

Priahi commented 3 years ago

Hi there,

I am trying to reproduce the NTK and LRC functions in your code. When I run the NTK computation for 3 (or 5) repeated runs with the same settings and the same input model, I get vastly different results, e.g.:

ntk_original        ntk
896.1322631835938   828.4542236328125
1274.0692138671875  1108.636962890625
890.8836059570312   1008.2345581054688

I would love to get a better sense of what the NTK computation actually does and how we can get consistent results.

Also, do we need to initialize with Kaiming? What is the point of this initialization, and is there an alternative (e.g. Xavier, zero, or none)?
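
For context, this is the kind of initialization I am asking about (a minimal PyTorch sketch of my own, not code from this repo), i.e. whether the NTK/LRC scores depend on which of these schemes is applied before measuring:

```python
import torch.nn as nn

def init_weights(m, mode="kaiming"):
    # apply the chosen scheme to conv / linear layers only
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        if mode == "kaiming":
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        elif mode == "xavier":
            nn.init.xavier_normal_(m.weight)
        elif mode == "zero":
            nn.init.zeros_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
net.apply(lambda m: init_weights(m, mode="kaiming"))   # or mode="xavier"
```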

Priahi commented 3 years ago

Hey again,

Just wanted to follow up. If my question is unclear or needs more detail, I would be happy to hop on a call with you to sort this out if that helps :)

Also, is there specific code to reproduce Figures 1, 3, 9, or 10?

chenwydj commented 3 years ago

Hi @Priahi ,

Thank you for your interest in our work!

We are calculating the empirical NTK here. That means we sample both the network initialization and the input mini-batch, and this randomness makes each NTK calculation come out differently. However, as shown in our Figure 1, the general trend holds: good architectures have smaller NTK condition numbers.
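
As a rough illustration (a minimal self-contained sketch, not the repo's exact NTK routine), the empirical NTK condition number for one batch can be computed as below. Both the random initialization and the randomly sampled batch change the result between repeats, which is why averaging over several repeats is a common way to reduce the variance:

```python
import torch
import torch.nn as nn

def empirical_ntk_cond(net, inputs):
    """Condition number of the empirical NTK Gram matrix for one batch."""
    logits = net(inputs)                               # (N, num_classes)
    grads = []
    for i in range(logits.size(0)):
        # gradient of sample i's summed logits w.r.t. all parameters
        g = torch.autograd.grad(logits[i].sum(), list(net.parameters()),
                                retain_graph=True)
        grads.append(torch.cat([p.reshape(-1) for p in g]))
    J = torch.stack(grads)                             # (N, num_params)
    ntk = J @ J.t()                                    # empirical NTK: J J^T
    eigs = torch.linalg.eigvalsh(ntk)                  # ascending eigenvalues
    return (eigs[-1] / eigs[0]).item()

# Repeats differ because both the weight init and the batch are re-sampled;
# averaging the condition number over a few repeats reduces the variance.
scores = []
for seed in range(3):
    torch.manual_seed(seed)                            # fresh random init per repeat
    net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                        nn.ReLU(), nn.Linear(64, 10))
    x = torch.randn(8, 3, 32, 32)                      # stand-in for a CIFAR mini-batch
    scores.append(empirical_ntk_cond(net, x))
print(scores, sum(scores) / len(scores))
```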

Hope that helps!