Priahi closed this issue 3 years ago
Hey again,
Just wanted to follow up. If my question is unclear or needs more detail, I'm happy to hop on a call to sort this out, if that helps :)
Also, is there specific code to reproduce Figures 1, 3, 9, or 10?
Hi @Priahi,
Thank you for your interest in our work!
We are computing the empirical NTK here, which means we need to sample both network initializations and input samples. Because of this randomness, no two NTK calculations give exactly the same result. However, as shown in our Figure 1, the general trend is that good architectures have smaller NTK condition numbers.
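To illustrate where the run-to-run variation comes from, here is a minimal sketch in NumPy. It is not the code from this repository: the function name `empirical_ntk_cond`, the tiny two-layer ReLU network, and all sizes are hypothetical, chosen only to show that the condition number depends on the sampled initialization and the sampled inputs.

```python
import numpy as np

def empirical_ntk_cond(seed, n_samples=8, in_dim=16, hidden=64):
    """Condition number of the empirical NTK of a toy 2-layer ReLU MLP.

    Hypothetical helper for illustration; not the repository's NTK function.
    """
    rng = np.random.default_rng(seed)
    # Randomly sampled inputs (stand-in for a sampled mini-batch).
    x = rng.standard_normal((n_samples, in_dim))
    # Kaiming-normal style init (assumed here): std = sqrt(2 / fan_in).
    w1 = rng.standard_normal((hidden, in_dim)) * np.sqrt(2.0 / in_dim)
    w2 = rng.standard_normal(hidden) * np.sqrt(2.0 / hidden)

    pre = x @ w1.T                      # pre-activations, shape (n, hidden)
    act = np.maximum(pre, 0.0)          # ReLU output
    mask = (pre > 0).astype(float)      # ReLU derivative

    # Gradient of the scalar output f(x) = w2 . relu(w1 x) w.r.t. each parameter.
    g_w1 = (mask * w2)[:, :, None] * x[:, None, :]   # (n, hidden, in_dim)
    g_w2 = act                                       # (n, hidden)
    jac = np.concatenate([g_w1.reshape(n_samples, -1), g_w2], axis=1)

    ntk = jac @ jac.T                   # empirical NTK, shape (n, n)
    eig = np.linalg.eigvalsh(ntk)       # eigenvalues in ascending order
    return eig[-1] / eig[0]             # lambda_max / lambda_min

# Different seeds: different init and inputs, so different condition numbers.
conds = [empirical_ntk_cond(seed) for seed in range(5)]
print(conds)
# Averaging over several repeats gives a steadier estimate than a single run.
print(np.mean(conds))
```

In this sketch, calling the function twice with the same seed returns the same value, while different seeds give different values. Fixing the seed, or averaging over several repeats, are the two usual ways to get comparable numbers between runs.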
Hope that helps!
Hi there,
I am trying to reproduce the NTK and LRC functions in your code. When I run the NTK computation for 3 (or 5) repeated runs with the same settings and input model, I get vastly different results, e.g.:
I would love to get a better sense of what the NTK actually does and how we can get consistent results.
Also, do we need to initialize with Kaiming? What is the point of this initialization, and is there an alternative (e.g., Xavier, zero, none)?