Whether this set of analysis theory is applicable to recurrent neural networks

CalculatedContent / WeightWatcher

The WeightWatcher tool for predicting the accuracy of Deep Neural Networks

Apache License 2.0

Whether this set of analysis theory is applicable to recurrent neural networks #145

Open 9527-ly opened 1 year ago

9527-ly commented 1 year ago

I noticed that the toolkit currently supports only Conv2D and Dense layers. In practice and in research, however, we often make predictions from time series data. So I would like to know whether this theory also applies to recurrent neural networks such as LSTMs and RNNs. I have already run the toolkit on some LSTMs and it does produce results. But because LSTM weights are organized differently from those of conventional neural network models, I do not know whether my test procedure and results are correct. Thank you very much.

9527-ly commented 1 year ago

Each LSTM layer contains multiple weight matrices, such as the weights of the input gate and the output gate. I don't know how to deal with multiple weight matrices in one layer. After I extracted the weight matrices of the LSTMs, I concatenated them into a single matrix. Some of the code I used for testing is as follows:

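import numpy as np
import tensorflow
from sklearn.decomposition import TruncatedSVD

# reshape_tensor and fit_powerlaw are helper functions defined elsewhere in the test script.

def analyze_model(model):
    alphas = []
    for l in model.layers:
        print(l)
        if l.get_weights():
            if isinstance(l, tensorflow.keras.layers.LSTM):
                # get_weights() returns [kernel, recurrent_kernel, bias];
                # index 1 is the recurrent kernel
                W = l.get_weights()[1]
            elif isinstance(l, tensorflow.keras.layers.ConvLSTM2D):
                # concatenate the input and recurrent kernels along the input-channel axis
                W0 = l.get_weights()[0]
                W1 = l.get_weights()[1]
                W = np.append(W0, W1, axis=2)
            else:
                W = l.get_weights()[0]
            W = reshape_tensor(W)
            M, N = np.min(W.shape), np.max(W.shape)
            if (N > 2) and (M > 2):
                Q = N / M  # aspect ratio (not used below)

                svd = TruncatedSVD(n_components=M-1, n_iter=7, random_state=10)
                svd.fit(W)
                sv = svd.singular_values_
                evals = sv * sv  # squared singular values

                alpha, D, best = fit_powerlaw(evals)
                print(alpha, D, best)
                alphas.append(alpha)
    return alphas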

9527-ly commented 1 year ago

I want to know whether I have handled the LSTM weights correctly. I can run svd.fit(Wi) on each matrix separately, and I can also run svd.fit(W) on the merged matrix. Can the same approach be applied to other metrics, such as the spectral norm and the MP soft rank? I would appreciate your help.
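The per-matrix versus merged comparison can be made concrete with a small sketch. It assumes the standard Keras LSTM layout, in which the kernel has shape (input_dim, 4 * units) with the gate blocks ordered input, forget, cell, output along the last axis. The helper names `split_gates` and `spectral_summary` are made up for this illustration and are not part of WeightWatcher, and a random matrix stands in for the real weights. The sketch computes only the squared singular values and the spectral norm; it does not attempt the MP soft rank.

```python
import numpy as np

GATE_NAMES = ("input", "forget", "cell", "output")

def split_gates(kernel):
    """Split a Keras-style LSTM kernel of shape (rows, 4 * units) into four gate matrices."""
    rows, cols = kernel.shape
    if cols % 4 != 0:
        raise ValueError("expected the last axis to have size 4 * units")
    return dict(zip(GATE_NAMES, np.split(kernel, 4, axis=1)))

def spectral_summary(W):
    """Return the squared singular values of W (descending) and its spectral norm."""
    sv = np.linalg.svd(W, compute_uv=False)  # singular values, largest first
    return sv ** 2, sv[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    units, input_dim = 32, 64
    # Random stand-in for l.get_weights()[0] of a trained LSTM layer (assumption).
    kernel = rng.normal(size=(input_dim, 4 * units))
    for name, Wg in split_gates(kernel).items():
        evals, norm = spectral_summary(Wg)
        print(f"{name:>6}: shape={Wg.shape}, spectral norm={norm:.3f}")
    evals, norm = spectral_summary(kernel)
    print(f"merged: shape={kernel.shape}, spectral norm={norm:.3f}")
```

The spectral norm of the merged matrix is never smaller than that of any single gate block, so the per-gate and merged values are not interchangeable; which of the two is the meaningful quantity for LSTMs is exactly the open question in this thread.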

charlesmartin14 commented 1 year ago

Adding LSTMs is more complicated. It's unclear to me whether just reshaping or stacking the matrices will yield power law distributions.

Also, I don't have a good set of very well trained models, similar to what's on Hugging Face, to study how LSTMs behave.
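A rough way to check whether a set of eigenvalues has a power-law tail is the standard continuous maximum-likelihood estimator for the exponent, alpha = 1 + n / sum(ln(x_i / x_min)). The sketch below is only an illustration: the function name `hill_alpha` and the choice to fix `x_min` by hand are assumptions, and this is not WeightWatcher's own fitting procedure. Synthetic samples with a known exponent stand in for real eigenvalues.

```python
import numpy as np

def hill_alpha(values, x_min):
    """Continuous MLE of the power-law exponent, using only the values >= x_min."""
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = np.asarray(values, dtype=float)
    tail = tail[tail >= x_min]
    if tail.size < 2:
        raise ValueError("need at least two values in the tail")
    return 1.0 + tail.size / np.sum(np.log(tail / x_min))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_alpha, x_min = 3.0, 1.0
    # Inverse-transform sampling from p(x) proportional to x**(-alpha) for x >= x_min.
    u = rng.random(20000)
    samples = x_min * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))
    print("estimated alpha:", hill_alpha(samples, x_min))
```

In practice the estimate depends strongly on the choice of `x_min`, so a fitted exponent alone does not show that reshaped or stacked LSTM matrices are actually power-law distributed.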

9527-ly commented 1 year ago

Thank you @charlesmartin14. I once saw a paper titled "On Generalization Bounds of a Family of Recurrent Neural Networks". The link to this paper is https://arxiv.org/pdf/1910.12947v2.pdf.

Could you give me your email address? If possible, I would like to send you some LSTM models for research.

I would also like to know whether there are generalization metrics that apply to LSTMs. If you have any suggestions, I would be very grateful.

charlesmartin14 commented 1 year ago

charlesmartin14@gmail.com charles@calculationconsulting.com

FYI, weightwatcher is not based on generalization bounds; it uses techniques from statistical mechanics.

If you can send me a few models (and maybe the code used to generate them), I can see about adding them to weightwatcher. Something like a shared Google Drive would be good.

9527-ly commented 1 year ago

Thank you @charlesmartin14. I will send you the TensorFlow models in the next few days, as you requested.

9527-ly commented 1 year ago

@charlesmartin14 I have uploaded the files. Because of the upload size limit, I compressed the input feature file. You can get the files from this link: https://github.com/9527-ly/lstms. They are also attached to the email I sent you.

charlesmartin14 commented 1 year ago

This will take some time. We have started a channel on Discord to discuss it.

9527-ly commented 1 year ago

Thank you very much for your attention to this work. I look forward to seeing it implemented.