Open 9527-ly opened 1 year ago
Each LSTM layer contains multiple weight matrices, such as the input-gate and output-gate weights. I am not sure how to handle multiple weight matrices within one layer. After extracting the weight matrices of the LSTMs, I concatenated them into a single matrix. Some of the code I used to test this is as follows:
import numpy as np
import tensorflow
from sklearn.decomposition import TruncatedSVD

def analyze_model(model):
    alphas = []
    for l in model.layers:
        print(l)
        if l.get_weights():
            if isinstance(l, tensorflow.keras.layers.LSTM):
                # get_weights() returns [kernel, recurrent_kernel, bias];
                # use the recurrent kernel
                W = l.get_weights()[1]
            elif isinstance(l, tensorflow.keras.layers.ConvLSTM2D):
                # concatenate the input kernel and the recurrent kernel
                W0 = l.get_weights()[0]
                W1 = l.get_weights()[1]
                W = np.append(W0, W1, axis=2)
            else:
                W = l.get_weights()[0]
            W = reshape_tensor(W)  # user-defined helper: flatten to a 2D matrix
            M, N = np.min(W.shape), np.max(W.shape)
            if (N > 2) and (M > 2):
                Q = N / M
                svd = TruncatedSVD(n_components=M - 1, n_iter=7, random_state=10)
                svd.fit(W)
                sv = svd.singular_values_
                evals = sv * sv
                alpha, D, best = fit_powerlaw(evals)  # user-defined power-law fit
                print(alpha, D, best)
                alphas.append(alpha)
    return alphas
I want to know whether I have handled the LSTM weights correctly, because I could either run svd.fit(Wi) on each gate matrix separately or svd.fit(W) on the merged matrix. Does my approach also apply to other metrics, such as the spectral norm and the MP soft rank? I would appreciate your help.
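An alternative to concatenating everything into one matrix is to split the kernel into the four per-gate matrices and analyze each one separately. Below is a minimal sketch, assuming the standard Keras layout where the four gates (input, forget, cell, output) are concatenated along the last axis of the kernel; since `fit_powerlaw` is a user-defined helper, plain NumPy SVD stands in for the analysis step here:

```python
import numpy as np

def split_lstm_gates(recurrent_kernel):
    """Split a (units, 4*units) Keras recurrent kernel into the four
    (units, units) per-gate matrices, in Keras gate order i, f, c, o."""
    units = recurrent_kernel.shape[0]
    assert recurrent_kernel.shape[1] == 4 * units
    names = ["input", "forget", "cell", "output"]
    return {name: recurrent_kernel[:, k * units:(k + 1) * units]
            for k, name in enumerate(names)}

# Stand-in for l.get_weights()[1] of an LSTM with 64 units
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 4 * 64))

for name, Wg in split_lstm_gates(W).items():
    sv = np.linalg.svd(Wg, compute_uv=False)
    evals = sv * sv  # eigenvalues of Wg^T Wg; these would feed fit_powerlaw
    print(name, Wg.shape, float(evals.max()))
```

This keeps each gate's spectrum separate, so a per-gate alpha can be compared against the alpha of the merged matrix.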
Adding LSTMs is more complicated. It's unclear to me whether just reshaping or stacking the matrices will yield power-law distributions.
Also, I don't have a good set of very well trained models, similar to what's on HuggingFace, to study how LSTMs behave.
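One way to see why stacking is not innocuous: concatenating the per-gate matrices changes the aspect ratio Q = N/M, which moves the bulk edge of the eigenvalue spectrum, so the stacked ESD is not simply the union of the per-gate ESDs. A quick numerical probe with random Gaussian matrices (hypothetical sizes, NumPy only):

```python
import numpy as np

rng = np.random.default_rng(42)
units = 100
# Four independent "gate" matrices, entries scaled so eigenvalues are O(1)
gates = [rng.standard_normal((units, units)) / np.sqrt(units) for _ in range(4)]

def max_eval(W):
    """Largest eigenvalue of W W^T, via singular values."""
    sv = np.linalg.svd(W, compute_uv=False)
    return float((sv * sv).max())

per_gate = [max_eval(W) for W in gates]
stacked = max_eval(np.concatenate(gates, axis=1))  # shape (units, 4*units)

print("per-gate max eigenvalues:", [round(x, 2) for x in per_gate])
print("stacked max eigenvalue:", round(stacked, 2))
# The stacked matrix sits on a different bulk: its largest eigenvalue is
# noticeably larger than any single gate's, even though no gate changed.
```

So even for pure noise, the stacked spectrum differs from the per-gate spectra, which is why it is unclear that stacking preserves any power-law tail.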
Thank you @charlesmartin14. I once read a paper titled "On Generalization Bounds of a Family of Recurrent Neural Networks": https://arxiv.org/pdf/1910.12947v2.pdf.
Could you provide me with your email address? If possible, I would like to send you some LSTM models for research.
I would also like to know whether there are generalization metrics applicable to LSTMs. If you have any suggestions, I would be very grateful.
charlesmartin14@gmail.com charles@calculationconsulting.com
FYI, weightwatcher is not based on generalization bounds; it uses techniques from statistical mechanics
If you can send me a few models (and maybe the code used to generate them), I can see about adding them to weightwatcher. Something like a shared Google Drive would be good.
Thank you @charlesmartin14. In the coming days I will provide you with the TensorFlow models you need.
@charlesmartin14 I uploaded the files. Due to the upload size limit, I compressed the input-feature file. You can get the files from this link: https://github.com/9527-ly/lstms. You can also get them from the email I sent you.
This will take some time. We have started a channel on Discord to discuss it.
Thank you very much for your attention to this work. I look forward to seeing it implemented.
I noticed that the current toolkit only supports Conv2D and Dense layers. But in practical applications and research, we often make predictions from time-series data, so I would like to know whether this theory also applies to recurrent neural networks such as LSTMs and RNNs. Thank you very much. In fact, I have already run some tests on LSTMs with the toolkit and obtained results, but since LSTM weights are structured differently from those of conventional feed-forward models, I am not sure whether my test procedure and results are correct.
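For reference, here is a rough stand-in for the kind of per-layer tail-exponent estimate being discussed, using a Hill estimator on the largest eigenvalues of the correlation matrix. This is only an illustration: weightwatcher itself uses a proper MLE power-law fit, and `hill_alpha` is a hypothetical helper, not part of the toolkit:

```python
import numpy as np

def hill_alpha(evals, k=20):
    """Hill estimator of a power-law tail exponent from the k largest
    eigenvalues: alpha = 1 + k / sum(log(lambda_i / lambda_min_of_tail))."""
    tail = np.sort(np.asarray(evals, dtype=float))[-k:]
    return 1.0 + k / np.sum(np.log(tail / tail[0]))

# Stand-in weight matrix for a Dense layer
rng = np.random.default_rng(1)
W = rng.standard_normal((200, 400))
sv = np.linalg.svd(W, compute_uv=False)
alpha = hill_alpha(sv * sv)  # eigenvalues of W W^T
print(round(alpha, 2))
```

The same estimator could be run on each gate matrix of an LSTM layer, which would give one alpha per gate rather than one per layer; whether that is the right granularity is exactly the open question in this thread.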