Open · hellojinwoo opened this issue 5 years ago
Hello Ranjan, nice write-up, and thanks for the article. I was searching for an architecture for exactly this kind of scenario. I have a small clarification I was hoping you could help me with.
An ideal threshold is one where precision and recall are highest together, that is, the point where they intersect. If it is hard to identify that point from the plot, I would look at the arrays of precision and recall values.
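If it helps, here is a minimal sketch of picking that intersection point programmatically with scikit-learn's `precision_recall_curve` (the `True_class` column name for the ground-truth labels is an assumption; adjust it to your data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Compute precision and recall over all candidate thresholds.
# 'True_class' is an assumed column name for the ground-truth labels.
precision, recall, thresholds = precision_recall_curve(
    error_df.True_class, error_df.Reconstruction_error)

# precision and recall have one more entry than thresholds; drop the last point
# and pick the threshold where the two curves are closest (their intersection).
gap = np.abs(precision[:-1] - recall[:-1])
threshold_fixed = thresholds[np.argmin(gap)]
print('chosen threshold:', threshold_fixed)
```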
Can you answer my other questions as well please...?
@hellojinwoo Yes, I am currently drafting a post that will answer your questions (at least some of them). Your questions are really good and deserve a detailed explanation. I also identified a few issues in my LSTM network, which I will correct and mention. Please watch for my message with the post; I will reply to you as soon as I post it (sometime before the end of this week).
@hellojinwoo Please look at this post, https://towardsdatascience.com/step-by-step-understanding-lstm-autoencoder-layers-ffab055b6352. It should answer your questions. I will be making some changes in the LSTM network I have in the LSTM Autoencoder for Extreme Rare Event Classification in Keras, e.g. the size of the first layer, and will update the post/GitHub. Please let me know if you have questions.
Hi @cran2367, this is such a good post. I just wanted to know if you will be updating the LSTM structure soon?
Thank you, @sudhanvasbhat1993. My next post will explain how to optimize a dense autoencoder, and after that I will write one on LSTM autoencoder tuning. The next LSTM post may take a few weeks, though.
Hi, LSTM Autoencoder for Extreme Rare Event Classification in Keras was a great article. I applied the same approach to a vehicle predictive-maintenance data set, and I have a couple of questions, if you could kindly answer those.

In your script, the following line gives the predicted classification (0 or 1) for each row (correct me if I am wrong):

`pred_y = [1 if e > threshold_fixed else 0 for e in error_df.Reconstruction_error.values]`

For production, I apply the same pre-processing steps to a test set without y values: I temporalize and scale, run `model.predict`, and finally get `pred_y`. But when I try to attach `pred_y` to my original df as a predicted column, it gives a length error, as the length of the original df in my case is 1020 and the length of `pred_y` is 1009.

Can you please guide me to where I am going wrong and what can be done to resolve this issue (one idea I had is sketched below)?

Thanks a lot in advance.
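For reference, here is one way I thought of to line the two up, assuming the 11-row difference comes from the temporalize windowing (a sketch only; the exact offset depends on how the windows are built), though I am not sure it is the right approach:

```python
import numpy as np

# Rows lost to windowing: 1020 - 1009 = 11 in my case; this should match the
# lookback used in temporalize (give or take one, depending on the implementation).
offset = len(df) - len(pred_y)

# Pad the front with NaN so each prediction lines up with the window it came from.
df['predicted'] = np.concatenate(
    [np.full(offset, np.nan), np.asarray(pred_y, dtype=float)])
```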
Hello, Mr. Ranjan. Thanks for your great article LSTM Autoencoder for Extreme Rare Event Classification in Keras and the code on GitHub. While reading your code, however, I came up with a few questions.
I decided to ask my questions here rather than on Medium because I can upload pictures and quote code more accurately here. Hope you are okay with this.
Q1. Why ‘return_sequences=True’ for all the LSTM layers?
Supporting explanation
< Figure 1. Seq2seq model: encoding-decoding model >
In the encoding stage, what the model needs to do is produce a fixed-length vector (the latent vector) that contains all the information and time-wise relationships of the input sequence. In the decoding stage, the model's goal is to create an output that is as close as possible to the original input.
So my guess is that in the encoding stage we do not need the per-timestep outputs shown in Figure 1, since the autoencoder's only goal there is to build a good latent vector. The smaller the MSE between the input data and the output reconstructed from the latent vector in the decoding stage, the better the latent vector is.
Doesn't that mean we can set `return_sequences=False`, which would not return the per-timestep outputs in the encoding stage?
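For reference, here is a minimal sketch of the variant I am describing (illustrative layer sizes, not your exact network): the last encoder LSTM uses `return_sequences=False`, so it emits only its final hidden state, and `RepeatVector` copies that latent vector once per timestep for the decoder.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

timesteps, n_features = 5, 3   # illustrative sizes

model = Sequential([
    # Encoder
    LSTM(16, activation='relu', return_sequences=True,
         input_shape=(timesteps, n_features)),
    LSTM(4, activation='relu', return_sequences=False),  # -> (batch, 4): the latent vector
    # Repeat the latent vector so the decoder sees one copy per timestep.
    RepeatVector(timesteps),                              # -> (batch, timesteps, 4)
    # Decoder (these layers do need return_sequences=True)
    LSTM(4, activation='relu', return_sequences=True),
    LSTM(16, activation='relu', return_sequences=True),
    TimeDistributed(Dense(n_features)),                   # reconstruct the input features
])
model.summary()
```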
Q2. What would be the first hidden state (h0, c0) for the decoding stage?
Supporting explanation
The latent vector is repeated `timesteps` times, as in `lstm_autoencoder.add(RepeatVector(timesteps))`. This means the latent vector is fed to the decoder as its input at every timestep of the decoding stage.

If latent vectors are used as inputs in the decoding stage, what is used for the initial hidden state (h0, c0)? In the seq2seq model (Figure 1) mentioned above, the latent vector is used as the initial hidden state (h0, c0) of the decoding stage, and the decoder input is the sentence to be translated, for example from English to French.
So I am curious to know what is used as the initial hidden state (h0, c0) in your code!
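As far as I understand, in a plain Keras `Sequential` stack the decoder LSTM's initial state simply defaults to zeros, so the latent vector enters only through the repeated inputs. Getting the seq2seq-style behaviour, where the encoder state seeds (h0, c0), seems to require the functional API and an explicit `initial_state`, roughly as in this sketch (illustrative sizes, not your code):

```python
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

timesteps, n_features, latent_dim = 5, 3, 4   # illustrative sizes

inputs = Input(shape=(timesteps, n_features))

# Encoder: return_state=True also returns the final hidden and cell states.
encoded, state_h, state_c = LSTM(latent_dim, activation='relu',
                                 return_state=True)(inputs)

# Decoder: the repeated latent vector is the input, and the encoder
# states are passed explicitly as the initial (h0, c0).
decoder_in = RepeatVector(timesteps)(encoded)
decoded = LSTM(latent_dim, activation='relu', return_sequences=True)(
    decoder_in, initial_state=[state_h, state_c])
outputs = TimeDistributed(Dense(n_features))(decoded)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
```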
Q3. Why does the output unit size increase from 5 to 16 in the encoding stage?
Supporting explanation
In the output of `lstm_autoencoder.summary()`, we can see that the output unit size increases from 5 (in the layer 'lstm_16') to 16 (in the layer 'lstm_17').

< Figure 2. Summary of the LSTM autoencoder model >
Since the output of the previous LSTM layer is the input to the next LSTM layer, I think the output size is equivalent to the hidden state size (a quick check of this is below).
If the hidden layer's size is greater than the number of inputs, the model can learn just an 'identity function', which is not desirable. (Source: [What is the intuition behind the sparsity parameter in sparse autoencoders?](https://stats.stackexchange.com/questions/149478/what-is-the-intuition-behind-the-sparsity-parameter-in-sparse-autoencoders))
Layer 'lstm_16' has only 5 units, while the next layer 'lstm_17' has 16 units. So I think 'lstm_17' could just copy 'lstm_16' (acting like an identity matrix), which would make the layer 'lstm_17' undesirable.
I am curious to know why the output size (hidden layer size) increases rather than decreases!
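On my claim that 'the output size is equivalent to the hidden state size': that part is easy to confirm, since an LSTM layer's output dimension equals its number of units. A tiny sketch with illustrative shapes:

```python
from tensorflow.keras.layers import Input, LSTM

x = Input(shape=(5, 3))                            # (timesteps, n_features)
print(LSTM(16, return_sequences=True)(x).shape)    # (None, 5, 16): one 16-dim output per timestep
print(LSTM(16, return_sequences=False)(x).shape)   # (None, 16): only the final hidden state
```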
Q4. By how much is the input data size reduced in the latent vector?
Supporting explanation
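As a rough illustration of what I am asking (made-up numbers; the actual `timesteps`, feature count, and bottleneck size may differ), assuming the encoder ends in a `return_sequences=False` layer so the latent representation is a single vector:

```python
timesteps, n_features = 5, 60   # illustrative window shape
latent_dim = 16                 # units in the last encoder LSTM (the bottleneck)

input_size = timesteps * n_features   # 300 values enter the encoder per window
print(input_size, '->', latent_dim)   # 300 -> 16, roughly a 19x reduction
```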
Thanks for this nice post again.