7fantasysz / MSCRED

Multi-Scale Convolutional Recurrent Encoder-Decoder

Reconstructing the data that's already in the input? #15

Open y-he2 opened 2 years ago

y-he2 commented 2 years ago

Hi,

Firstly, very nice paper. I quite like the idea and would really like to apply it to other areas, so I'm hoping you are still monitoring this repo and open for discussion.

That said, I have a pretty straightforward question about the way the model is used and trained. In Figure 2 of the paper, and in the code, specifically: loss = tf.reduce_mean(tf.square(data_input[-1] - deconv_out))

It seems that you are using the tensor at the last time step of the input as the model's output target? Maybe I have missed something obvious, but doesn't that imply that the input contains complete information about the output, i.e. the model can directly "see" the output in the input? In that case, simply "selecting the last tensor in the input" (e.g. by setting the weights for that input image to 1 and the rest to 0) would give a perfect estimator.
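To make the concern concrete, here is a minimal NumPy sketch (a toy of my own, not the repo's code, with made-up shapes): under this loss, an "estimator" that merely copies the last input step is already perfect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input window: 5 time steps of 30x30 signature matrices.
data_input = rng.normal(size=(5, 30, 30))

def copy_last_step(window):
    # A trivial "model" that just selects the last tensor in the input.
    return window[-1]

deconv_out = copy_last_step(data_input)

# Same objective as loss = tf.reduce_mean(tf.square(data_input[-1] - deconv_out)):
loss = np.mean(np.square(data_input[-1] - deconv_out))
print(loss)  # 0.0, without learning anything about the data
```

So zero loss is attainable by a model that carries no information about the dynamics at all.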

So my point is: when reconstructing something, shouldn't the input contain a very lossy, or at least incomplete, version of the output, instead of complete information about what it is supposed to reconstruct?

I'm doing experiments with random walks on my own implementation of the network, and by using the last step of the input as the model's output, I was still able to get very small losses ("reconstructed perfectly"). So I suspect that this is exactly what the model is doing, i.e. selecting one step of the input as the output.

In that case, my guess about why it still worked is that by "half training" the model, the trainer was able to adjust the weights for the most common sample patterns, but training had not yet gone far enough to turn the model into a simple "input-selecting model". However, if you had let the training converge, this ability would be lost, since the model would end up just "selecting" the last step from its inputs.

yasirroni commented 2 years ago

inputs contains complete information of the output

It seems that the paper might be a scam. I've found some repos complaining about this kind of problem, and the author pretty much ignores any confrontation.

This repo is under development, now I can't reproduce results presented in the paper. 1

This is my most starred implementation on GitHub. I wrote this in a couple hours when normally I work 36+ hours straight on my repositories. While the model is accurate to the end, the model does not successfully do anything and the paper's results are bullshit. 2

y-he2 commented 2 years ago

I have seen the comments questioning the content of the paper; however, I don't really think the paper itself is any form of scam, and those comments didn't provide any hard technical analysis to back their statements beyond "my result shows different". In fact, I'm almost sure it does what it is supposed to do: anomaly detection.

Note the difference in purpose between anomaly detection and signal prediction here. As I understand it, depending on the purpose, the model could act as a filter that outputs the anomaly, which in this case is the difference between the model's output for the next step and the actual next step. Now, I'm not sure whether it is by design, by mistake, or simply my misunderstanding that the true output of the next state is included in each training step, but it could still make sense if the purpose of the model is to "adapt" to a smoothed state function at each time step. Think of the Kalman filter, where we constantly feed the "almost last" signal to the estimator and adjust to the errors.

My question was more to confirm whether my understanding of the purpose of training the model with the true output it is supposed to predict at each time step was correct. My guess is exactly the Kalman filter case: they want to retrain the model constantly so that it adapts to the state function of the last few time steps. Since the model can never do this perfectly (given a limited number of training steps), a large high-frequency change in the states will likely not be captured by the model and will still be "smoothed" away, thus producing a high anomaly value (the difference between the model output and the true output). This is possible even if you train it with the true output, although it may sound a little weird. But I think this could be fixed, and the model would still work, if we simply remove the "next step true output" from training.
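Here is a toy NumPy sketch of what I mean (my own framing, not the repo's code): stand in for the trained model with a simple moving average, which is a smoothed state function of the recent past. The smoother cannot follow a sudden high-frequency jump, so the residual against the true next step spikes exactly at the anomaly, even though the "model" is built from the true signal itself.

```python
import numpy as np

rng = np.random.default_rng(1)

steps = rng.normal(scale=0.1, size=200)
steps[120] += 5.0                 # inject one large anomalous jump
signal = np.cumsum(steps)         # random-walk state sequence

# Stand-in for the trained model: a moving average over the last k steps,
# i.e. a smoothed state function of the recent past.
k = 5
pred = np.convolve(signal, np.ones(k) / k, mode="valid")[:-1]
true_next = signal[k:]

# Anomaly score = difference between model output and the true next step.
score = np.abs(true_next - pred)
print(np.argmax(score) + k)  # 120, the index of the injected jump
```

The smoother tracks the ordinary random-walk steps closely (small residuals) but lags the injected jump by construction, which is what produces the usable anomaly score.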

I think the model itself is still a very interesting one: just remove from the training data anything that may be contained in the true output, and the model could serve as a pretty good multi-resolution "image sequence model". I have implemented a Keras version with some simplifications; I could upload the code if there is a need.

Although, if the input contains an average of time steps that is so close to the actual true next time step, it raises questions about how one should train the model: if the error could eventually converge to zero, at what error should training be stopped?

yasirroni commented 2 years ago

I have implemented a Keras version and did some simplifications, I could upload the code if there are needs.

Can I look at your implementation? You might also write a publication about your implementation if there is some novelty there. A colleague of mine got into trouble two years ago since his thesis depended on this method, but it failed to work.

Also, from the abstract of the paper, we can see that it should work as unsupervised learning, meaning there are no anomalies in the training data, yet the model is able to detect anomalies in the test data based on a distance metric. We could not confirm this part in the code provided by the author. Furthermore, changing to an external dataset with a different number of time series is impossible (this is mostly due to our lack of understanding of the model architecture and how to modify it).

IKetchup commented 1 year ago

I am also interested in the Keras version of your code @y-he2. Thanks in advance.

y-he2 commented 1 year ago

I have now uploaded the Python script and a notebook used to generate/test the (simplified) model: https://github.com/y-he2/mscred_keras. The attention layers were not implemented in this version of the model, since a self-attention layer was not available in the Keras version I was using. Since it has been a while, I cannot provide any testing, Keras version info, documentation, maintenance, or input/output data at the moment. I might in the future if the need increases. Tell me if there's anything missing, but I make no promises. Good luck.