JulesBelveze / time-series-autoencoder

PyTorch Dual-Attention LSTM-Autoencoder For Multivariate Time Series
Apache License 2.0

Questions about using the model for denoising (reconstruction) #45


mreyessierra commented 6 months ago

Hello, I am interested in using your code to remove noise from (i.e., reconstruct) a set of time series. I would like to confirm a few things:

  1. The series are split (in temporal order) into the first 80% for training and the remaining 20% for testing, correct? If I want to remove noise from one (or several) complete time series, how should I feed in the data?
  2. I see that the reconstruction example runs training and evaluation together. How would I train the model and then only test it? (I don't see an example where only the evaluation is applied.)
  3. Does the denoising process take the interactions between the different series into account, or is each series handled independently?

Thank you very much in advance!

JulesBelveze commented 6 months ago

Hey @mreyessierra thanks for your interest. Let me try to answer your questions 🤓

  1. I'm not sure I get what the problem is on that one. Once your model is trained you can use it in eval mode to reconstruct your time series. However, you won't be able to reconstruct the first few timestamps, as they are used as "history". Does that answer your question? (See the sketch below this list.)
  2. I haven't provided code to only perform inference, but if you only want to run the evaluation process you could use the function defined here.
  3. Yes, it does leverage the global context! :)

Hope it helps!
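For point 1, here is a rough sketch of running a trained model in eval mode to reconstruct a series, in plain PyTorch. The `reconstruct` helper, the loader, and the batch layout are placeholders for illustration, not this repo's exact API:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


def reconstruct(model: nn.Module, loader: DataLoader) -> torch.Tensor:
    """Run a trained autoencoder over a dataset and collect the reconstructed windows."""
    model.eval()  # disable dropout / batch-norm updates
    chunks = []
    with torch.no_grad():  # inference only, no gradients needed
        for batch in loader:
            inputs = batch[0]  # placeholder: adapt to the actual batch layout
            chunks.append(model(inputs).cpu())
    # Note: the first few timestamps are consumed as "history" and will not
    # appear in the reconstruction.
    return torch.cat(chunks, dim=0)
```

The returned tensor holds one reconstructed window per input window; how you stitch those windows back into a single denoised series depends on the window length and stride you used.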

mreyessierra commented 6 months ago

Thank you very much for the answers! Regarding the evaluation: since the model fragments the time series (for training and evaluation), it's not clear to me which parameter I should use in train_test_split so that the model applies the reconstruction process to the complete series. On another note, regarding the condition if cfg.general.do_eval and cfg.general.get("ckpt", False): how do I make sure the cfg.general.get("ckpt", False) part is satisfied? Thank you again!

JulesBelveze commented 5 months ago

Okay I think I got it. If you want to run the evaluation on the entire series (which is not advised as you would evaluate the model on windows it has been trained on), you would have to set the train_size parameter to 0 (dunno if scikit-learn will complain btw).

Regarding the second part of your question: to meet the condition you would have to either modify the yaml file to add a general.ckpt parameter, or pass a flag like +=general.ckpt=PATH_TO_YOUR_CHECKPOINTS when running the script.

Let me know if this helps
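To illustrate the second part, here is a small self-contained check of that condition, assuming the config is an OmegaConf DictConfig (which the cfg.general.get(...) call suggests); the do_eval flag and the checkpoint path below are placeholders:

```python
from omegaconf import OmegaConf

# Hypothetical config mirroring the condition discussed above.
cfg = OmegaConf.create(
    {"general": {"do_eval": True, "ckpt": "path/to/your/checkpoint.ckpt"}}
)

if cfg.general.do_eval and cfg.general.get("ckpt", False):
    print(f"Evaluation will load the checkpoint at {cfg.general.ckpt}")
else:
    print("Either do_eval is disabled or no checkpoint path was provided")
```

In other words, the condition is satisfied as soon as general.ckpt exists and is a non-empty value, whether it comes from the yaml file or from a command-line override.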

mreyessierra commented 5 months ago

Thank you very much for responding, and I apologize for the delay in my reply. What I really want to do (and am trying to do) is use one dataset for training and evaluation and another only for evaluation, which is why I was asking how to evaluate the model on the complete series. The train_test_split function indeed does not allow a zero value for train_size :) I've been working on this over the past few days. Now I have a question about the results I've obtained: in the attached image (series_1), the predicted series is noisier than the original, which was supposed to be cleaned. Do you have any idea why this happens, or in what cases it occurs? Thank you very much!