WenjieDu / SAITS

The official PyTorch implementation of the paper "SAITS: Self-Attention-based Imputation for Time Series". A fast and state-of-the-art (SOTA) deep-learning neural network model for efficient time-series imputation (impute multivariate incomplete time series containing NaN missing data/values with machine learning). https://arxiv.org/abs/2202.08516
https://doi.org/10.1016/j.eswa.2023.119619
MIT License

Some questions about multivariate time series imputation. #9

Closed xuzengsong closed 1 year ago

xuzengsong commented 1 year ago

Thank you for your work. I recently read your paper "SAITS: Self-Attention-based Imputation for Time Series". I am also working on multivariate time series imputation, so I have some questions and hope to discuss them with you.

  1. I recently ran your method on my own dataset. My processing approach is to first split the data into a training set and a test set, then build the time sequences, train on the training set, and evaluate on the test set. I know that imputation algorithms are unsupervised and do not use the true values of the missing data; some people split the data into training and test sets while others do not, and the two partitioning methods give somewhat different results with your algorithm. How do you view the partitioning of datasets?

  2. May I ask whether your algorithm can overfit? The back-propagated loss is the MAE on the non-missing values, not on the whole dataset, so I feel that as the number of training iterations increases, the model will gradually tend to overfit.
  3. Currently the stopping condition of the algorithm is reaching a specified number of epochs, which has to be tuned for each dataset. If we split the data into a training set and a test set, can we instead stop training when the MAE on the masked values of the training set reaches its minimum? Thank you very much.
WenjieDu commented 1 year ago

Hi there,

Thank you so much for your attention to SAITS! If you find SAITS helpful to your work, please star⭐️ this repository. Your star is your recognition, which helps others notice SAITS. It matters and is definitely a kind of contribution.

I have received your message and will respond ASAP. Thank you again for your patience! 😃

Best,
Wenjie

WenjieDu commented 1 year ago

Hi Zengsong,

Thank you for raising this issue, and for your patience. I'm sorry for my delayed response.

  1. Yes, there are two types of processing for imputation datasets, namely out-of-sample (splitting into a training set and a test set) and in-sample (no splitting). Which one to use depends on the application scenario. If your dataset is fixed and no new samples come in during testing, you can apply the in-sample setting. Otherwise, you should use the out-of-sample setting. Considering that out-of-sample evaluation is more objective and helps pick more robust models, we apply it in the experiments of the SAITS paper (a toy example of this setting follows this list).

  2. No, SAITS is trained with both MIT and ORT, i.e. with both loss functions, so both losses are back-propagated to update the model (see the loss sketch after this list). Regarding whether SAITS will overfit: all deep learning models may overfit a given dataset, and that depends on many factors, e.g. the dataset size and the training configuration. But SAITS is trained on multiple tasks, and the random masking mechanism in MIT helps ease the problem, so I would say SAITS is not that easy to overfit.

  3. No, we apply an early-stopping strategy in the training of SAITS, so the model does not have to wait until it reaches the specified epoch number. Training is stopped if the training loss does not decrease for a certain number of epochs (e.g. 20); a minimal sketch is given below.
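To make the out-of-sample setting in point 1 concrete, here is a minimal NumPy sketch of the idea (the shapes, the 80/20 split, and the 10% masking rate are illustrative choices, not values from the paper): whole samples are held out for testing, and additional values are artificially masked in the test set so imputation accuracy can be measured against known ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy dataset: (samples, time steps, features); the values are placeholders.
data = rng.normal(size=(500, 48, 10))

# Out-of-sample: hold out whole samples the model never sees in training.
n_train = int(0.8 * len(data))
train_set, test_set = data[:n_train], data[n_train:]

# For evaluation, artificially mask some observed values in the test set;
# the model imputes them, and its estimates are scored against the
# held-out ground truth.
eval_mask = rng.random(test_set.shape) < 0.1  # mask ~10% of the values
test_input = np.where(eval_mask, np.nan, test_set)
```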
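For point 2, a minimal PyTorch sketch of the joint ORT + MIT objective may help (tensor names such as `X_intact`, `observed_mask`, and `indicating_mask` are illustrative; the code in this repo is the reference implementation): ORT reconstructs the values the model actually observed, while MIT scores the model on values that were artificially masked out before being fed in, whose ground truth is therefore known.

```python
import torch

def masked_mae(pred, target, mask):
    # MAE computed only at positions where mask == 1; the small epsilon
    # avoids division by zero when the mask is empty.
    return torch.sum(torch.abs(pred - target) * mask) / (torch.sum(mask) + 1e-9)

def joint_loss(output, X_intact, observed_mask, indicating_mask):
    # ORT: reconstruction error on the values the model saw as input.
    ort = masked_mae(output, X_intact, observed_mask)
    # MIT: imputation error on the artificially masked values.
    mit = masked_mae(output, X_intact, indicating_mask)
    return ort + mit
```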
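And for point 3, a minimal sketch of patience-based early stopping, assuming the training loss is computed once per epoch; `train_one_epoch`, `model`, `train_loader`, and `max_epochs` are hypothetical names, and the patience of 20 is just the example value from above.

```python
best_loss = float("inf")
patience, bad_epochs = 20, 0  # stop after 20 epochs without improvement

for epoch in range(max_epochs):
    train_loss = train_one_epoch(model, train_loader)  # hypothetical helper
    if train_loss < best_loss:
        best_loss, bad_epochs = train_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # training loss hasn't decreased for `patience` epochs
```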

xuzengsong commented 1 year ago

Hi Wenjie, thank you for your splendid answers. I think MIT is a really good approach, and I'll try it in future work. But I think the overfitting in data imputation is caused by its working mechanism: we use what is visible to predict what is missing, which assumes the missing information and the observable information follow the same distribution, whereas in reality there should be a difference. If the model's ability to mine information is strong enough, or the correlations in the data are not strong, this will surely lead to overfitting; MIT may ease it. That's just my opinion. What do you think about it?

WenjieDu commented 1 year ago

Yes, that depends on the missing pattern of the data itself. If the missingness isn't random (e.g. it is caused by anomalous situations and the missing data points are all outliers), then the imputations probably won't be accurate, because the actual values aren't in the distribution the model can learn; if the missingness is completely random (e.g. caused by communication errors), imputation models can usually work well. The sketch below contrasts the two mechanisms.
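A small NumPy sketch of the contrast (the 20% missing rate and the 1.5 threshold are arbitrary illustrative choices): under completely random missingness every value has the same chance of being dropped, while under value-dependent missingness exactly the extreme values are dropped, so the missing entries follow a different distribution from what remains observable.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Completely random missingness (e.g. communication errors):
# every value has the same 20% chance of going missing.
mcar = np.where(rng.random(x.shape) < 0.2, np.nan, x)

# Non-random missingness (e.g. sensors failing in anomalous situations):
# exactly the extreme values go missing, so the missing entries are
# outliers relative to the observed distribution the model learns from.
mnar = np.where(np.abs(x) > 1.5, np.nan, x)
```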

WenjieDu commented 1 year ago

Closing this issue for now since all questions have been resolved.