Closed xuzengsong closed 1 year ago
Hi there,
Thank you so much for your attention to SAITS! If you find SAITS helpful to your work, please star⭐️ this repository. Your star is your recognition, which can help others notice SAITS. It matters and is definitely a kind of contribution.
I have received your message and will respond ASAP. Thank you again for your patience! 😃
Best,
Wenjie
Hi Zengsong,
Thank you for raising this issue, and for your patience. I'm sorry for my delayed response.
Yes, there are two types of processing for imputation datasets: out-of-sample (splitting into a training set and a test set) and in-sample (no splitting). Which one to use depends on the application scenario. If your dataset is fixed and no new samples will arrive during testing, you can apply the in-sample setting. Otherwise, you should use the out-of-sample setting. Considering that out-of-sample evaluation is more objective and helps select more robust models, we apply it in the experiments of the SAITS paper.
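To make the two settings concrete, here's a minimal sketch on a toy array (the shapes and the 80/20 split ratio are just assumptions for illustration, not values from the SAITS experiments):

```python
import numpy as np

# Hypothetical dataset: (n_samples, n_steps, n_features)
rng = np.random.default_rng(42)
data = rng.standard_normal((1000, 24, 7))

# Out-of-sample: hold out samples the model never sees during training,
# then evaluate imputation quality on those unseen samples.
n_train = int(len(data) * 0.8)
train_set, test_set = data[:n_train], data[n_train:]

# In-sample: the dataset is fixed, so the model is trained and
# evaluated on the same set of samples (no split).
in_sample_set = data
```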
No, SAITS is trained with both MIT and ORT, i.e. with both loss functions, so both losses are back-propagated to update the model. Regarding whether SAITS will overfit: any deep-learning model may overfit a given dataset, depending on many factors, e.g. the dataset size and the training configuration. But SAITS is trained on multiple tasks, and the random-masking mechanism in MIT helps ease the problem, so I would say SAITS is not that easy to overfit.
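To illustrate what the random masking in MIT does, here's a rough sketch: a fraction of the *observed* values is artificially hidden, and the imputation loss is computed only on those hidden positions. The function name `mit_mask` and the 20% masking rate are my own choices for illustration, not taken from the SAITS codebase:

```python
import numpy as np

def mit_mask(X, artificial_rate=0.2, seed=0):
    """Randomly hide a fraction of the observed values so the model can
    be trained to reconstruct them (Masked Imputation Task)."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    # Pick ~artificial_rate of the observed entries to hide.
    indicating_mask = observed & (rng.random(X.shape) < artificial_rate)
    X_masked = X.copy()
    X_masked[indicating_mask] = np.nan
    # `indicating_mask` marks where the MIT loss is computed,
    # since the ground truth there is known.
    return X_masked, indicating_mask

X = np.array([[1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0]])
X_masked, indicating_mask = mit_mask(X)
```

Because the masked positions differ across epochs (when the seed varies), the model sees a different reconstruction target each time, which is why this mechanism helps against overfitting.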
No, we apply an early-stopping strategy when training SAITS. The model doesn't have to reach the specified maximum epoch number: training stops if the training loss doesn't decrease for a certain number of epochs (e.g. 20).
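A minimal sketch of such an early-stopping loop (the `train_one_epoch` stub and the concrete numbers are hypothetical, not SAITS's actual training code):

```python
import itertools

# Hypothetical stand-in for one epoch of training: here it just replays
# a pre-set loss sequence that stops improving after the third epoch.
losses = itertools.chain([1.0, 0.8, 0.7], itertools.repeat(0.7))

def train_one_epoch():
    return next(losses)

patience = 20                       # stop after 20 non-improving epochs
best_loss, bad_epochs = float("inf"), 0
for epoch in range(1000):           # nominal maximum epoch number
    loss = train_one_epoch()
    if loss < best_loss:
        best_loss, bad_epochs = loss, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:      # loss hasn't decreased for `patience` epochs
        break
```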
Hi Wenjie, thank you for your splendid answers. I think MIT is a really good approach, and I'll try it in later work. However, I think overfitting in data imputation is caused by the working mechanism itself: we use what's visible to predict what's missing, assuming the missing information and the observable information follow the same distribution, but there can be a difference. If the model's information-mining ability is strong enough, or the correlations in the data are not that strong, this will lead to overfitting, which MIT may ease. That's just my opinion. What do you think?
Yes, that depends on the missing pattern of the data itself. If the missingness isn't random (e.g. it is caused by anomalous situations, and the missing data points are all outliers), the imputations probably won't be accurate, because the actual values aren't in the distribution the model can learn. If the missingness is completely random (e.g. caused by communication errors), imputation models can usually work well.
Closing this issue for now, as all questions have been resolved.
Thank you for your work. I recently read your paper SAITS: Self-Attention-based Imputation for Time Series. I am also working on multivariate time series imputation, have some questions, and hope to communicate with you. 1. I recently ran your method on my own dataset. My processing approach first splits the data into a training set and a test set, then builds the time sequences, trains on the training set, and evaluates on the test set. (I know data imputation is an unsupervised algorithm that doesn't use the ground-truth values of the missing data. Some people split the data into a training set and a test set, while others don't, and your algorithm's results differ somewhat between these two partitioning methods. How do you view the partitioning of datasets?)