Graph-Machine-Learning-Group / spin

Official repository for the paper "Learning to Reconstruct Missing Data from Spatiotemporal Graphs with Sparse Observations" (NeurIPS 2022)
https://arxiv.org/abs/2205.13479
MIT License

Question about masking rate and missing rate #6

Closed tongnie closed 1 year ago

tongnie commented 1 year ago

Hi, thank you for presenting such a nice paper and project! Since I'm not familiar with the tsl code base, I have several questions about the missing-rate and masking-rate settings in your paper:

  1. In the METR-LA point-missing case, you randomly drop 25% of the original data to construct an incomplete dataset (controlled by the 'p_noise' parameter). Then, during the training stage, more missing data is generated for each input batch via the 'whiten_prob' parameter. I'm confused about whether the 25% of missing data exists in the training data? If so, the input contains missing data generated from both 'p_noise' (25%) and 'whiten_prob' (e.g., 80%), which leads to a very sparse training set. Is the loss computed only on the 'whiten_prob' parts during training?
  2. If I want to train the model with training data only contains 'whiten_prob', and test it on different 'p_noise' levels, like the settings in Tab. 2 of your paper. But the difference is that, I do not add 'p_noise (25%)' to training data and would like to add different levels of 'p_noise' data to testing data, how could I achieve this goal based on your code?

Thanks in advance for your help!

marshka commented 1 year ago

Hi, thanks for your interest in our work! See inline.

I'm confused about whether the 25% of missing data exists in the training data?

You understood the masks correctly. While in principle you could treat that injected 25% of missing data as genuinely missing observations, we instead simulate the case in which the underlying data-generating process is affected by this missing rate, not only at test time. Given the difficulty of evaluating imputation performance, this seemed like a good solution to us: starting from a dataset with few (truly) missing values, you can then test an imputation algorithm on a precisely controlled missing-data distribution.

If that is the case, the input contains missing data generated from both 'p_noise' (25%) and 'whiten_prob' (e.g., 80%), which leads to a very sparse training set. Is the loss computed only on the 'whiten_prob' parts during training?

Yes, it is a very sparse training set, but this is required by our algorithm, which is trained only on the points masked out according to whiten_prob. Still, test performance is comparable to the state of the art, and this training scheme makes SPIN more robust to missing-data distribution shifts.
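To make the two masks concrete, here is a minimal sketch of the masking logic described above (assumed logic with illustrative names, not the actual SPIN/tsl implementation): 'p_noise' removes values from the dataset, 'whiten_prob' hides a further fraction of the remaining values from the model input, and the loss is computed only on those hidden-but-valid targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_mae(y_hat, y, eval_mask):
    """Mean absolute error over the entries where eval_mask is True."""
    return np.abs(y_hat - y)[eval_mask].mean()

y = rng.standard_normal((24, 207))   # (time steps, METR-LA sensors)
mask = rng.random(y.shape) >= 0.25   # valid entries after p_noise = 0.25
whiten = rng.random(y.shape) < 0.8   # entries hidden from input (whiten_prob)
eval_mask = mask & whiten            # loss targets: valid but whitened out
input_mask = mask & ~whiten          # what the model actually observes
# With p_noise = 0.25 and whiten_prob = 0.8, the model observes only about
# (1 - 0.25) * (1 - 0.8) = 15% of the entries: a very sparse input.
x_input = np.where(input_mask, y, 0.0)  # masked-out values zeroed in the input
y_hat = np.zeros_like(y)                # placeholder for model predictions
loss = masked_mae(y_hat, y, eval_mask)  # computed only on whitened targets
```

The key point is that `eval_mask` intersects the two masks: entries dropped by 'p_noise' are never used as targets, since their ground truth is (simulated to be) unknown.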

Suppose I want to train the model on data where only 'whiten_prob' masking is applied, and then test it at different 'p_noise' levels, as in Tab. 2 of your paper, but without adding the 25% 'p_noise' to the training data, instead injecting different levels of 'p_noise' only into the test data. How could I achieve this with your code?

You can do it by manually changing the masks, e.g., by setting an all-valid mask during training and then updating the masks at test time as done here:

https://github.com/Graph-Machine-Learning-Group/spin/blob/2320695ff03b23606e73b05ac87f3ddff9d74c0c/experiments/run_inference.py#L138-L147

tongnie commented 1 year ago

Thanks for your reply! Your suggestions are very helpful!