JeremyChou28 / MTSCI

The official source code of MTSCI: A Conditional Diffusion Model for Consistent Imputation in Incomplete Time Series

Use MTSCI on discrete time series #1

Open rribault-fem opened 1 month ago

rribault-fem commented 1 month ago

Hello Jeremy,

Thanks for making the repository available. Your work looks very interesting to me.

I would like to use MTSCI on my own dataset. I have several time-series samples, but the recordings are not continuous in time. My application is to impute missing channels in time series that have already been recorded.

So my training dataset is of shape (sample_nb, time_steps, channels). If I stack all the samples, I will have discontinuities between them; that will disturb the consistency checks, right?

Maybe I can disable part of the consistency checks every xx time_steps?

JeremyChou28 commented 1 month ago

Thank you for your attention to our work. Your question is very interesting. If you stack multiple discontinuous time-series samples, inter-consistency across the sample boundaries may not hold. But if the samples are periodic, and you space the time steps by the period length, inter-consistency may still hold. This is a worthwhile experiment to try!
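As a rough illustration, windowing each recording independently avoids boundary-crossing windows entirely; the sketch below (`make_windows` is a hypothetical helper, not part of MTSCI) also lets you stride by the period length:

```python
import numpy as np

def make_windows(samples: np.ndarray, seq_len: int, step: int) -> np.ndarray:
    """Cut fixed-length windows out of each recording independently,
    so no training window crosses a discontinuity between samples.

    samples: array of shape (sample_nb, time_steps, channels)
    step: stride between windows; setting it to the period length keeps
          consecutive windows phase-aligned if the data is periodic.
    """
    windows = []
    for rec in samples:  # each recording is treated on its own
        for start in range(0, rec.shape[0] - seq_len + 1, step):
            windows.append(rec[start:start + seq_len])
    return np.stack(windows)  # (num_windows, seq_len, channels)
```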

rribault-fem commented 1 month ago

Well, I am trying to choose a sequence length in relation to the batch size so that each batch contains samples from the same "recording" period. Training is going on; we'll see how it works.

rribault-fem commented 1 month ago

Hello!

I can train on the ETT dataset fine; however, when I use my own dataset with 17 features, I hit an issue during training. At line 64 of main.py, after optimizer.step(), the weight and bias of model.diffmodel.input_projection become all NaN.
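For context, this is the kind of check that pinpoints it; just a minimal sketch assuming a standard PyTorch training loop (`check_nan_params` is my own helper, not part of MTSCI):

```python
import torch

def check_nan_params(model: torch.nn.Module, step: int) -> None:
    """Call right after optimizer.step() to fail fast with the name of
    the first parameter that has turned NaN."""
    for name, param in model.named_parameters():
        if torch.isnan(param).any():
            raise RuntimeError(f"NaN in {name} at training step {step}")
```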

Have you already encountered this kind of behavior?

PS: I am not 100% clear on the requirements for the scaler (are the mean & std calculated on the training data?) and for timestamp.pkl.

Well I'm not sure this issue is the most appropriate channel to discuss this :-)

JeremyChou28 commented 1 month ago

I am not 100% clear on the requirements for the scaler (are the mean & std calculated on the training data?) and for timestamp.pkl.

I have never encountered the weights becoming `nan` in this code. But based on my experience, you can first check whether any step in your data processing produces `nan`. For any division operation, consider adding `1e-5` to the denominator.
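For example, something along these lines (a generic sketch, not code from this repository):

```python
import numpy as np

def safe_normalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray,
                   eps: float = 1e-5) -> np.ndarray:
    """Normalize with an epsilon in the denominator so constant
    (zero-std) channels cannot produce NaN or inf values."""
    return (x - mean) / (std + eps)

# Sanity check before training, e.g.:
# assert np.isfinite(normalized_data).all()
```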

The mean and std are calculated on the training data and used to normalize each one-dimensional feature. timestamp.pkl records the timestamp of each point in a time series.
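For instance, a minimal preparation sketch; the shapes and file layout here are assumptions, so check them against the repository's data loaders:

```python
import pickle
import numpy as np

data = np.random.randn(1000, 17)       # placeholder for (time_steps, channels) data
train = data[: int(0.7 * len(data))]   # statistics come from the training split only

mean = train.mean(axis=0)              # per-feature mean, shape (channels,)
std = train.std(axis=0)                # per-feature std, shape (channels,)

# timestamp.pkl: one timestamp per time step (here a simple integer index).
with open("timestamp.pkl", "wb") as f:
    pickle.dump(np.arange(len(data)), f)
```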

rribault-fem commented 1 month ago

Well, I applied a StandardScaler in the data preparation step and now training is running slowly, but running!
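For reference, the usual pattern with scikit-learn's StandardScaler, fit on the training split only so no statistics leak into validation or test (a generic sketch with placeholder arrays):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.random.randn(800, 17)   # placeholder arrays with 17 features
test = np.random.randn(200, 17)

scaler = StandardScaler().fit(train)   # statistics from training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
```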