cure-lab / SCINet

The GitHub repository for the paper: “Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction“. (NeurIPS 2022)
Apache License 2.0

Question about Dataset.__getitem__ #59

Closed kehuo closed 1 year ago

kehuo commented 1 year ago

Hi,

Good day! I have some questions about the __getitem__ code below, from SCINet/data_process/etth_dataloader.py::Dataset_ETT_hour:

def __getitem__(self, index):
    s_begin = index
    s_end = s_begin + self.seq_len
    r_begin = s_end - self.label_len
    r_end = r_begin + self.label_len + self.pred_len

    seq_x = self.data_x[s_begin:s_end]  # 0 - 24
    seq_y = self.data_y[r_begin:r_end]  # 0 - 48
    seq_x_mark = self.data_stamp[s_begin:s_end]
    seq_y_mark = self.data_stamp[r_begin:r_end]

[Question 1] What does s stand for in s_begin and s_end? Similarly, what does r stand for in r_begin and r_end?

[Question 2] What is the difference between label_len and pred_len? In my opinion, these two lengths must be equal, so why are two separate variables used to define them?

[Question 3] Which of the following two expressions is correct?

seq_len = x_train_seq_len + label_seq_len (seq_len contains both X and the label)
seq_len = x_train_seq_len (seq_len contains only X, not the label)

[Question 4] According to the above code, given:

index = 0
seq_len = 8
label_len = 4
pred_len = 4
data_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
data_y = [1, 2, 3 ,4 ,5 ,6 ,7 ,8, 9, 10, 11, 12, 13, 14]

we get the following result:

s_begin = 0
s_end = 8

r_begin = 4
r_end = 12

seq_x = data_x[0 : 8]   = [1, 2, 3, 4, 5, 6, 7, 8]
seq_y = data_y[4 : 12] = [5, 6, 7, 8, 9, 10, 11, 12]
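
The index arithmetic above can be checked with a short sketch. Note that `window_indices` is a hypothetical helper, not part of the repository; it only mirrors the four assignments quoted from `__getitem__`, and plain lists stand in for the dataset arrays.

```python
def window_indices(index, seq_len, label_len, pred_len):
    """Mirror the index arithmetic quoted from __getitem__ (hypothetical helper)."""
    s_begin = index
    s_end = s_begin + seq_len
    r_begin = s_end - label_len
    r_end = r_begin + label_len + pred_len
    return s_begin, s_end, r_begin, r_end


# Toy data from the example: the values 1 through 14.
data_x = list(range(1, 15))
data_y = list(range(1, 15))

s_begin, s_end, r_begin, r_end = window_indices(
    index=0, seq_len=8, label_len=4, pred_len=4
)
seq_x = data_x[s_begin:s_end]
seq_y = data_y[r_begin:r_end]

print(s_begin, s_end, r_begin, r_end)  # 0 8 4 12
print(seq_x)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(seq_y)  # [5, 6, 7, 8, 9, 10, 11, 12]
```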

Do the above seq_x and seq_y mean that you want to use the 0-8 data to forecast the 4-12 data? This confuses me, because the 4-8 data is already known in seq_x, so why do we need to forecast it? Only the 9-12 data is unknown given seq_x, so only that part needs to be forecast.

However, in my opinion, the correct seq_x and seq_y should be:

seq_x = data_x[0 : 8] = [1, 2, 3, 4, 5, 6, 7, 8]
seq_y = data_y[8 : 12] = [9, 10, 11, 12]

because it means that I want to use the 0-8 data to forecast the 9-12 data.
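
One way to see how the two views relate, as a sketch on the same toy numbers: the first label_len values of seq_y repeat the tail of seq_x, and the last pred_len values are the ones that do not appear in seq_x. Whether the training code treats the overlap as known context and scores only the last pred_len values is an assumption here (it is the usual convention in Informer-style loaders); nothing in this thread confirms it.

```python
seq_len, label_len, pred_len = 8, 4, 4
data = list(range(1, 15))  # toy values 1 through 14

# Same windows the quoted code produces for index = 0.
seq_x = data[0:seq_len]
seq_y = data[seq_len - label_len : seq_len + pred_len]

# Split seq_y into the part that overlaps seq_x and the unseen horizon.
overlap = seq_y[:label_len]
horizon = seq_y[-pred_len:]

print(overlap == seq_x[-label_len:])  # True
print(horizon)  # [9, 10, 11, 12]
print(horizon == data[seq_len : seq_len + pred_len])  # True
```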

I am looking forward to your reply regarding this piece of code, thanks a lot.

kehuo commented 1 year ago

No issue, closed.

kwankoravich commented 1 year ago

Hi kehuo,

Could you please explain seq_x and seq_y to me again?

I'm still confused about this issue as well.