ant-research / Pyraformer

Apache License 2.0

some questions about preprocess_elect.py and data_loader.py #26

Open mw66 opened 1 year ago

mw66 commented 1 year ago

Hi,

I have a few questions about preprocess_elect.py:

1) In prep_data(), v_input[:, 1] is never used (read or written), so why do you need this 2nd column? https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L35

2) About x_input (https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L58): x_input[count, 1:, 0] holds the real raw input data from position 1 onward, but x_input[count, 0, 0] is never assigned, so it remains all 0s and contains no real raw input data (on https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L67, x_input[count, 0, 0] is also zero). Why not just drop all such x_input[:, 0, :], since they are wrong training data? And why do you save them in the final train npy file? I.e. change https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L72-L74 to

    np.save(prefix+'data_'+save_name, x_input[:, 1:, :])
    np.save(prefix+'v_'+save_name, v_input[1:, :])
    np.save(prefix+'label_'+save_name, label[1:, :])

I also inspected the saved training data, and it is confirmed that these entries are all 0s:

>>> import numpy as np
>>> t = np.load("data/elect/train_data_elect.npy")
>>> np.max(t[:, 0, 0])
0.0
>>> np.min(t[:, 0, 0])
0.0
>>>
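To make the issue easier to reproduce, here is a minimal sketch of the windowing pattern in prep_data() (simplified, with hypothetical shapes and a toy series): because only positions 1: of each window are ever assigned, the first time step of every window stays at its zero initialization.

```python
import numpy as np

# Toy stand-ins for the real preprocessing variables (hypothetical sizes).
num_windows, window_size = 4, 8
data = np.arange(1, num_windows * window_size + 1, dtype=float)

x_input = np.zeros((num_windows, window_size, 1))
for count in range(num_windows):
    window_start = count * window_size
    window_end = window_start + window_size
    # Mirrors preprocess_elect.py#L58: only positions 1: are assigned,
    # so x_input[count, 0, 0] is never written and remains 0.
    x_input[count, 1:, 0] = data[window_start:window_end - 1]

print(np.max(x_input[:, 0, 0]), np.min(x_input[:, 0, 0]))  # 0.0 0.0
```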
mw66 commented 1 year ago

@Zhazhan

my 3rd question:

3)

https://github.com/ant-research/Pyraformer/blob/master/preprocess_elect.py#L58

            x_input[count, 1:, 0] = data[window_start:window_end-1, series]

So x_input[:, :, 0] holds the raw input sequence data,

but in: https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L440-L445

        cov = all_data[:, :, 2:]   # the raw input sequence data is dropped here?

        split_start = len(label[0]) - self.pred_length + 1
        data, label = split(split_start, label, cov, self.pred_length)

        return data, label

Is it dropped from the training data?

This is the same question I have here: https://github.com/ant-research/Pyraformer/issues/25#issuecomment-1509923168

So the previous values of the raw input sequence are not used at all in training?
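For concreteness, a minimal numpy sketch (hypothetical shapes) of what the slice at data_loader.py#L440 does: taking [:, :, 2:] keeps only the trailing covariate channels, so whatever sits in channels 0 and 1 (including the raw values) is discarded at that point.

```python
import numpy as np

# Hypothetical array standing in for all_data: shape
# (num_series, seq_len, num_channels), where channel 0 is the raw value.
num_series, seq_len, num_channels = 2, 5, 4
all_data = np.random.rand(num_series, seq_len, num_channels)

cov = all_data[:, :, 2:]   # channels 0 and 1 are dropped here
print(cov.shape)           # (2, 5, 2): only covariate channels remain
```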

mw66 commented 1 year ago

OK, for my question 3), I found:

https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L443

        data, label = split(split_start, label, cov, self.pred_length)

which on https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L398-L403

            single_data = batch_label[i:(split_start+i)].clone().unsqueeze(1)
            single_data[-1] = -1
            single_cov = cov[batch_idx, i:(split_start+i), :].clone()
            temp_data = [single_data, single_cov]
            single_data = torch.cat(temp_data, dim=1)
            all_data.append(single_data)

inserts the labels (as previous values in the window) back into all_data. This is confusing; why did you choose to do it this way?
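If I read the snippet above correctly, it can be sketched like this (hypothetical sizes, torch): the window of past labels is cloned as an extra input channel, its last step is masked with -1, and it is concatenated with the covariates.

```python
import torch

# Hypothetical sizes standing in for the real batch variables.
split_start, num_cov = 6, 3
batch_label = torch.arange(1., 11.)     # 10 past label values for one series
cov = torch.rand(1, 10, num_cov)        # covariates for that series

i, batch_idx = 0, 0
# Mirrors data_loader.py#L398-L403: past labels become channel 0 ...
single_data = batch_label[i:split_start + i].clone().unsqueeze(1)
single_data[-1] = -1                    # ... with the step to predict masked
single_cov = cov[batch_idx, i:split_start + i, :].clone()
single_data = torch.cat([single_data, single_cov], dim=1)
print(single_data.shape)                # (6, 4): label channel + covariates
```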

Also, the implementation of electTrainDataset.__getitem__ (https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L432) is quite different from that of electTestDataset.__getitem__ (https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L460),

in particular https://github.com/ant-research/Pyraformer/blob/master/data_loader.py#L473-L477

            single_data = data[i:(split_start+i)].clone().unsqueeze(1)
            single_data[-1] = -1
            single_cov = cov[i:(split_start+i), :].clone()
            single_data = torch.cat([single_data, single_cov], dim=1)
            all_data.append(single_data)

Here, you don't do the same thing of inserting the labels (as previous values in the window) back into all_data. Why is there such a difference?