MengzhangLI / STFGNN

Code of STFGNN@AAAI-2021 (Spatial-Temporal / Traffic Data Forecasting)

The experimental results in the paper have serious problems #4

Closed LMissher closed 3 years ago

LMissher commented 3 years ago

From line 159 of main.py, `y, x = test_y[:, : idx + 1, :], prediction[:, : idx + 1, :]`, it is clear that the average over the predicted time horizons is used as the evaluation metric.

Previous works all use the value at the last time step of the predicted sequence as the evaluation metric.

I hope the authors will make a fair comparison and change the results in the paper to the last-time-step values instead of the averages.
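A minimal sketch (with made-up shapes; `test_y` and `prediction` are toy stand-ins for the arrays in main.py) of what the slicing on line 159 evaluates:

```python
import numpy as np

# Toy stand-ins for the arrays in main.py; assumed layout (samples, 12 horizons, nodes).
test_y = np.arange(2 * 12 * 3, dtype=float).reshape(2, 12, 3)
prediction = test_y + 1.0  # pretend every forecast is off by exactly 1

idx = 11  # last iteration of the evaluation loop
y, x = test_y[:, : idx + 1, :], prediction[:, : idx + 1, :]

# The slice keeps horizons 0..idx, so the error reported at idx = 11 averages
# over all 12 horizons rather than taking the 12th horizon alone.
print(y.shape)                         # (2, 12, 3)
print(float(np.mean(np.abs(y - x))))  # 1.0
```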

MengzhangLI commented 3 years ago

Hi, thanks for your careful reading and for raising this issue.

As you can see from the paper, I also added STSGCN on METR-LA and PEMS-BAY to check STFGNN's performance. But right now I cannot immediately conclude whether it is right or wrong.

I will carefully re-check this problem; if the performance on PEMS-BAY and METR-LA turns out very different, I will definitely update the arXiv and AAAI camera-ready paper.

Best,

LMissher commented 3 years ago

Judging from the code alone, STSGCN also uses the average, which only suggests that their work may perform poorly on METR-LA and PEMS-BAY. But judging from the averaged value of 3.18, your work's results are not ideal either.

Also, in the data preprocessing, using the mean and standard deviation of all the data for z-score normalization does not seem right, does it? Only the training set should be used.

MengzhangLI commented 3 years ago

> Judging from the code alone, STSGCN also uses the average, which only suggests that their work may perform poorly on METR-LA and PEMS-BAY. But judging from the averaged value of 3.18, your work's results are not ideal either.
>
> Also, in the data preprocessing, using the mean and standard deviation of all the data for z-score normalization does not seem right, does it? Only the training set should be used.

1) Yes, you are right! Thank you very much for re-checking! I am running new experiments right now. As you can see from the code, it is modified from STSGCN, whose datasets and metrics are different from PEMS-BAY and METR-LA. If its results are bad, I will remove them and keep only PEMS03, PEMS04, PEMS07 and PEMS08 (the results on these 4 datasets are definitely correct and meet the main-result requirements, just like STSGCN).

2) For data preprocessing I also follow the STSGCN pipeline. You are correct that normalization should be done on train/valid/test separately, but I think it is done correctly in the code. You can see the figure:

(screenshot of the normalization code in "utils_4n0_3layer_12T_712_res.py")

In "utils_4n0_3layer_12T_712_res.py", lines 229 and 233, train/valid/test are divided first, and z-score normalization is applied afterwards. Also, in my own code, if I remember correctly, when generating the temporal graph I use "0.6" or "0.7" to cut the training data.

Anyway, thanks again for this great issue.

Best,

LMissher commented 3 years ago

Hmm, I see that in `main_4n0_3layer_12T_712_res_npz.py`, line 41, `train_x, train_y, val_x, val_y, test_x, test_y = generate_data_train_val_test(dataset_dir_h5, data_dir)`, loads the data with the following code:

```python
def generate_data_train_val_test(dataset_dir_h5, data_dir, transformer=None):
    '''
    shape is (num_of_samples, 12, num_of_vertices, 1)
    achieve data from train/val/test.npz separately.
    '''
    df = pd.read_hdf(dataset_dir_h5)
    df_mean = df[df != 0].mean()  # (207, )
    df_std = df[df != 0].std()    # (207, )

    data_train = np.load(os.path.join(data_dir, 'train.npz'))
    train_x, train_y = data_train['x'][:, :, :, 0], data_train['y'][:, :, :, 0]  # metr-la: (23974, 12, 207), (23974, 12, 207)
    data_val = np.load(os.path.join(data_dir, 'val.npz'))
    val_x, val_y = data_val['x'][:, :, :, 0], data_val['y'][:, :, :, 0]  # metr-la: (3425, 12, 207), (3425, 12, 207)
    data_test = np.load(os.path.join(data_dir, 'test.npz'))
    test_x, test_y = data_test['x'][:, :, :, 0], data_test['y'][:, :, :, 0]  # metr-la: (6850, 12, 207), (6850, 12, 207)

    for road_ix in range(df.shape[1]):
        road_mean = df_mean.tolist()[road_ix]
        road_std = df_std.tolist()[road_ix]

        # Padding zero value of Training set to mean value, Train_x normalization...
        train_x_block = train_x[:, :, road_ix]  # (23974, 12)
        train_x_block_pad = pd.DataFrame(train_x_block)
        train_x_block_pad.replace(0, road_mean, inplace=True)
        train_x_block_pad = (np.array(train_x_block_pad) - road_mean) / road_std
        train_x[:, :, road_ix] = train_x_block_pad
        train_y_block = train_y[:, :, road_ix]  # (23974, 12)
        train_y_block_pad = pd.DataFrame(train_y_block)
        train_y_block_pad.replace(0, road_mean, inplace=True)
        train_y[:, :, road_ix] = train_y_block_pad

        # Padding zero value of Validation set to mean value, valid_x normalization...
        val_x_block = val_x[:, :, road_ix]  # (3425, 12)
        val_x_block_pad = pd.DataFrame(val_x_block)
        val_x_block_pad.replace(0, road_mean, inplace=True)
        val_x_block_pad = (np.array(val_x_block_pad) - road_mean) / road_std
        val_x[:, :, road_ix] = val_x_block_pad
        val_y_block = val_y[:, :, road_ix]  # (3425, 12)
        val_y_block_pad = pd.DataFrame(val_y_block)
        val_y_block_pad.replace(0, road_mean, inplace=True)
        val_y[:, :, road_ix] = val_y_block_pad

        # Padding zero value of Test set (test_x) to mean value, test_x normalization...
        test_x_block = test_x[:, :, road_ix]  # (6850, 12)
        test_x_block_pad = pd.DataFrame(test_x_block)
        test_x_block_pad.replace(0, road_mean, inplace=True)
        test_x_block_pad = (np.array(test_x_block_pad) - road_mean) / road_std
        test_x[:, :, road_ix] = test_x_block_pad
        # Test y keeps "0" for fair comparison with previous baselines
        # test_y stays the same....

    return train_x[:, :, :, np.newaxis], train_y, val_x[:, :, :, np.newaxis], val_y, test_x[:, :, :, np.newaxis], test_y
```

MengzhangLI commented 3 years ago

> Hmm, I see that in `main_4n0_3layer_12T_712_res_npz.py`, line 41, `train_x, train_y, val_x, val_y, test_x, test_y = generate_data_train_val_test(dataset_dir_h5, data_dir)`, loads the data with `generate_data_train_val_test`, quoted in full above.

Well, I misunderstood your reminder. I thought you meant the universal pipeline rather than PEMS-BAY/METR-LA specifically. ;( It should be modified to something like:

```python
df = pd.read_hdf(dataset_dir_h5)[: TRAINING SET NUM]  # just like that....
df_mean = df[df != 0].mean()  # (207, )
df_std = df[df != 0].std()    # (207, )
```
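A minimal runnable sketch of that train-only fix. The names `train_only_stats`, `z_score`, and `n_train` are hypothetical (not from the repo); `n_train` plays the role of the TRAINING SET NUM placeholder:

```python
import numpy as np
import pandas as pd

def train_only_stats(df: pd.DataFrame, n_train: int):
    """Per-sensor mean/std computed only on the training rows, ignoring zeros
    (the repo treats 0 readings as missing)."""
    train_df = df.iloc[:n_train]
    masked = train_df[train_df != 0]
    return masked.mean(), masked.std()

def z_score(x: np.ndarray, mean: float, std: float) -> np.ndarray:
    return (x - mean) / std

# Toy check: statistics come from the first 3 rows only, zeros excluded.
df = pd.DataFrame({"sensor_0": [10.0, 0.0, 30.0, 20.0, 100.0]})
mean, std = train_only_stats(df, n_train=3)
print(mean["sensor_0"])  # (10 + 30) / 2 = 20.0
```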

In the next couple of days I will get new results on PEMS-BAY and METR-LA. Please wait patiently. Really sorry for my careless mistake, and thank you so much for the careful reminder. I hope to discuss with you more in the future.

Best,

LMissher commented 3 years ago

Yes, that is what I meant. Looking forward to the new results.

Thanks for your reply!

MengzhangLI commented 3 years ago

> Yes, that is what I meant. Looking forward to the new results.
>
> Thanks for your reply!

Thank you very much for your suggestions. I think 4 of the 8 datasets used in my paper were processed incorrectly. Personally, I feel it will be hard to reach SOTA on PEMS-BAY and METR-LA, and reaching SOTA on PeMSDM and PeMSDL will require careful hyperparameter tuning.

To avoid misleading readers, over the next few days I plan to update the arXiv and AAAI camera-ready versions first. Like STSGCN at AAAI 2020, I will keep only the numerical results on PEMS03, PEMS04, PEMS07 and PEMS08. As you know, the results on these 4 datasets are definitely correct. Meanwhile, I will temporarily remove the results on the other 4 datasets, but I will keep tuning hyperparameters intermittently over the next month. (The AAAI 2021 camera-ready deadline seems to be March 15.)

This issue will stay open, and I look forward to continued discussion with you on traffic forecasting.

LMissher commented 3 years ago

Indeed, the experimental results on the other four PEMS0x datasets should be fine. Thank you for your careful replies and discussion. Looking forward to your new experimental results and the updated paper!

kevin-xuan commented 3 years ago

> Judging from the code alone, STSGCN also uses the average, which only suggests that their work may perform poorly on METR-LA and PEMS-BAY. But judging from the averaged value of 3.18, your work's results are not ideal either.
>
> Also, in the data preprocessing, using the mean and standard deviation of all the data for z-score normalization does not seem right, does it? Only the training set should be used.

May I ask where it can be seen that STSGCN uses the average? The line `y, x = test_y[:, : idx + 1, :], prediction[:, : idx + 1, :]` computes metrics over the first 1 horizon, the first 2 horizons, ..., and the first 12 horizons (the first 12 being the entire label), and in the end the code uses the MAE over the first 12 horizons (i.e., the whole label) as the final result, right? I do not see where an average is used. Could we discuss this?

MengzhangLI commented 3 years ago

> Judging from the code alone, STSGCN also uses the average, which only suggests that their work may perform poorly on METR-LA and PEMS-BAY. But judging from the averaged value of 3.18, your work's results are not ideal either. Also, in the data preprocessing, using the mean and standard deviation of all the data for z-score normalization does not seem right, does it? Only the training set should be used.

> May I ask where it can be seen that STSGCN uses the average? The line `y, x = test_y[:, : idx + 1, :], prediction[:, : idx + 1, :]` computes metrics over the first 1 horizon, the first 2 horizons, ..., and the first 12 horizons (the first 12 being the entire label), and in the end the code uses the MAE over the first 12 horizons (i.e., the whole label) as the final result, right? I do not see where an average is used. Could we discuss this?

It indeed uses the average over the first 12 horizons, not just the 12th; that is, the second dimension in the slice above is 12, not 1.
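To make the distinction concrete, a small synthetic sketch (not the repo's metric code) of how the MAE averaged over all 12 horizons differs from the MAE at horizon 12 alone when, as is typical in traffic forecasting, error grows with the horizon:

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

rng = np.random.default_rng(0)
y = rng.normal(size=(100, 12, 207))           # (samples, horizons, nodes)
# Synthetic error whose scale grows linearly from horizon 1 to horizon 12.
noise = rng.normal(size=y.shape) * np.arange(1, 13).reshape(1, 12, 1) * 0.1
pred = y + noise

avg_over_12 = mae(y, pred)                    # what the idx = 11 slice evaluates
last_step = mae(y[:, 11, :], pred[:, 11, :])  # horizon-12 only
print(avg_over_12 < last_step)                # True: the averaged metric looks easier
```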