Closed: yanshuaibupt closed this issue 1 year ago
I did not use graph convolutions such as GCN or DCN because I think it is hard to define a good adjacency matrix for these methods in traffic prediction.
(In the encoder) I used a Transformer-like model for the road spatial representation. You can think of it as building a complete graph and then applying a graph Transformer on that complete graph. I do think that employing a learnable adjacency matrix with GCN or DCN could prove beneficial.
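A minimal sketch (my own illustration, not the repository's actual code) of what "attention over a complete graph of road nodes" means: every one of the N sensors attends to every other, so the attention matrix acts as a soft, input-dependent adjacency matrix. All names and sizes here are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(X, Wq, Wk, Wv):
    """One attention head over N road nodes (rows of X).

    Because every node attends to every other node, this is message
    passing on a complete graph: the (N, N) attention matrix A plays
    the role of a learned adjacency matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))  # (N, N) "soft adjacency"
    return A @ V, A                    # aggregated node features

rng = np.random.default_rng(0)
N, F, d = 8, 4, 16                     # hypothetical: 8 nodes, 4 features
X = rng.standard_normal((N, F))
Wq, Wk, Wv = (rng.standard_normal((F, d)) for _ in range(3))
H, A = spatial_self_attention(X, Wq, Wk, Wv)
print(H.shape, A.shape)
```

In a multi-head, multi-layer encoder this computation is repeated per head and per layer, giving many such adjacency matrices rather than one fixed graph.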
Do you mean you replaced GCN with a Transformer? Is your implementation based on the paper "Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting"? I recently read two papers on adaptive adjacency matrices, Ada-STNet and ASTTN, which try to build a learnable adjacency matrix, but judging by their results on METR-LA and PEMS-BAY there is not much difference from DCRNN, Graph WaveNet, and so on; none of them match the Traffic Transformer.
The paper you mentioned, "Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting", is also cited in my paper as [40]. Its results are good, but: 1. It mainly uses the Transformer for temporal modeling and does not model the spatial dimension, whereas my paper focuses on the spatial dimension. On the temporal side, my view is that 12 time steps carry very little information (it is just a 12*2 tensor), so there is no need to model the temporal dimension repeatedly; in my ablation study, replacing the LSTM with an MLP barely changed the results. 2. It encodes a lot of historical traffic states (one day ago, one week ago) as model input, and I think this trick contradicts the problem statement of most prior work, where the usual convention is that such historical data cannot be used as input.
Regarding the learnable adjacency matrix: within the multi-head attention framework, the learnable adjacency matrix is essentially the attention matrix. Because attention is multi-headed and stacked over many layers, it amounts to many hierarchical adjacency matrices. I have seen papers argue that self-attention is a kind of GNN, and I think that view is correct (although by that logic, CNNs and RNNs are also GNNs).
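To make the "attention matrix is a learnable adjacency matrix" point concrete, here is a small sketch (my own, with made-up sizes) comparing a GCN-style layer with a fixed row-normalized adjacency to an attention step where the adjacency is computed from the data; both propagation matrices are row-stochastic weighted averages over neighbors:

```python
import numpy as np

def row_normalize(adj):
    # D^-1 A: the row-stochastic propagation matrix used by many GCN variants.
    return adj / adj.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
N, F, d = 5, 3, 4                     # hypothetical: 5 nodes, 3 features
X = rng.standard_normal((N, F))
W = rng.standard_normal((F, d))

# GCN-style layer: fixed, predefined adjacency (here a complete graph).
adj = np.ones((N, N))
H_gcn = row_normalize(adj) @ X @ W

# Attention step: the adjacency is computed from the node features.
scores = (X @ W) @ (X @ W).T / np.sqrt(d)
e = np.exp(scores - scores.max(axis=1, keepdims=True))
A = e / e.sum(axis=1, keepdims=True)  # softmax rows: a learned adjacency
H_attn = A @ X @ W

# Both rows sum to 1; attention just makes the weights input-dependent.
print(np.allclose(A.sum(axis=1), 1.0))
```

Stacking L layers with h heads then yields L*h such matrices, which is the "hierarchical adjacency matrices" view above.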
Besides structural learning on the graph, addressing over-smoothing in traffic prediction is also important, and GCN suffers from it quite badly. There is a paper on ITS called clusterST with very strong results; I recommend taking a look.
Could you share your paper? I had mistaken your implementation for the paper I mentioned above.
First, thanks for your excellent work! But I cannot find a GCN or DCN implementation in the code; did you use them?