Closed: yanshuaibupt closed this issue 1 year ago
I did not use graph convolutions such as GCN or DCN because I think it is hard to define a good adjacency matrix for these methods in traffic prediction.
(In the encoder) I used a Transformer-like model for the road spatial representation. You can think of it as building a complete graph and then applying a graph Transformer on that complete graph. I do think that employing a learnable adjacency matrix with GCN or DCN could prove beneficial.
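A minimal sketch (my own illustration, not the repository's actual code) of what "attention over a complete graph of road nodes" means: every one of the N sensors attends to every other, so the attention matrix acts as a soft, input-dependent adjacency matrix. All names and sizes here are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(X, Wq, Wk, Wv):
    """One attention head over N road nodes (rows of X).

    Because every node attends to every other node, this is message
    passing on a complete graph: the (N, N) attention matrix A plays
    the role of a learned adjacency matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))  # (N, N) "soft adjacency"
    return A @ V, A                    # aggregated node features

rng = np.random.default_rng(0)
N, F, d = 8, 4, 16                     # hypothetical: 8 nodes, 4 features
X = rng.standard_normal((N, F))
Wq, Wk, Wv = (rng.standard_normal((F, d)) for _ in range(3))
H, A = spatial_self_attention(X, Wq, Wk, Wv)
print(H.shape, A.shape)
```

In a multi-head, multi-layer encoder this computation is repeated per head and per layer, giving many such adjacency matrices rather than one fixed graph.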
Do you mean you replaced GCN with a Transformer? Is your implementation based on the paper "Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting"? I recently read two papers on adaptive adjacency matrices, Ada-STNet and ASTTN, which try to build a learnable adjacency matrix, but judging by their results on METR-LA and PEMS-BAY there is not much difference from DCRNN, Graph WaveNet, and so on; none of them match the Traffic Transformer.
The paper you mentioned, "Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting", is also cited in my paper as [40]. Its results are good, but: 1. It mainly uses the Transformer for temporal modeling and does not model the spatial dimension, whereas my paper focuses on the spatial dimension. On the temporal side, my view is that 12 time steps carry very little information (it is just a 12*2 tensor), so there is no need to model the temporal dimension repeatedly; in my ablation study, replacing the LSTM with an MLP barely changed the results. 2. It encodes a lot of historical traffic states (one day ago, one week ago) as model input, and I think this trick contradicts the problem statement of most prior work, where the usual convention is that such historical data cannot be used as input.
Regarding the learnable adjacency matrix: within the multi-head attention framework, the learnable adjacency matrix is essentially the attention matrix. Because attention is multi-headed and stacked over many layers, it amounts to many hierarchical adjacency matrices. I have seen papers argue that self-attention is a kind of GNN, and I think that view is correct (although by that logic, CNNs and RNNs are also GNNs).
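To make the "attention matrix is a learnable adjacency matrix" point concrete, here is a small sketch (my own, with made-up sizes) comparing a GCN-style layer with a fixed row-normalized adjacency to an attention step where the adjacency is computed from the data; both propagation matrices are row-stochastic weighted averages over neighbors:

```python
import numpy as np

def row_normalize(adj):
    # D^-1 A: the row-stochastic propagation matrix used by many GCN variants.
    return adj / adj.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
N, F, d = 5, 3, 4                     # hypothetical: 5 nodes, 3 features
X = rng.standard_normal((N, F))
W = rng.standard_normal((F, d))

# GCN-style layer: fixed, predefined adjacency (here a complete graph).
adj = np.ones((N, N))
H_gcn = row_normalize(adj) @ X @ W

# Attention step: the adjacency is computed from the node features.
scores = (X @ W) @ (X @ W).T / np.sqrt(d)
e = np.exp(scores - scores.max(axis=1, keepdims=True))
A = e / e.sum(axis=1, keepdims=True)  # softmax rows: a learned adjacency
H_attn = A @ X @ W

# Both rows sum to 1; attention just makes the weights input-dependent.
print(np.allclose(A.sum(axis=1), 1.0))
```

Stacking L layers with h heads then yields L*h such matrices, which is the "hierarchical adjacency matrices" view above.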
Besides structural learning on the graph, addressing over-smoothing in traffic prediction is also important, and GCN suffers from it quite badly. There is a paper on ITS called clusterST with very strong results; I recommend taking a look.
Could you share your paper? I had mistaken your implementation for the paper I mentioned above.
First, thanks for your excellent work! But I cannot find a GCN or DCN implementation in the code; did you use them?