BUAABIGSCity / PDFormer

[AAAI2023] A PyTorch implementation of PDFormer: Propagation Delay-aware Dynamic Long-range Transformer for Traffic Flow Prediction.
MIT License

GNN+Seq2Seq vs Seq2Seq #28

Closed jexterliangsufe closed 11 months ago

jexterliangsufe commented 1 year ago

Hi, great work! I have questions about the datasets mentioned in your paper and the models you compare PDFormer with.

I noticed that the largest number of nodes among the datasets is 1024 (T-Drive), which is not much greater than the number of variates handled by some of the newest Seq2Seq models (e.g. TimesNet, Autoformer, Informer, ...). The TimesNet paper compared TimesNet with other Seq2Seq models (but not GNN+Seq2Seq models) on a traffic dataset and achieved SOTA. Are models with GNN really better than models without GNN?

By the way, in practice the number of nodes is often much greater than in the datasets used in papers. How can I use your model to solve such problems? Thanks anyway!

aptx1231 commented 1 year ago

(1) Are models with GNN really better than models without GNN?

Not necessarily. There is indeed a lot of recent work that attempts to solve spatio-temporal forecasting problems without graph neural networks, e.g. [1][2].

[1] A Simple yet Effective Baseline for Multivariate Time Series Forecasting
[2] SimST: A GNN-Free Spatio-Temporal Learning Framework for Traffic Forecasting

Papers such as TimesNet usually target long-range time series forecasting, and there is recent work showing that such models are not as good at forecasting as models that take spatio-temporal factors into account [3].

[3] HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting

So this can only be said to be a matter of opinion, and we all have different perspectives.

(2) The number of nodes is often much greater than in the datasets used in papers.

For research papers, 1024 nodes is almost the largest graph size used, except in papers that deal specifically with traffic prediction on large graphs, where prediction on larger graphs is done via graph decomposition. For real-life use, 1024 nodes is indeed too few, and there is actually a gap between current research and industrial applications. Have you seen a paper that uses a particularly large number of nodes? Can you share it with me?

(3) How can I use your model to solve such problems?

PDFormer, GMAN, and other models based on graph attention are really not suitable for processing large graphs (they are limited by efficiency). You could consider adapting this model with techniques for computing Graph Transformers on large graphs so that it can handle them.

jexterliangsufe commented 1 year ago

Thanks for your kind reply! I learned a lot.

  1. I noticed the authors of TimesNet created a GitHub repo, and one of its tasks is short-term forecasting. I know Transformer-based models are often used for long-term forecasting. I may set aside the different perspectives on models with and without GNNs, because I only care about their performance in industrial applications.
  2. I haven't seen any paper that runs experiments on datasets with a particularly large number of nodes, which is why I created this issue.
  3. Do you mean that the model needs engineering optimization? By the way, there are plenty of industrial applications that require large-graph processing. Why don't you consider reducing the gap between industrial applications and research?

aptx1231 commented 1 year ago

Models do need engineering optimization to serve larger-scale real-world applications. Some research on graph networks has attempted to handle larger-scale graph structures:

NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification
GraphSAINT: Graph Sampling Based Inductive Learning Method
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings

Combining these efforts with a model for time-series prediction should enable large-scale prediction. Large-scale prediction is indeed a challenge and can be attempted as follow-up work~

One can also try to use the GNN-free model mentioned above for industrial applications.

jexterliangsufe commented 1 year ago

Thanks a lot! I am greatly inspired by your reply.

jexterliangsufe commented 1 year ago

By the way, I have a technical question. Spatio-temporal data has one more dimension than purely spatial data (B * T * N * D vs. B * N * D). Assuming the latter can be fed directly to a GNN convolution layer (e.g. GCNConv), the former needs a loop over the T time steps. Meanwhile, the code implementation of [1] uses the Einstein summation convention (torch.einsum('btnd,nm->btmd', x, A)) to replace the loop.

[1] Graph WaveNet for Deep Spatial-Temporal Graph Modeling

Einstein summation does not seem to be possible with sparse matrix operations, which means this won't work on large graphs because a dense A becomes excessively large. Do you have any suggestions?
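
One possible direction, offered only as a sketch and not as the maintainers' answer: the contraction 'btnd,nm->btmd' computes out[b][t][m][d] = sum over n of x[b][t][n][d] * A[n][m], so it can be evaluated from the non-zero entries of A alone, without ever building the dense N x N matrix. The framework-free Python below illustrates this with a hypothetical edge list of (n, m, weight) triples; the function names and the toy data are assumptions made for illustration. In PyTorch, the analogous step would presumably be to move the node axis to the front, flatten x to shape (N, B*T*D), and multiply by a sparse transpose of A with something like torch.sparse.mm, but that mapping is not tested here.

```python
def dense_propagate(x, A):
    """Reference: out[b][t][m][d] = sum_n x[b][t][n][d] * A[n][m]."""
    N, M = len(A), len(A[0])
    return [
        [
            [
                [sum(x_t[n][d] * A[n][m] for n in range(N))
                 for d in range(len(x_t[0]))]
                for m in range(M)
            ]
            for x_t in x_b
        ]
        for x_b in x
    ]


def sparse_propagate(x, edges, num_nodes):
    """Same contraction, with A given only as (n, m, weight) triples."""
    out = []
    for x_b in x:                      # batch axis B
        out_b = []
        for x_t in x_b:                # time axis T
            D = len(x_t[0])
            out_t = [[0.0] * D for _ in range(num_nodes)]
            for n, m, w in edges:      # cost grows with the number of edges, not N * N
                for d in range(D):
                    out_t[m][d] += w * x_t[n][d]
            out_b.append(out_t)
        out.append(out_b)
    return out


# Toy graph (hypothetical): 3 nodes, only 3 of the 9 entries of A are non-zero.
edges = [(0, 1, 0.5), (1, 2, 2.0), (2, 0, 1.0)]
A = [[0.0] * 3 for _ in range(3)]
for n, m, w in edges:
    A[n][m] = w

# x has shape B=1, T=2, N=3, D=2.
x = [[[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
      [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]]]

dense_out = dense_propagate(x, A)
sparse_out = sparse_propagate(x, edges, num_nodes=3)
print(dense_out == sparse_out)
```

The edge-list version touches each non-zero of A once per (b, t) pair, so memory and compute scale with the number of edges. Whether this is fast enough in practice would depend on a vectorized sparse kernel, not on Python loops like these.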