Hi, sorry for a beginner question: would it be possible to modify the TSA so that it captures segment-to-segment attention across different time series?

TSA was originally designed to capture exactly the segment-to-segment attention across different time series that you mention, but it does so in two stages: 1) capturing dependency among different time steps of the same dimension; 2) capturing dependency among different dimensions at the same time step. Experimental results show that this two-stage method performs better than using a single attention layer over both time and dimension.

As for the router mechanism, it is designed to reduce the complexity of capturing the cross-dimension dependency, i.e. stage 2 above. Of course, you could replace it with a single Transformer layer, but the computational overhead would be too large for high-dimensional datasets.
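To make the two stages concrete, here is a minimal PyTorch sketch (not the repository's actual code): stage 1 attends across segments within each dimension, and stage 2 lets a small set of learnable router vectors gather and redistribute information across dimensions at each time step, so cross-dimension cost grows linearly rather than quadratically in the number of dimensions. The class name, the `n_router` parameter, and the `(batch, n_dims, n_segments, d_model)` input layout are illustrative assumptions, and residual/normalization layers are omitted for brevity.

```python
import torch
import torch.nn as nn

class TwoStageAttentionSketch(nn.Module):
    """Illustrative sketch of two-stage attention with a router.

    Assumed input shape: (batch, n_dims, n_segments, d_model), i.e. each
    series (dimension) has been split into segments embedded as d_model vectors.
    """

    def __init__(self, d_model, n_heads, n_router=10):
        super().__init__()
        # Stage 1: attention across segments (time) within one dimension.
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stage 2: attention across dimensions at one time step, mediated by
        # a small fixed set of learnable "router" vectors.
        self.router = nn.Parameter(torch.randn(n_router, d_model))
        self.dim_to_router = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.router_to_dim = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        b, d, s, m = x.shape

        # ---- Stage 1: cross-time attention, applied per dimension ----
        x_time = x.reshape(b * d, s, m)            # each dimension is a sequence of segments
        x_time, _ = self.time_attn(x_time, x_time, x_time)
        x = x_time.reshape(b, d, s, m)

        # ---- Stage 2: cross-dimension attention, applied per time step ----
        x_dim = x.permute(0, 2, 1, 3).reshape(b * s, d, m)   # each step is a set of d dimensions
        routers = self.router.unsqueeze(0).repeat(b * s, 1, 1)
        # Routers gather information from all d dimensions (cost ~ d * n_router),
        buffer, _ = self.dim_to_router(routers, x_dim, x_dim)
        # then distribute it back to every dimension (cost ~ d * n_router),
        # avoiding the d^2 cost of full dimension-to-dimension attention.
        x_dim, _ = self.router_to_dim(x_dim, buffer, buffer)
        return x_dim.reshape(b, s, d, m).permute(0, 2, 1, 3)


# Example usage: 8 samples, 7 series, 24 segments, model width 64
tsa = TwoStageAttentionSketch(d_model=64, n_heads=4)
out = tsa(torch.randn(8, 7, 24, 64))   # -> (8, 7, 24, 64)
```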