Hi, sorry for a beginner question: would it be possible to modify the TSA so that it captures segment-to-segment attention across different time series?

TSA was originally designed to capture exactly the segment-to-segment attention across different time series that you mention, but it does so in two stages: 1) capturing dependency among different time steps of the same dimension; 2) capturing dependency among different dimensions at the same time step. Experimental results show that this two-stage method performs better than using a single attention layer over both time and dimension.

As for the router mechanism, it is designed to reduce the complexity of capturing the cross-dimension dependency, i.e. stage 2 above. Of course, you could replace it with a single Transformer layer, but the computational overhead would be too large for high-dimensional datasets.
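To make the two stages concrete, here is a minimal PyTorch sketch (not the repository's actual code): stage 1 attends across segments within each dimension, and stage 2 lets a small set of learnable router vectors gather and redistribute information across dimensions at each time step, so cross-dimension cost grows linearly rather than quadratically in the number of dimensions. The class name, the `n_router` parameter, and the `(batch, n_dims, n_segments, d_model)` input layout are illustrative assumptions, and residual/normalization layers are omitted for brevity.

```python
import torch
import torch.nn as nn

class TwoStageAttentionSketch(nn.Module):
    """Illustrative sketch of two-stage attention with a router.

    Assumed input shape: (batch, n_dims, n_segments, d_model), i.e. each
    series (dimension) has been split into segments embedded as d_model vectors.
    """

    def __init__(self, d_model, n_heads, n_router=10):
        super().__init__()
        # Stage 1: attention across segments (time) within one dimension.
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stage 2: attention across dimensions at one time step, mediated by
        # a small fixed set of learnable "router" vectors.
        self.router = nn.Parameter(torch.randn(n_router, d_model))
        self.dim_to_router = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.router_to_dim = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        b, d, s, m = x.shape

        # ---- Stage 1: cross-time attention, applied per dimension ----
        x_time = x.reshape(b * d, s, m)            # each dimension is a sequence of segments
        x_time, _ = self.time_attn(x_time, x_time, x_time)
        x = x_time.reshape(b, d, s, m)

        # ---- Stage 2: cross-dimension attention, applied per time step ----
        x_dim = x.permute(0, 2, 1, 3).reshape(b * s, d, m)   # each step is a set of d dimensions
        routers = self.router.unsqueeze(0).repeat(b * s, 1, 1)
        # Routers gather information from all d dimensions (cost ~ d * n_router),
        buffer, _ = self.dim_to_router(routers, x_dim, x_dim)
        # then distribute it back to every dimension (cost ~ d * n_router),
        # avoiding the d^2 cost of full dimension-to-dimension attention.
        x_dim, _ = self.router_to_dim(x_dim, buffer, buffer)
        return x_dim.reshape(b, s, d, m).permute(0, 2, 1, 3)


# Example usage: 8 samples, 7 series, 24 segments, model width 64
tsa = TwoStageAttentionSketch(d_model=64, n_heads=4)
out = tsa(torch.randn(8, 7, 24, 64))   # -> (8, 7, 24, 64)
```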