Closed liuquangao closed 7 months ago
As mentioned in the implementation details on page 6, we regard the input length as a hyperparameter and perform a grid search over 90, 180, 360, and 720. The grid search results are shown in the ablation study tables, and we report the best result. An input length of 720 led to the best result in most cases.
If there are no further questions, I will close this issue. Please feel free to reopen it.
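For readers who want to see what treating the input length as a hyperparameter looks like in practice, here is a minimal sketch, not the code used in the paper. The synthetic series, the `window_mean_forecast` stand-in model, the horizon of 96, and selecting on validation MSE are all assumptions made for illustration; only the candidate lengths come from the reply above.

```python
import math

def make_series(n=2000, period=24):
    # Synthetic periodic series, used only for illustration (assumption).
    return [math.sin(2 * math.pi * t / period) for t in range(n)]

def window_mean_forecast(history, horizon):
    # Hypothetical stand-in model: predict the mean of the look-back window.
    mean = sum(history) / len(history)
    return [mean] * horizon

def validation_mse(series, input_len, horizon, forecast_fn):
    # Slide a window over the series and average the squared error
    # of every forecast step across all windows.
    errors = []
    last_start = len(series) - input_len - horizon
    for start in range(0, last_start + 1, horizon):
        history = series[start:start + input_len]
        target = series[start + input_len:start + input_len + horizon]
        pred = forecast_fn(history, horizon)
        errors.extend((p - y) ** 2 for p, y in zip(pred, target))
    return sum(errors) / len(errors)

def grid_search_input_length(series, candidates, horizon, forecast_fn):
    # Score every candidate input length and keep the one with the lowest MSE.
    scores = {
        length: validation_mse(series, length, horizon, forecast_fn)
        for length in candidates
    }
    best = min(scores, key=scores.get)
    return best, scores

candidates = [90, 180, 360, 720]  # lengths listed in the reply above
series = make_series()
best_len, scores = grid_search_input_length(
    series, candidates, horizon=96, forecast_fn=window_mean_forecast
)
print(best_len, scores)
```

With a real model, `forecast_fn` would wrap training and inference for each input length, so every candidate is tuned and scored separately before the best one is reported.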
Thank you for your patience in answering this question. In Table 1, you seem to be quoting TimesNet's originally reported results, but those results are TimesNet's performance when the input window is 96. Does this make the comparison unfair?
We do not think this is an unfair comparison, since each model requires a different input length. Equality does not mean fairness. Simply forcing all models to use the same input length of 96 is itself an unfair comparison! For example, DLinear requires a long look-back window to extract periodicity, so setting the input length to 96 greatly limits its performance. This is like chopping a professional runner's legs to match your height and then forcing him to race against you. TimesNet also does many such things. For instance, they compare anomaly detection (AD) performance with Anomaly Transformer but do not use its special loss, which is the key to that work! TimesNet's performance is nowhere close to Anomaly Transformer's original performance. This group is ridiculous!
I meant no offence, and thank you for your patience. It's just that a lot of recent articles on time series forecasting have compared performance under the same input length. But now I agree with you that equality does not mean fairness. I have no more doubts, thanks again!
Yeah, I just do not like their work. I also met a lot of people here at ICLR and found out that everyone seems to be annoyed by what the Tsinghua guys have been doing, LOL.
May I ask whether it is in the TimesNet paper that Anomaly Transformer is evaluated without its special loss?
See the second paragraph of Section 4.5 (anomaly detection). They claim to use reconstruction error as the shared anomaly criterion for all experiments. We also checked their code.
Hi, what is the length of the input sequence corresponding to the results reported in Table 1? It does not seem to be mentioned in the paper.