Zeying-Gong / PatchMixer

Code release for "PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting"
MIT License

Metrics (MSE and MAE) calculation. #2

Closed. EmersonVilar closed this issue 2 months ago.

EmersonVilar commented 7 months ago

Hi, firstly, very good work in your paper. I have read it a few times, but I couldn't find whether the MSE and MAE metrics were calculated over all of the training data or over the last input lookback window of the data (as if it were test data). In the paper yours builds on (A TIME SERIES IS WORTH 64 WORDS: LONG-TERM FORECASTING WITH TRANSFORMERS), the metrics are a forecast average over all input series (or features), but was that forecast over all the data or only over the last window, or do you simply take the last MSE and MAE loss values from training? Could you please clarify this?

I hope I was clear.

Thank you,

Regards.

Zeying-Gong commented 6 months ago

Thank you very much for your interest in our work and for bringing up your questions. I appreciate the opportunity to clarify.

Regarding your inquiry about the calculation of MSE and MAE, during the training phase, the loss is computed as a direct sum of MSE and MAE in a 1:1 ratio, and this calculation is performed across the entire training dataset. Subsequently, for the testing phase, MSE and MAE are measured against the test set, which comprises data previously unseen by the model.
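For concreteness, here is a minimal sketch of the training objective as described above, assuming a standard PyTorch setup (the function name is illustrative, not the exact code from the repo):

```python
import torch.nn as nn

mse = nn.MSELoss()
mae = nn.L1Loss()

def dual_loss(pred, true):
    # Training objective: MSE and MAE summed in a 1:1 ratio,
    # averaged over every batch drawn from the training split.
    return mse(pred, true) + mae(pred, true)
```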

Note that our approach does not alter the dataloader from PatchTST (which is probably the earliest framework in this line, derived from Informer). We consider the new loss function employed during training a trick to enhance training efficiency and effectiveness, rather than something that affects the evaluation metrics during testing. We are revising our paper (currently under review and not yet updated on arXiv) and will include a more detailed explanation of this point.
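At test time, the convention inherited from the Informer/PatchTST lineage is to stack the forecasts for every sliding window of the held-out test split and average plain MSE/MAE over all of them. Roughly (the array shapes and random data below are illustrative only):

```python
import numpy as np

def MSE(pred, true):
    return np.mean((pred - true) ** 2)

def MAE(pred, true):
    return np.mean(np.abs(pred - true))

# preds/trues stand in for forecasts stacked over every sliding
# window of the test split: (num_windows, pred_len, num_features).
preds = np.random.randn(100, 96, 7)
trues = np.random.randn(100, 96, 7)
print(MSE(preds, trues), MAE(preds, trues))
```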

Please feel free to reach out if you have any more questions or need further clarification. Thank you once again for your engagement with our research.

EmersonVilar commented 6 months ago

Ok. After reading some articles, I found the data composition, in case it helps you. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting uses a 12/4/4-month notation for ETT, while Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting uses a 6:2:2 percentage notation, but they describe the same split. To make the equivalence explicit, here is a rough sketch of the ETT-hour borders as the Informer-style loaders compute them (simplified: the real loaders also back-shift the val/test starts by the lookback length so every window has context, see below).
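```python
month = 30 * 24  # hours in a 30-day month of hourly ETT data

# 12/4/4 months in the Informer notation:
border1s = [0, 12 * month, 16 * month]           # split starts
border2s = [12 * month, 16 * month, 20 * month]  # split ends

# 12:4:4 months is exactly the 6:2:2 ratio Autoformer quotes.
print([b2 - b1 for b1, b2 in zip(border1s, border2s)])  # [8640, 2880, 2880]
```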

Regards.

Zeying-Gong commented 2 months ago

Thank you for pointing out the detailed difference between these two split conventions; that is a helpful reminder. If you have any other questions, please feel free to reopen this issue.