HyunWookL / TESTAM

Official Code of TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts
MIT License

Some questions about reproducing results #5

Open Jimmy-7664 opened 1 month ago

Jimmy-7664 commented 1 month ago

I ran the code according to the guide in the README without modifying it, but the results I get differ somewhat from those in the paper. Is there any possible reason for this?

My log is here:

Namespace(device='cuda:3', data='./data/METR-LA', adjdata='./data/METR-LA/adj_mx.pkl', adjtype='doubletransition', seq_length=12, nhid=32, in_dim=2, num_nodes=207, batch_size=16, dropout=0.0, epochs=100, print_every=200, seed=-1, save='./experiment/METR-LA_0/TESTAM', expid=1, load_path=None, patience=15, lr_mul=1, n_warmup_steps=4000, quantile=0.7, is_quantile=False, warmup_epoch=0)
Train the model with 205435 parameters
start training...
Iter: 000, Train Loss: 22.6457, Train MAPE: 0.2849, Train RMSE: 13.6601
Iter: 200, Train Loss: 4.9007, Train MAPE: 0.1562, Train RMSE: 9.3743
Iter: 400, Train Loss: 5.4357, Train MAPE: 0.1293, Train RMSE: 9.6406
Iter: 600, Train Loss: 5.3063, Train MAPE: 0.1967, Train RMSE: 10.0381
Iter: 800, Train Loss: 3.6567, Train MAPE: 0.0784, Train RMSE: 6.8035
Iter: 1000, Train Loss: 4.4894, Train MAPE: 0.1656, Train RMSE: 9.0105
Iter: 1200, Train Loss: 3.8237, Train MAPE: 0.1031, Train RMSE: 7.2108
Iter: 1400, Train Loss: 3.9696, Train MAPE: 0.1034, Train RMSE: 7.5186
Epoch: 001, Inference Time: 7.7461 secs
Epoch: 001, Train Loss: 4.9367, Train MAPE: 0.1347, Train RMSE: 8.4507, Valid Loss: 3.7926, Valid MAPE: 0.1262, Valid RMSE: 6.7112, Training Time: 188.6600/epoch
Iter: 000, Train Loss: 3.6769, Train MAPE: 0.1167, Train RMSE: 7.1560
Iter: 200, Train Loss: 4.0358, Train MAPE: 0.0984, Train RMSE: 8.1459
Iter: 400, Train Loss: 4.4978, Train MAPE: 0.1242, Train RMSE: 7.9133
Iter: 600, Train Loss: 3.9247, Train MAPE: 0.1048, Train RMSE: 7.0893
Iter: 800, Train Loss: 3.9454, Train MAPE: 0.1169, Train RMSE: 8.1039
Iter: 1000, Train Loss: 3.5258, Train MAPE: 0.0920, Train RMSE: 6.7911
Iter: 1200, Train Loss: 3.9588, Train MAPE: 0.1001, Train RMSE: 7.2089
Iter: 1400, Train Loss: 3.2076, Train MAPE: 0.0658, Train RMSE: 5.3511
Epoch: 002, Inference Time: 7.7166 secs
Epoch: 002, Train Loss: 3.8580, Train MAPE: 0.1069, Train RMSE: 7.4156, Valid Loss: 3.4249, Valid MAPE: 0.1084, Valid RMSE: 6.4224, Training Time: 188.4465/epoch
Iter: 000, Train Loss: 3.3081, Train MAPE: 0.1014, Train RMSE: 6.8087
Iter: 200, Train Loss: 3.7726, Train MAPE: 0.1049, Train RMSE: 7.1955
Iter: 400, Train Loss: 3.9604, Train MAPE: 0.1257, Train RMSE: 8.1108
Iter: 600, Train Loss: 3.6580, Train MAPE: 0.1117, Train RMSE: 7.3913
Iter: 800, Train Loss: 3.4112, Train MAPE: 0.0985, Train RMSE: 6.8128
Iter: 1000, Train Loss: 4.2818, Train MAPE: 0.1224, Train RMSE: 8.4289
Iter: 1200, Train Loss: 4.0717, Train MAPE: 0.1219, Train RMSE: 7.8983
Iter: 1400, Train Loss: 4.0698, Train MAPE: 0.1203, Train RMSE: 8.7316
Epoch: 003, Inference Time: 7.8467 secs
Epoch: 003, Train Loss: 3.6759, Train MAPE: 0.1015, Train RMSE: 7.1463, Valid Loss: 3.3489, Valid MAPE: 0.1040, Valid RMSE: 6.2773, Training Time: 189.0347/epoch
Iter: 000, Train Loss: 3.6736, Train MAPE: 0.1082, Train RMSE: 7.5115
Iter: 200, Train Loss: 3.9350, Train MAPE: 0.1106, Train RMSE: 7.2938
Iter: 400, Train Loss: 3.5539, Train MAPE: 0.0945, Train RMSE: 7.0347
Iter: 600, Train Loss: 3.1711, Train MAPE: 0.0695, Train RMSE: 5.7519
Iter: 800, Train Loss: 3.5838, Train MAPE: 0.1020, Train RMSE: 7.2952
Iter: 1000, Train Loss: 3.8278, Train MAPE: 0.1059, Train RMSE: 7.5241
Iter: 1200, Train Loss: 3.3314, Train MAPE: 0.0850, Train RMSE: 6.3735
Iter: 1400, Train Loss: 3.2684, Train MAPE: 0.0871, Train RMSE: 6.6193
Epoch: 004, Inference Time: 7.6951 secs
Epoch: 004, Train Loss: 3.5013, Train MAPE: 0.0957, Train RMSE: 6.8820, Valid Loss: 3.1251, Valid MAPE: 0.0929, Valid RMSE: 6.0607, Training Time: 188.4957/epoch
Iter: 000, Train Loss: 3.1663, Train MAPE: 0.0721, Train RMSE: 5.7290
Iter: 200, Train Loss: 3.9585, Train MAPE: 0.1268, Train RMSE: 7.8801
Iter: 400, Train Loss: 3.1914, Train MAPE: 0.0884, Train RMSE: 6.4603
Iter: 600, Train Loss: 3.3242, Train MAPE: 0.0909, Train RMSE: 6.5193
Iter: 800, Train Loss: 3.2287, Train MAPE: 0.0784, Train RMSE: 6.3731
Iter: 1000, Train Loss: 3.1944, Train MAPE: 0.0994, Train RMSE: 6.5677
Iter: 1200, Train Loss: 3.2542, Train MAPE: 0.0842, Train RMSE: 6.8340
Iter: 1400, Train Loss: 3.3964, Train MAPE: 0.0990, Train RMSE: 6.9791
Epoch: 005, Inference Time: 7.6855 secs
Epoch: 005, Train Loss: 3.3076, Train MAPE: 0.0883, Train RMSE: 6.5221, Valid Loss: 3.0397, Valid MAPE: 0.0892, Valid RMSE: 5.9037, Training Time: 188.0712/epoch
Iter: 000, Train Loss: 3.5912, Train MAPE: 0.1152, Train RMSE: 7.3814
Iter: 200, Train Loss: 3.2632, Train MAPE: 0.0901, Train RMSE: 6.5791
Iter: 400, Train Loss: 3.3987, Train MAPE: 0.0843, Train RMSE: 6.4561
Iter: 600, Train Loss: 3.7325, Train MAPE: 0.1000, Train RMSE: 7.6094
Iter: 800, Train Loss: 2.9407, Train MAPE: 0.0660, Train RMSE: 5.6210
Iter: 1000, Train Loss: 3.3396, Train MAPE: 0.0885, Train RMSE: 6.9440
Iter: 1200, Train Loss: 3.1856, Train MAPE: 0.0828, Train RMSE: 6.2714
Iter: 1400, Train Loss: 3.5595, Train MAPE: 0.1015, Train RMSE: 7.7379
Epoch: 006, Inference Time: 7.7836 secs
Epoch: 006, Train Loss: 3.3618, Train MAPE: 0.0899, Train RMSE: 6.5980, Valid Loss: 3.1148, Valid MAPE: 0.0942, Valid RMSE: 5.9999, Training Time: 188.6084/epoch
Iter: 000, Train Loss: 3.9300, Train MAPE: 0.0988, Train RMSE: 7.2724
Iter: 200, Train Loss: 2.9159, Train MAPE: 0.0642, Train RMSE: 5.4318
Iter: 400, Train Loss: 3.1898, Train MAPE: 0.0718, Train RMSE: 5.9808
Iter: 600, Train Loss: 3.5681, Train MAPE: 0.0992, Train RMSE: 7.2033
Iter: 800, Train Loss: 3.4662, Train MAPE: 0.0965, Train RMSE: 6.8453
Iter: 1000, Train Loss: 3.0918, Train MAPE: 0.0741, Train RMSE: 5.7710
Iter: 1200, Train Loss: 2.8359, Train MAPE: 0.0663, Train RMSE: 5.7241
Iter: 1400, Train Loss: 3.3626, Train MAPE: 0.0929, Train RMSE: 6.8524
Epoch: 007, Inference Time: 7.7730 secs
Epoch: 007, Train Loss: 3.3225, Train MAPE: 0.0885, Train RMSE: 6.5149, Valid Loss: 3.0464, Valid MAPE: 0.0882, Valid RMSE: 5.8639, Training Time: 188.4266/epoch
Iter: 000, Train Loss: 2.9360, Train MAPE: 0.0700, Train RMSE: 5.7383
Iter: 200, Train Loss: 3.0079, Train MAPE: 0.0668, Train RMSE: 5.7449
Iter: 400, Train Loss: 3.4390, Train MAPE: 0.0919, Train RMSE: 6.4714
Iter: 600, Train Loss: 2.6481, Train MAPE: 0.0533, Train RMSE: 4.9869
Iter: 800, Train Loss: 2.8901, Train MAPE: 0.0628, Train RMSE: 5.3428
Iter: 1000, Train Loss: 3.2594, Train MAPE: 0.0818, Train RMSE: 6.3950
Iter: 1200, Train Loss: 3.4223, Train MAPE: 0.0872, Train RMSE: 6.7961
Iter: 1400, Train Loss: 3.6236, Train MAPE: 0.1043, Train RMSE: 7.1276
Epoch: 008, Inference Time: 7.8673 secs
Epoch: 008, Train Loss: 3.2096, Train MAPE: 0.0845, Train RMSE: 6.3231, Valid Loss: 2.9970, Valid MAPE: 0.0874, Valid RMSE: 5.8014, Training Time: 188.5604/epoch
Iter: 000, Train Loss: 3.8474, Train MAPE: 0.1311, Train RMSE: 7.5577
Iter: 200, Train Loss: 3.6420, Train MAPE: 0.1080, Train RMSE: 6.9074
Iter: 400, Train Loss: 3.4917, Train MAPE: 0.0911, Train RMSE: 6.5479
Iter: 600, Train Loss: 3.6501, Train MAPE: 0.1030, Train RMSE: 7.2486
Iter: 800, Train Loss: 3.4578, Train MAPE: 0.0886, Train RMSE: 6.4335
Iter: 1000, Train Loss: 3.0082, Train MAPE: 0.0808, Train RMSE: 6.2013
Iter: 1200, Train Loss: 3.7256, Train MAPE: 0.1024, Train RMSE: 7.6003
Iter: 1400, Train Loss: 3.1067, Train MAPE: 0.0822, Train RMSE: 6.2087
Epoch: 009, Inference Time: 7.6691 secs
Epoch: 009, Train Loss: 3.3494, Train MAPE: 0.0890, Train RMSE: 6.5403, Valid Loss: 3.0625, Valid MAPE: 0.0892, Valid RMSE: 5.8405, Training Time: 189.1770/epoch
Iter: 000, Train Loss: 3.0373, Train MAPE: 0.0632, Train RMSE: 5.0491
Iter: 200, Train Loss: 3.4179, Train MAPE: 0.0840, Train RMSE: 6.5127
Iter: 400, Train Loss: 3.2062, Train MAPE: 0.0833, Train RMSE: 6.3132
Iter: 600, Train Loss: 3.2724, Train MAPE: 0.0802, Train RMSE: 5.8654
Iter: 800, Train Loss: 3.0673, Train MAPE: 0.0750, Train RMSE: 5.4742
Iter: 1000, Train Loss: 2.9402, Train MAPE: 0.0671, Train RMSE: 5.7712
Iter: 1200, Train Loss: 3.1837, Train MAPE: 0.0772, Train RMSE: 5.9624
Iter: 1400, Train Loss: 3.0595, Train MAPE: 0.0798, Train RMSE: 6.0340
Epoch: 010, Inference Time: 7.8226 secs
Epoch: 010, Train Loss: 3.2438, Train MAPE: 0.0856, Train RMSE: 6.3792, Valid Loss: 3.0022, Valid MAPE: 0.0878, Valid RMSE: 5.8154, Training Time: 188.6007/epoch
Iter: 000, Train Loss: 2.5156, Train MAPE: 0.0632, Train RMSE: 4.8655
Iter: 200, Train Loss: 2.8181, Train MAPE: 0.0680, Train RMSE: 5.7017
Iter: 400, Train Loss: 3.1919, Train MAPE: 0.0877, Train RMSE: 6.2610
Iter: 600, Train Loss: 3.5958, Train MAPE: 0.0885, Train RMSE: 6.6543
Iter: 800, Train Loss: 3.4379, Train MAPE: 0.0926, Train RMSE: 7.1159
Iter: 1000, Train Loss: 2.9896, Train MAPE: 0.0723, Train RMSE: 6.1875
Iter: 1200, Train Loss: 3.5095, Train MAPE: 0.0997, Train RMSE: 7.0381
Iter: 1400, Train Loss: 4.1240, Train MAPE: 0.1488, Train RMSE: 8.3987
Epoch: 011, Inference Time: 7.7810 secs
Epoch: 011, Train Loss: 3.2195, Train MAPE: 0.0849, Train RMSE: 6.3363, Valid Loss: 3.1086, Valid MAPE: 0.0899, Valid RMSE: 5.9200, Training Time: 188.7209/epoch
Iter: 000, Train Loss: 3.5777, Train MAPE: 0.1142, Train RMSE: 7.6756
Iter: 200, Train Loss: 3.2519, Train MAPE: 0.0790, Train RMSE: 6.0867
Iter: 400, Train Loss: 3.7765, Train MAPE: 0.0969, Train RMSE: 6.6305
Iter: 600, Train Loss: 3.1919, Train MAPE: 0.0825, Train RMSE: 6.4965
Iter: 800, Train Loss: 2.9242, Train MAPE: 0.0721, Train RMSE: 5.8655
Iter: 1000, Train Loss: 3.5502, Train MAPE: 0.1035, Train RMSE: 7.2264
Iter: 1200, Train Loss: 3.6280, Train MAPE: 0.0998, Train RMSE: 7.1050
Iter: 1400, Train Loss: 3.5069, Train MAPE: 0.1050, Train RMSE: 6.6287
Epoch: 012, Inference Time: 7.8121 secs
Epoch: 012, Train Loss: 3.2833, Train MAPE: 0.0871, Train RMSE: 6.4504, Valid Loss: 3.0276, Valid MAPE: 0.0875, Valid RMSE: 5.8835, Training Time: 189.7191/epoch
Iter: 000, Train Loss: 3.1728, Train MAPE: 0.0796, Train RMSE: 6.6935
Iter: 200, Train Loss: 3.3807, Train MAPE: 0.0951, Train RMSE: 6.7178
Iter: 400, Train Loss: 2.8038, Train MAPE: 0.0627, Train RMSE: 5.3850
Iter: 600, Train Loss: 2.8452, Train MAPE: 0.0649, Train RMSE: 5.5230
Iter: 800, Train Loss: 3.6740, Train MAPE: 0.1098, Train RMSE: 7.1714
Iter: 1000, Train Loss: 3.3395, Train MAPE: 0.0985, Train RMSE: 6.4447
Iter: 1200, Train Loss: 3.0324, Train MAPE: 0.1036, Train RMSE: 6.3679
Iter: 1400, Train Loss: 3.1540, Train MAPE: 0.0856, Train RMSE: 6.2739
Epoch: 013, Inference Time: 8.6503 secs
Epoch: 013, Train Loss: 3.1864, Train MAPE: 0.0838, Train RMSE: 6.2821, Valid Loss: 2.9944, Valid MAPE: 0.0881, Valid RMSE: 5.7916, Training Time: 188.8436/epoch
Iter: 000, Train Loss: 3.2143, Train MAPE: 0.0871, Train RMSE: 6.2177
Iter: 200, Train Loss: 3.2134, Train MAPE: 0.0879, Train RMSE: 6.2499
Iter: 400, Train Loss: 2.9545, Train MAPE: 0.0726, Train RMSE: 5.9541
Iter: 600, Train Loss: 3.2457, Train MAPE: 0.0944, Train RMSE: 6.4343
Iter: 800, Train Loss: 3.4222, Train MAPE: 0.0925, Train RMSE: 7.0201
Iter: 1000, Train Loss: 3.3691, Train MAPE: 0.0846, Train RMSE: 6.4727
Iter: 1200, Train Loss: 3.1997, Train MAPE: 0.0899, Train RMSE: 6.5761
Iter: 1400, Train Loss: 3.3841, Train MAPE: 0.0953, Train RMSE: 6.1978
Epoch: 014, Inference Time: 7.9575 secs
Epoch: 014, Train Loss: 3.2392, Train MAPE: 0.0857, Train RMSE: 6.3873, Valid Loss: 3.0784, Valid MAPE: 0.0886, Valid RMSE: 5.9231, Training Time: 195.0913/epoch
Iter: 000, Train Loss: 2.7551, Train MAPE: 0.0621, Train RMSE: 5.4969
Iter: 200, Train Loss: 3.7435, Train MAPE: 0.1040, Train RMSE: 7.1062
Iter: 400, Train Loss: 3.7944, Train MAPE: 0.1209, Train RMSE: 7.9308
Iter: 600, Train Loss: 2.7575, Train MAPE: 0.0656, Train RMSE: 5.2393
Iter: 800, Train Loss: 2.9517, Train MAPE: 0.0701, Train RMSE: 6.0118
Iter: 1000, Train Loss: 2.9397, Train MAPE: 0.0697, Train RMSE: 5.7752
Iter: 1200, Train Loss: 3.6133, Train MAPE: 0.0955, Train RMSE: 6.9690
Iter: 1400, Train Loss: 3.3374, Train MAPE: 0.1037, Train RMSE: 6.8182
Epoch: 015, Inference Time: 7.8461 secs
Epoch: 015, Train Loss: 3.2321, Train MAPE: 0.0854, Train RMSE: 6.3654, Valid Loss: 3.0008, Valid MAPE: 0.0859, Valid RMSE: 5.8560, Training Time: 196.3256/epoch
Iter: 000, Train Loss: 3.5433, Train MAPE: 0.0957, Train RMSE: 7.1121
Iter: 200, Train Loss: 3.4086, Train MAPE: 0.0941, Train RMSE: 7.0619
Iter: 400, Train Loss: 2.5656, Train MAPE: 0.0481, Train RMSE: 4.7360
Iter: 600, Train Loss: 2.6704, Train MAPE: 0.0621, Train RMSE: 5.3353
Iter: 800, Train Loss: 2.5595, Train MAPE: 0.0576, Train RMSE: 5.2961
Iter: 1000, Train Loss: 2.8266, Train MAPE: 0.0692, Train RMSE: 5.8474
Iter: 1200, Train Loss: 2.9067, Train MAPE: 0.0699, Train RMSE: 5.8089
Iter: 1400, Train Loss: 3.4815, Train MAPE: 0.0919, Train RMSE: 6.7787
Epoch: 016, Inference Time: 7.8577 secs
Epoch: 016, Train Loss: 3.1440, Train MAPE: 0.0826, Train RMSE: 6.2229, Valid Loss: 2.9803, Valid MAPE: 0.0867, Valid RMSE: 5.7845, Training Time: 195.1929/epoch
Iter: 000, Train Loss: 3.1654, Train MAPE: 0.0804, Train RMSE: 6.2667
Iter: 200, Train Loss: 2.4119, Train MAPE: 0.0459, Train RMSE: 4.2445
Iter: 400, Train Loss: 3.4772, Train MAPE: 0.0850, Train RMSE: 6.3621
Iter: 600, Train Loss: 3.7466, Train MAPE: 0.1130, Train RMSE: 7.8776
Iter: 800, Train Loss: 3.5222, Train MAPE: 0.0820, Train RMSE: 6.5950
Iter: 1000, Train Loss: 3.3785, Train MAPE: 0.0868, Train RMSE: 6.6804
Iter: 1200, Train Loss: 2.9301, Train MAPE: 0.0660, Train RMSE: 5.8293
Iter: 1400, Train Loss: 3.6917, Train MAPE: 0.0921, Train RMSE: 7.5803
Epoch: 017, Inference Time: 8.1438 secs
Epoch: 017, Train Loss: 3.2624, Train MAPE: 0.0864, Train RMSE: 6.4135, Valid Loss: 3.0838, Valid MAPE: 0.0906, Valid RMSE: 5.7951, Training Time: 198.1843/epoch
Iter: 000, Train Loss: 3.8805, Train MAPE: 0.1291, Train RMSE: 7.4755
Iter: 200, Train Loss: 3.5731, Train MAPE: 0.1020, Train RMSE: 7.0921
Iter: 400, Train Loss: 3.3488, Train MAPE: 0.0760, Train RMSE: 6.5832
Iter: 600, Train Loss: 3.0758, Train MAPE: 0.0771, Train RMSE: 6.1833
Iter: 800, Train Loss: 3.1191, Train MAPE: 0.0779, Train RMSE: 6.0038
Iter: 1000, Train Loss: 2.7884, Train MAPE: 0.0609, Train RMSE: 5.1996
Iter: 1200, Train Loss: 3.2597, Train MAPE: 0.0966, Train RMSE: 6.7431
Iter: 1400, Train Loss: 3.2776, Train MAPE: 0.0858, Train RMSE: 6.3398
Epoch: 018, Inference Time: 7.8717 secs
Epoch: 018, Train Loss: 3.1838, Train MAPE: 0.0839, Train RMSE: 6.2863, Valid Loss: 2.9878, Valid MAPE: 0.0862, Valid RMSE: 5.7991, Training Time: 200.2274/epoch
Iter: 000, Train Loss: 3.7951, Train MAPE: 0.1061, Train RMSE: 7.3139
Iter: 200, Train Loss: 3.4128, Train MAPE: 0.0928, Train RMSE: 6.6583
Iter: 400, Train Loss: 2.8167, Train MAPE: 0.0629, Train RMSE: 5.4966
Iter: 600, Train Loss: 2.9434, Train MAPE: 0.0784, Train RMSE: 6.0057
Iter: 800, Train Loss: 3.4480, Train MAPE: 0.0968, Train RMSE: 6.7211
Iter: 1000, Train Loss: 3.3894, Train MAPE: 0.1070, Train RMSE: 7.1152
Iter: 1200, Train Loss: 3.1680, Train MAPE: 0.0742, Train RMSE: 6.3472
Iter: 1400, Train Loss: 3.5476, Train MAPE: 0.1027, Train RMSE: 6.7256
Epoch: 019, Inference Time: 7.8057 secs
Epoch: 019, Train Loss: 3.1649, Train MAPE: 0.0834, Train RMSE: 6.2561, Valid Loss: 3.0336, Valid MAPE: 0.0881, Valid RMSE: 5.8731, Training Time: 190.3236/epoch
Iter: 000, Train Loss: 3.3307, Train MAPE: 0.0947, Train RMSE: 7.0603
Iter: 200, Train Loss: 3.5030, Train MAPE: 0.0953, Train RMSE: 6.8996
Iter: 400, Train Loss: 3.2890, Train MAPE: 0.0801, Train RMSE: 6.3208
Iter: 600, Train Loss: 3.6238, Train MAPE: 0.1117, Train RMSE: 7.3662
Iter: 800, Train Loss: 3.0268, Train MAPE: 0.0822, Train RMSE: 5.9705
Iter: 1000, Train Loss: 3.3935, Train MAPE: 0.0866, Train RMSE: 6.7349
Iter: 1200, Train Loss: 3.7591, Train MAPE: 0.1168, Train RMSE: 7.6888
Iter: 1400, Train Loss: 3.1537, Train MAPE: 0.0790, Train RMSE: 6.1972
Epoch: 020, Inference Time: 7.8862 secs
Epoch: 020, Train Loss: 3.2302, Train MAPE: 0.0855, Train RMSE: 6.3676, Valid Loss: 3.0185, Valid MAPE: 0.0882, Valid RMSE: 5.7962, Training Time: 190.2270/epoch
Iter: 000, Train Loss: 2.8130, Train MAPE: 0.0648, Train RMSE: 5.5945
Iter: 200, Train Loss: 2.6941, Train MAPE: 0.0627, Train RMSE: 5.3814
Iter: 400, Train Loss: 3.4524, Train MAPE: 0.0967, Train RMSE: 7.0866
Iter: 600, Train Loss: 3.2576, Train MAPE: 0.0915, Train RMSE: 6.6124
Iter: 800, Train Loss: 2.9213, Train MAPE: 0.0680, Train RMSE: 5.7122
Iter: 1000, Train Loss: 2.7320, Train MAPE: 0.0604, Train RMSE: 5.3519
Iter: 1200, Train Loss: 2.9420, Train MAPE: 0.0764, Train RMSE: 5.7265
Iter: 1400, Train Loss: 3.1640, Train MAPE: 0.0803, Train RMSE: 6.1639
Epoch: 021, Inference Time: 7.7302 secs
Epoch: 021, Train Loss: 3.1414, Train MAPE: 0.0827, Train RMSE: 6.2263, Valid Loss: 2.9790, Valid MAPE: 0.0857, Valid RMSE: 5.7926, Training Time: 191.1926/epoch
Iter: 000, Train Loss: 3.1090, Train MAPE: 0.0819, Train RMSE: 6.0616
Iter: 200, Train Loss: 2.8334, Train MAPE: 0.0654, Train RMSE: 5.9359
Iter: 400, Train Loss: 2.8278, Train MAPE: 0.0617, Train RMSE: 5.3205
Iter: 600, Train Loss: 3.0181, Train MAPE: 0.0721, Train RMSE: 5.7947
Iter: 800, Train Loss: 3.2375, Train MAPE: 0.0923, Train RMSE: 6.5367
Iter: 1000, Train Loss: 3.4388, Train MAPE: 0.0901, Train RMSE: 6.5049
Iter: 1200, Train Loss: 3.1729, Train MAPE: 0.0895, Train RMSE: 6.3220
Iter: 1400, Train Loss: 2.8137, Train MAPE: 0.0589, Train RMSE: 5.4024
Epoch: 022, Inference Time: 7.8742 secs
Epoch: 022, Train Loss: 3.1918, Train MAPE: 0.0843, Train RMSE: 6.3058, Valid Loss: 3.0530, Valid MAPE: 0.0879, Valid RMSE: 5.9068, Training Time: 191.5503/epoch
Iter: 000, Train Loss: 3.2430, Train MAPE: 0.0875, Train RMSE: 6.7084
Iter: 200, Train Loss: 3.0113, Train MAPE: 0.0724, Train RMSE: 5.8572
Iter: 400, Train Loss: 3.1530, Train MAPE: 0.0704, Train RMSE: 5.9144
Iter: 600, Train Loss: 3.8473, Train MAPE: 0.1153, Train RMSE: 7.9904
Iter: 800, Train Loss: 2.9894, Train MAPE: 0.0759, Train RMSE: 6.0243
Iter: 1000, Train Loss: 3.1746, Train MAPE: 0.0774, Train RMSE: 6.4018
Iter: 1200, Train Loss: 3.1232, Train MAPE: 0.0900, Train RMSE: 6.5433
Iter: 1400, Train Loss: 3.5693, Train MAPE: 0.1002, Train RMSE: 7.2030
Epoch: 023, Inference Time: 7.8937 secs
Epoch: 023, Train Loss: 3.1909, Train MAPE: 0.0843, Train RMSE: 6.3063, Valid Loss: 2.9777, Valid MAPE: 0.0850, Valid RMSE: 5.7796, Training Time: 191.1500/epoch
Iter: 000, Train Loss: 3.3430, Train MAPE: 0.0892, Train RMSE: 6.4679
Iter: 200, Train Loss: 2.8066, Train MAPE: 0.0675, Train RMSE: 5.5711
Iter: 400, Train Loss: 3.3700, Train MAPE: 0.0872, Train RMSE: 6.5869
Iter: 600, Train Loss: 2.5711, Train MAPE: 0.0541, Train RMSE: 5.0107
Iter: 800, Train Loss: 2.8348, Train MAPE: 0.0667, Train RMSE: 5.6260
Iter: 1000, Train Loss: 3.2051, Train MAPE: 0.0911, Train RMSE: 6.5602
Iter: 1200, Train Loss: 2.9269, Train MAPE: 0.0705, Train RMSE: 5.6743
Iter: 1400, Train Loss: 3.2450, Train MAPE: 0.0807, Train RMSE: 6.4516
Epoch: 024, Inference Time: 7.8246 secs
Epoch: 024, Train Loss: 3.1081, Train MAPE: 0.0818, Train RMSE: 6.1761, Valid Loss: 2.9748, Valid MAPE: 0.0863, Valid RMSE: 5.7719, Training Time: 190.7930/epoch
Iter: 000, Train Loss: 3.3645, Train MAPE: 0.1052, Train RMSE: 7.0194
Iter: 200, Train Loss: 3.0151, Train MAPE: 0.0703, Train RMSE: 5.6662
Iter: 400, Train Loss: 2.6968, Train MAPE: 0.0518, Train RMSE: 4.6948
Iter: 600, Train Loss: 3.3491, Train MAPE: 0.0972, Train RMSE: 6.8471
Iter: 800, Train Loss: 2.9555, Train MAPE: 0.0759, Train RMSE: 5.8546
Iter: 1000, Train Loss: 3.5152, Train MAPE: 0.0916, Train RMSE: 6.4758
Iter: 1200, Train Loss: 2.9687, Train MAPE: 0.0698, Train RMSE: 5.5984
Iter: 1400, Train Loss: 3.0127, Train MAPE: 0.0804, Train RMSE: 6.1584
Epoch: 025, Inference Time: 7.5409 secs
Epoch: 025, Train Loss: 3.2244, Train MAPE: 0.0856, Train RMSE: 6.3720, Valid Loss: 3.0749, Valid MAPE: 0.0879, Valid RMSE: 5.8056, Training Time: 189.9061/epoch
Iter: 000, Train Loss: 3.0590, Train MAPE: 0.0798, Train RMSE: 5.8979
Iter: 200, Train Loss: 2.6945, Train MAPE: 0.0609, Train RMSE: 5.0210
Iter: 400, Train Loss: 3.0129, Train MAPE: 0.0811, Train RMSE: 5.9054
Iter: 600, Train Loss: 2.9322, Train MAPE: 0.0694, Train RMSE: 5.6470
Iter: 800, Train Loss: 3.4420, Train MAPE: 0.0975, Train RMSE: 6.8987
Iter: 1000, Train Loss: 3.2525, Train MAPE: 0.0905, Train RMSE: 6.2839
Iter: 1200, Train Loss: 3.1411, Train MAPE: 0.0780, Train RMSE: 5.7211
Iter: 1400, Train Loss: 3.2512, Train MAPE: 0.0915, Train RMSE: 6.7554
Epoch: 026, Inference Time: 7.5576 secs
Epoch: 026, Train Loss: 3.1510, Train MAPE: 0.0831, Train RMSE: 6.2495, Valid Loss: 2.9910, Valid MAPE: 0.0868, Valid RMSE: 5.8094, Training Time: 187.8944/epoch
Iter: 000, Train Loss: 3.3956, Train MAPE: 0.1198, Train RMSE: 7.5805
Iter: 200, Train Loss: 3.0404, Train MAPE: 0.0845, Train RMSE: 6.1813
Iter: 400, Train Loss: 2.8399, Train MAPE: 0.0632, Train RMSE: 5.5312
Iter: 600, Train Loss: 3.0132, Train MAPE: 0.0834, Train RMSE: 6.1143
Iter: 800, Train Loss: 3.0860, Train MAPE: 0.0864, Train RMSE: 6.1554
Iter: 1000, Train Loss: 3.4590, Train MAPE: 0.0986, Train RMSE: 6.8291
Iter: 1200, Train Loss: 3.3082, Train MAPE: 0.0904, Train RMSE: 6.4637
Iter: 1400, Train Loss: 3.1123, Train MAPE: 0.0765, Train RMSE: 6.1111
Epoch: 027, Inference Time: 7.5816 secs
Epoch: 027, Train Loss: 3.1300, Train MAPE: 0.0825, Train RMSE: 6.2102, Valid Loss: 3.0341, Valid MAPE: 0.0876, Valid RMSE: 5.8440, Training Time: 187.2415/epoch
Iter: 000, Train Loss: 3.2723, Train MAPE: 0.0857, Train RMSE: 6.5758
Iter: 200, Train Loss: 3.2314, Train MAPE: 0.0866, Train RMSE: 6.3914
Iter: 400, Train Loss: 3.4186, Train MAPE: 0.1049, Train RMSE: 7.3183
Iter: 600, Train Loss: 3.3949, Train MAPE: 0.0967, Train RMSE: 6.7420
Iter: 800, Train Loss: 3.6058, Train MAPE: 0.1181, Train RMSE: 7.0770
Iter: 1000, Train Loss: 2.9993, Train MAPE: 0.0853, Train RMSE: 6.0572
Iter: 1200, Train Loss: 3.1907, Train MAPE: 0.0873, Train RMSE: 6.4577
Iter: 1400, Train Loss: 3.3650, Train MAPE: 0.0868, Train RMSE: 6.8011
Epoch: 028, Inference Time: 7.6096 secs
Epoch: 028, Train Loss: 3.1988, Train MAPE: 0.0845, Train RMSE: 6.3225, Valid Loss: 3.0497, Valid MAPE: 0.0904, Valid RMSE: 5.8935, Training Time: 187.6963/epoch
Iter: 000, Train Loss: 3.2322, Train MAPE: 0.0830, Train RMSE: 6.0444
Iter: 200, Train Loss: 3.3343, Train MAPE: 0.1163, Train RMSE: 7.2919
Iter: 400, Train Loss: 2.9015, Train MAPE: 0.0622, Train RMSE: 5.3030
Iter: 600, Train Loss: 3.0810, Train MAPE: 0.0775, Train RMSE: 5.9598
Iter: 800, Train Loss: 3.0913, Train MAPE: 0.0775, Train RMSE: 5.8424
Iter: 1000, Train Loss: 3.5786, Train MAPE: 0.0986, Train RMSE: 7.0676
Iter: 1200, Train Loss: 2.8536, Train MAPE: 0.0637, Train RMSE: 5.8162
Iter: 1400, Train Loss: 3.0281, Train MAPE: 0.0749, Train RMSE: 5.6696
Epoch: 029, Inference Time: 7.6760 secs
Epoch: 029, Train Loss: 3.1135, Train MAPE: 0.0820, Train RMSE: 6.1906, Valid Loss: 2.9803, Valid MAPE: 0.0864, Valid RMSE: 5.7914, Training Time: 187.9301/epoch
Iter: 000, Train Loss: 2.8214, Train MAPE: 0.0614, Train RMSE: 5.3433
Iter: 200, Train Loss: 3.0281, Train MAPE: 0.0804, Train RMSE: 6.0214
Iter: 400, Train Loss: 2.5174, Train MAPE: 0.0597, Train RMSE: 5.0201
Iter: 600, Train Loss: 3.1127, Train MAPE: 0.0794, Train RMSE: 6.3137
Iter: 800, Train Loss: 3.0174, Train MAPE: 0.0748, Train RMSE: 6.0520
Iter: 1000, Train Loss: 3.1731, Train MAPE: 0.0771, Train RMSE: 6.0898
Iter: 1200, Train Loss: 2.9058, Train MAPE: 0.0647, Train RMSE: 5.8556
Iter: 1400, Train Loss: 2.8987, Train MAPE: 0.0594, Train RMSE: 5.5906
Epoch: 030, Inference Time: 7.6469 secs
Epoch: 030, Train Loss: 3.1629, Train MAPE: 0.0836, Train RMSE: 6.2707, Valid Loss: 3.0361, Valid MAPE: 0.0885, Valid RMSE: 5.9027, Training Time: 186.1812/epoch
Iter: 000, Train Loss: 2.9432, Train MAPE: 0.0610, Train RMSE: 5.3416
Iter: 200, Train Loss: 2.6778, Train MAPE: 0.0606, Train RMSE: 4.9076
Iter: 400, Train Loss: 3.4188, Train MAPE: 0.0911, Train RMSE: 6.7257
Iter: 600, Train Loss: 3.3156, Train MAPE: 0.0969, Train RMSE: 6.4739
Iter: 800, Train Loss: 3.0027, Train MAPE: 0.0724, Train RMSE: 5.9304
Iter: 1000, Train Loss: 2.5957, Train MAPE: 0.0553, Train RMSE: 4.8018
Iter: 1200, Train Loss: 3.0447, Train MAPE: 0.0876, Train RMSE: 6.4281
Iter: 1400, Train Loss: 3.1395, Train MAPE: 0.0846, Train RMSE: 6.2482
Epoch: 031, Inference Time: 7.6075 secs
Epoch: 031, Train Loss: 3.1627, Train MAPE: 0.0836, Train RMSE: 6.2738, Valid Loss: 3.0135, Valid MAPE: 0.0878, Valid RMSE: 5.8200, Training Time: 185.9557/epoch
Iter: 000, Train Loss: 2.5227, Train MAPE: 0.0576, Train RMSE: 5.2198
Iter: 200, Train Loss: 2.8681, Train MAPE: 0.0738, Train RMSE: 5.8064
Iter: 400, Train Loss: 3.2546, Train MAPE: 0.0847, Train RMSE: 6.3948
Iter: 600, Train Loss: 2.8131, Train MAPE: 0.0690, Train RMSE: 5.7223
Iter: 800, Train Loss: 3.2878, Train MAPE: 0.0874, Train RMSE: 6.7646
Iter: 1000, Train Loss: 2.8661, Train MAPE: 0.0654, Train RMSE: 5.2356
Iter: 1200, Train Loss: 2.8768, Train MAPE: 0.0604, Train RMSE: 5.2934
Iter: 1400, Train Loss: 2.7647, Train MAPE: 0.0615, Train RMSE: 5.6729
Epoch: 032, Inference Time: 7.6146 secs
Epoch: 032, Train Loss: 3.0827, Train MAPE: 0.0813, Train RMSE: 6.1495, Valid Loss: 2.9804, Valid MAPE: 0.0864, Valid RMSE: 5.7946, Training Time: 185.5676/epoch
Iter: 000, Train Loss: 3.0949, Train MAPE: 0.0834, Train RMSE: 6.0752
Iter: 200, Train Loss: 3.6042, Train MAPE: 0.0960, Train RMSE: 6.9697
Iter: 400, Train Loss: 3.1357, Train MAPE: 0.0748, Train RMSE: 6.0966
Iter: 600, Train Loss: 3.5265, Train MAPE: 0.1013, Train RMSE: 7.0817
Iter: 800, Train Loss: 3.1420, Train MAPE: 0.0809, Train RMSE: 6.3041
Iter: 1000, Train Loss: 3.5797, Train MAPE: 0.1156, Train RMSE: 7.4765
Iter: 1200, Train Loss: 3.4701, Train MAPE: 0.0969, Train RMSE: 7.0358
Iter: 1400, Train Loss: 3.0463, Train MAPE: 0.0631, Train RMSE: 5.6481
Epoch: 033, Inference Time: 7.6366 secs
Epoch: 033, Train Loss: 3.1968, Train MAPE: 0.0846, Train RMSE: 6.3241, Valid Loss: 3.0361, Valid MAPE: 0.0889, Valid RMSE: 5.8367, Training Time: 185.6092/epoch
Iter: 000, Train Loss: 3.3522, Train MAPE: 0.1097, Train RMSE: 6.9688
Iter: 200, Train Loss: 3.7878, Train MAPE: 0.1037, Train RMSE: 7.7809
Iter: 400, Train Loss: 2.9246, Train MAPE: 0.0780, Train RMSE: 6.0050
Iter: 600, Train Loss: 2.9914, Train MAPE: 0.0752, Train RMSE: 5.8495
Iter: 800, Train Loss: 2.9203, Train MAPE: 0.0777, Train RMSE: 6.1080
Iter: 1000, Train Loss: 3.1553, Train MAPE: 0.0907, Train RMSE: 6.4029
Iter: 1200, Train Loss: 3.0469, Train MAPE: 0.0803, Train RMSE: 6.0430
Iter: 1400, Train Loss: 2.9406, Train MAPE: 0.0795, Train RMSE: 6.4597
Epoch: 034, Inference Time: 7.6388 secs
Epoch: 034, Train Loss: 3.1265, Train MAPE: 0.0825, Train RMSE: 6.2144, Valid Loss: 2.9872, Valid MAPE: 0.0867, Valid RMSE: 5.7993, Training Time: 186.0800/epoch
Iter: 000, Train Loss: 3.1621, Train MAPE: 0.0987, Train RMSE: 6.5338
Iter: 200, Train Loss: 3.0690, Train MAPE: 0.0913, Train RMSE: 6.4653
Iter: 400, Train Loss: 3.0285, Train MAPE: 0.0795, Train RMSE: 5.9470
Iter: 600, Train Loss: 3.0149, Train MAPE: 0.0686, Train RMSE: 5.5261
Iter: 800, Train Loss: 3.0080, Train MAPE: 0.0780, Train RMSE: 6.0198
Iter: 1000, Train Loss: 2.8365, Train MAPE: 0.0752, Train RMSE: 5.8602
Iter: 1200, Train Loss: 3.1653, Train MAPE: 0.0901, Train RMSE: 6.2914
Iter: 1400, Train Loss: 2.9214, Train MAPE: 0.0809, Train RMSE: 6.3070
Epoch: 035, Inference Time: 7.5762 secs
Epoch: 035, Train Loss: 3.1054, Train MAPE: 0.0819, Train RMSE: 6.1795, Valid Loss: 3.0530, Valid MAPE: 0.0870, Valid RMSE: 5.8527, Training Time: 185.8586/epoch
Iter: 000, Train Loss: 3.6675, Train MAPE: 0.1099, Train RMSE: 7.3413
Iter: 200, Train Loss: 3.0354, Train MAPE: 0.0677, Train RMSE: 5.4558
Iter: 400, Train Loss: 3.2450, Train MAPE: 0.0801, Train RMSE: 6.2279
Iter: 600, Train Loss: 3.2038, Train MAPE: 0.0900, Train RMSE: 6.2409
Iter: 800, Train Loss: 3.0240, Train MAPE: 0.0774, Train RMSE: 5.8600
Iter: 1000, Train Loss: 2.9744, Train MAPE: 0.0686, Train RMSE: 6.2056
Iter: 1200, Train Loss: 3.1432, Train MAPE: 0.0906, Train RMSE: 6.5542
Iter: 1400, Train Loss: 3.5893, Train MAPE: 0.1184, Train RMSE: 7.0786
Epoch: 036, Inference Time: 7.6455 secs
Epoch: 036, Train Loss: 3.1775, Train MAPE: 0.0840, Train RMSE: 6.2958, Valid Loss: 3.0048, Valid MAPE: 0.0873, Valid RMSE: 5.8318, Training Time: 185.7981/epoch
Iter: 000, Train Loss: 4.1558, Train MAPE: 0.1525, Train RMSE: 8.4070
Iter: 200, Train Loss: 2.8753, Train MAPE: 0.0721, Train RMSE: 5.9755
Iter: 400, Train Loss: 3.2711, Train MAPE: 0.0904, Train RMSE: 6.4678
Iter: 600, Train Loss: 3.6795, Train MAPE: 0.1282, Train RMSE: 7.6366
Iter: 800, Train Loss: 3.2573, Train MAPE: 0.0852, Train RMSE: 6.2740
Iter: 1000, Train Loss: 3.1652, Train MAPE: 0.0909, Train RMSE: 6.6810
Iter: 1200, Train Loss: 2.5959, Train MAPE: 0.0546, Train RMSE: 4.8261
Iter: 1400, Train Loss: 3.0383, Train MAPE: 0.0703, Train RMSE: 5.6032
Epoch: 037, Inference Time: 7.5713 secs
Epoch: 037, Train Loss: 3.0911, Train MAPE: 0.0816, Train RMSE: 6.1652, Valid Loss: 2.9906, Valid MAPE: 0.0869, Valid RMSE: 5.8163, Training Time: 185.7781/epoch
Iter: 000, Train Loss: 2.9771, Train MAPE: 0.0904, Train RMSE: 6.2688
Iter: 200, Train Loss: 3.2470, Train MAPE: 0.0917, Train RMSE: 6.5446
Iter: 400, Train Loss: 3.0469, Train MAPE: 0.0722, Train RMSE: 5.5204
Iter: 600, Train Loss: 2.6923, Train MAPE: 0.0678, Train RMSE: 5.3771
Iter: 800, Train Loss: 3.1686, Train MAPE: 0.0902, Train RMSE: 6.6834
Iter: 1000, Train Loss: 3.6385, Train MAPE: 0.1196, Train RMSE: 7.9107
Iter: 1200, Train Loss: 3.1841, Train MAPE: 0.0796, Train RMSE: 5.9958
Iter: 1400, Train Loss: 3.3495, Train MAPE: 0.0860, Train RMSE: 6.9553
Epoch: 038, Inference Time: 7.5846 secs
Epoch: 038, Train Loss: 3.1410, Train MAPE: 0.0831, Train RMSE: 6.2419, Valid Loss: 3.0898, Valid MAPE: 0.0887, Valid RMSE: 5.8416, Training Time: 185.3236/epoch
Iter: 000, Train Loss: 3.5613, Train MAPE: 0.1005, Train RMSE: 7.2942
Iter: 200, Train Loss: 2.7853, Train MAPE: 0.0590, Train RMSE: 5.1853
Iter: 400, Train Loss: 3.0959, Train MAPE: 0.0759, Train RMSE: 6.0333
Iter: 600, Train Loss: 3.3716, Train MAPE: 0.0993, Train RMSE: 7.1209
Iter: 800, Train Loss: 3.4626, Train MAPE: 0.0926, Train RMSE: 6.6887
Iter: 1000, Train Loss: 3.3478, Train MAPE: 0.1091, Train RMSE: 7.0479
Iter: 1200, Train Loss: 3.0288, Train MAPE: 0.0664, Train RMSE: 5.2062
Iter: 1400, Train Loss: 2.4501, Train MAPE: 0.0508, Train RMSE: 5.0231
Epoch: 039, Inference Time: 7.5734 secs
Epoch: 039, Train Loss: 3.1426, Train MAPE: 0.0830, Train RMSE: 6.2417, Valid Loss: 3.0117, Valid MAPE: 0.0882, Valid RMSE: 5.8182, Training Time: 185.7462/epoch
Iter: 000, Train Loss: 2.9465, Train MAPE: 0.0740, Train RMSE: 6.0828
Iter: 200, Train Loss: 3.2774, Train MAPE: 0.0949, Train RMSE: 6.5842
Iter: 400, Train Loss: 3.4969, Train MAPE: 0.1097, Train RMSE: 7.4260
Iter: 600, Train Loss: 3.6501, Train MAPE: 0.1061, Train RMSE: 7.2592
Iter: 800, Train Loss: 2.9212, Train MAPE: 0.0717, Train RMSE: 5.7908
Iter: 1000, Train Loss: 2.8247, Train MAPE: 0.0748, Train RMSE: 5.4179
Iter: 1200, Train Loss: 3.0869, Train MAPE: 0.0855, Train RMSE: 6.1954
Iter: 1400, Train Loss: 3.3112, Train MAPE: 0.0972, Train RMSE: 6.7477
Epoch: 040, Inference Time: 7.5386 secs
Epoch: 040, Train Loss: 3.0638, Train MAPE: 0.0808, Train RMSE: 6.1238, Valid Loss: 2.9870, Valid MAPE: 0.0866, Valid RMSE: 5.8136, Training Time: 186.1004/epoch
Early Termination!
Average Training Time: 189.2072 secs/epoch
Average Inference Time: 7.7535 secs
Training finished
The valid loss on best model is 2.9748
Evaluate best model on test data for horizon 1, Test MAE: 2.4027, Test MAPE: 0.0608, Test RMSE: 4.3709
Evaluate best model on test data for horizon 2, Test MAE: 2.6633, Test MAPE: 0.0696, Test RMSE: 5.1697
Evaluate best model on test data for horizon 3, Test MAE: 2.8549, Test MAPE: 0.0766, Test RMSE: 5.7345
Evaluate best model on test data for horizon 4, Test MAE: 3.0130, Test MAPE: 0.0829, Test RMSE: 6.1764
Evaluate best model on test data for horizon 5, Test MAE: 3.1466, Test MAPE: 0.0884, Test RMSE: 6.5392
Evaluate best model on test data for horizon 6, Test MAE: 3.2659, Test MAPE: 0.0935, Test RMSE: 6.8510
Evaluate best model on test data for horizon 7, Test MAE: 3.3701, Test MAPE: 0.0980, Test RMSE: 7.1191
Evaluate best model on test data for horizon 8, Test MAE: 3.4618, Test MAPE: 0.1019, Test RMSE: 7.3454
Evaluate best model on test data for horizon 9, Test MAE: 3.5433, Test MAPE: 0.1053, Test RMSE: 7.5382
Evaluate best model on test data for horizon 10, Test MAE: 3.6212, Test MAPE: 0.1085, Test RMSE: 7.7152
Evaluate best model on test data for horizon 11, Test MAE: 3.6942, Test MAPE: 0.1113, Test RMSE: 7.8761
Evaluate best model on test data for horizon 12, Test MAE: 3.7665, Test MAPE: 0.1141, Test RMSE: 8.0307
On average over 12 horizons, Test MAE: 3.2336, Test MAPE: 0.0926, Test RMSE: 6.7055
Total time spent: 7924.1520

HyunWookL commented 1 month ago

Thank you for your great question.

TL;DR: The gap comes from unstable route training during the initial phase. You can set warmup_epoch to stabilize training and reproduce the results.

Below is a detailed explanation.

The gap is probably caused by suboptimal routing that results from unstable route learning early in training. You can check each expert's performance by running test.py, which should confirm that inputs are being routed incorrectly. To fix this, you can set a small warmup_epoch (e.g., 5 epochs) to stabilize the routing.

In fact, the current training procedure for the gating networks is suboptimal and needs improvement: it may route an input to the second- or third-best expert. Improving this route selection is a goal of our future research, and we will do our best to address it.
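As a rough illustration of what "routed to the second- or third-best expert" means, the sketch below scores a gate's choices against per-expert errors. The function name `routing_report` and its inputs are hypothetical and are not the output format of test.py; it only assumes you can obtain, for each sample, one error value per expert and the index of the expert the gate picked.

```python
from typing import Sequence


def routing_report(expert_errors: Sequence[Sequence[float]],
                   chosen: Sequence[int]) -> dict:
    """Summarize how often the gate picks the lowest-error expert.

    expert_errors[i][k] is the error of expert k on sample i (hypothetical input).
    chosen[i] is the index of the expert the gate selected for sample i.
    """
    if len(expert_errors) != len(chosen) or not chosen:
        raise ValueError("need one chosen expert per sample, and at least one sample")
    best_hits = 0
    rank_sum = 0
    for errors, k in zip(expert_errors, chosen):
        # Rank 0 means the gate picked the lowest-error expert for this sample.
        order = sorted(range(len(errors)), key=lambda j: errors[j])
        rank = order.index(k)
        best_hits += int(rank == 0)
        rank_sum += rank
    n = len(chosen)
    return {"best_expert_rate": best_hits / n, "mean_rank": rank_sum / n}


# Toy values for illustration only (three samples, three experts).
errors = [[2.1, 2.5, 3.0], [3.2, 2.8, 2.9], [2.0, 2.6, 2.2]]
chosen = [0, 2, 1]
print(routing_report(errors, chosen))
```

A `best_expert_rate` well below 1 together with a `mean_rank` near 1 or 2 would be consistent with the misrouting described above.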

Further questions are welcome.

Jimmy-7664 commented 1 month ago

Thank you for your prompt reply. I set warmup_epoch to 5 and the results did improve, but they still do not appear to beat the baseline in the original paper. Is there something else I need to tweak? Here is the new log. I have omitted the earlier epochs to keep it short.

Epoch: 048, Inference Time: 7.5892 secs
Epoch: 048, Train Loss: 3.0352, Train MAPE: 0.0679, Train RMSE: 5.1720, Valid Loss: 2.6798, Valid MAPE: 0.0738, Valid RMSE: 5.1405, Training Time: 185.3764/epoch
Early Termination!
Average Training Time: 187.4990 secs/epoch
Average Inference Time: 7.7179 secs
Training finished
The valid loss on best model is 2.6721
Evaluate best model on test data for horizon 1, Test MAE: 2.2504, Test MAPE: 0.0545, Test RMSE: 3.9607
Evaluate best model on test data for horizon 2, Test MAE: 2.4852, Test MAPE: 0.0621, Test RMSE: 4.6657
Evaluate best model on test data for horizon 3, Test MAE: 2.6497, Test MAPE: 0.0680, Test RMSE: 5.1444
Evaluate best model on test data for horizon 4, Test MAE: 2.7825, Test MAPE: 0.0731, Test RMSE: 5.5305
Evaluate best model on test data for horizon 5, Test MAE: 2.8924, Test MAPE: 0.0776, Test RMSE: 5.8434
Evaluate best model on test data for horizon 6, Test MAE: 2.9856, Test MAPE: 0.0815, Test RMSE: 6.1093
Evaluate best model on test data for horizon 7, Test MAE: 3.0703, Test MAPE: 0.0850, Test RMSE: 6.3443
Evaluate best model on test data for horizon 8, Test MAE: 3.1467, Test MAPE: 0.0881, Test RMSE: 6.5505
Evaluate best model on test data for horizon 9, Test MAE: 3.2150, Test MAPE: 0.0910, Test RMSE: 6.7311
Evaluate best model on test data for horizon 10, Test MAE: 3.2779, Test MAPE: 0.0936, Test RMSE: 6.8935
Evaluate best model on test data for horizon 11, Test MAE: 3.3353, Test MAPE: 0.0959, Test RMSE: 7.0382
Evaluate best model on test data for horizon 12, Test MAE: 3.3931, Test MAPE: 0.0983, Test RMSE: 7.1796
On average over 12 horizons, Test MAE: 2.9570, Test MAPE: 0.0807, Test RMSE: 5.9993
Total time spent: 9417.1009
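
Since results in this thread are compared at horizons 3, 6, and 12, a small parser can pull those rows out of a log like the one above. This is only a sketch: the regular expression assumes the exact "Evaluate best model on test data for horizon ..." line format shown in these logs, and `parse_horizons` is a hypothetical helper, not part of the repository.

```python
import re

# Matches the per-horizon test lines printed at the end of training.
LINE_RE = re.compile(
    r"horizon (\d+), Test MAE: ([\d.]+), Test MAPE: ([\d.]+), Test RMSE: ([\d.]+)"
)


def parse_horizons(log_text: str) -> dict:
    """Map horizon number -> (MAE, MAPE, RMSE) for every per-horizon test line."""
    results = {}
    for match in LINE_RE.finditer(log_text):
        horizon = int(match.group(1))
        results[horizon] = tuple(float(match.group(i)) for i in (2, 3, 4))
    return results


# Three lines copied from the log above.
sample_log = """\
Evaluate best model on test data for horizon 3, Test MAE: 2.6497, Test MAPE: 0.0680, Test RMSE: 5.1444
Evaluate best model on test data for horizon 6, Test MAE: 2.9856, Test MAPE: 0.0815, Test RMSE: 6.1093
Evaluate best model on test data for horizon 12, Test MAE: 3.3931, Test MAPE: 0.0983, Test RMSE: 7.1796
"""

metrics = parse_horizons(sample_log)
for h in (3, 6, 12):
    mae, mape, rmse = metrics[h]
    print(f"horizon {h:2d}: MAE {mae:.4f}, MAPE {mape * 100:.2f}%, RMSE {rmse:.4f}")
```

The summary line ("On average over 12 horizons, ...") is intentionally not matched, so the dictionary holds only per-horizon rows.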

One more question: I think that after training, TESTAM uses the same expert for all inputs instead of dynamically choosing among experts. Is my understanding correct?
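
One way this could be checked, as a minimal sketch: count how often each expert index is selected at inference time. The list `chosen` below is made-up data, `expert_usage` is a hypothetical helper, and the number of experts (three) is an assumption for illustration; how to extract the gate's decisions from the model is not shown here.

```python
from collections import Counter


def expert_usage(chosen, num_experts):
    """Return the fraction of inputs routed to each expert index."""
    if not chosen:
        raise ValueError("chosen must contain at least one routing decision")
    counts = Counter(chosen)
    # Counter returns 0 for experts that were never selected.
    return [counts[k] / len(chosen) for k in range(num_experts)]


# Made-up routing decisions: expert 1 is picked for 7 of 8 inputs.
chosen = [1, 1, 1, 0, 1, 1, 1, 1]
usage = expert_usage(chosen, num_experts=3)
print(usage)
```

If one entry is close to 1.0 across the whole test set, the gate is effectively using a single expert.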

Looking forward to your reply.

randomforest1111 commented 1 month ago

We encountered the same problem when replicating the results with 5 warmup epochs. On PEMS-BAY, the final MAE was 1.385 at step 3, 1.687 at step 6, and 1.952 at step 12. This is significantly different from the results reported in the paper and does not beat many of the baselines. Do we need to adjust the hyperparameters to achieve the results in the paper?

HyunWookL commented 1 month ago

We noticed that improper routing can occur, in which the gate selects only one expert regardless of regression error. We are now testing a load balancing loss function to mitigate this.
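
For readers unfamiliar with the idea, here is a minimal sketch of one common load balancing term, the auxiliary loss used in Switch Transformer style mixture-of-experts models. It is not necessarily the loss being tested for TESTAM. The name `load_balancing_loss` and the `gate_probs` layout are assumptions, and plain Python lists are used so the example runs without dependencies; in a training loop the same arithmetic would be done on tensors, with gradients flowing through the mean probabilities.

```python
def load_balancing_loss(gate_probs):
    """Switch-style auxiliary loss: num_experts * sum_k f[k] * p[k].

    gate_probs[i][k] is the gate's softmax probability for expert k on sample i.
    f[k] is the fraction of samples whose top-1 choice is expert k.
    p[k] is the mean gate probability assigned to expert k.
    """
    n = len(gate_probs)
    num_experts = len(gate_probs[0])
    f = [0.0] * num_experts
    p = [0.0] * num_experts
    for probs in gate_probs:
        top = max(range(num_experts), key=lambda k: probs[k])
        f[top] += 1.0 / n
        for k in range(num_experts):
            p[k] += probs[k] / n
    return num_experts * sum(fk * pk for fk, pk in zip(f, p))


# Balanced routing: each expert is the top choice for one sample.
balanced = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
# Collapsed routing: every sample prefers expert 0.
collapsed = [[0.8, 0.1, 0.1]] * 3

print(load_balancing_loss(balanced), load_balancing_loss(collapsed))
```

The term is smallest when both the selection counts and the mean probabilities are spread evenly, so adding it to the training loss with a small weight penalizes a gate that collapses onto one expert.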

Even worse, on PEMS-BAY, TESTAM sometimes selects the "worst" expert :( In that case, the MAE can be much larger than the reported one.

We'll do our best to fix the issue and will update the code once testing is done.

HyunWookL commented 3 weeks ago

@Jimmy-7664 @randomforest1111 Thank you for your continued interest in our paper!

The problem was that the current versions of Python and PyTorch block index-based in-place operations. We have revised our pseudo-label generation process accordingly. For your information, I've left notes in the README.md file.

Although the revised pseudo-label generation avoids selecting "improper experts," some issues remain; for example, routing may still be biased toward one expert. We provide some functions that may help improve routing, such as a load balancing loss function and uncertainty measurements.
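
The thread does not specify which uncertainty measurements are provided. As one generic option, and purely as an assumption for illustration, the entropy of the gate's probability vector can indicate how confident a routing decision is. `normalized_entropy` is a hypothetical helper, not a function from the repository.

```python
import math


def normalized_entropy(probs):
    """Entropy of a gate distribution, scaled to [0, 1] by log(num_experts).

    0 means the gate puts all mass on one expert; 1 means it is uniform.
    """
    if len(probs) < 2:
        raise ValueError("need at least two experts")
    # Zero-probability entries contribute nothing to the entropy.
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


print(normalized_entropy([1.0, 0.0, 0.0]))   # fully confident gate
print(normalized_entropy([0.4, 0.35, 0.25])) # uncertain gate
```

Tracking the average of this value over a validation set is one cheap way to see whether the gate is becoming overconfident in a single expert.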

We are still trying to improve our model, so please keep in touch!

Thank you again for your attention and interest in our paper. I hope this change resolves your problems.