Closed: abdulfatir closed this issue 1 year ago
The following code checks if `test` is actually a part of `train`:
train = train.reshape(-1, 128)
test = test.reshape(-1, 128)
# 5471 = 65 * 192 - 7009
np.allclose(train[5471 : 5471 + len(test)], test) # <--- True
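For anyone who wants to run the same kind of check without knowing the offset in advance, here is a minimal sketch. It uses synthetic arrays rather than the solar data, and `find_contained_offset` is a hypothetical helper name, not something from this repo or from GluonTS. It assumes both arrays can be reshaped to rows of the same width (128 in the snippet above).

```python
import numpy as np


def find_contained_offset(train, test, width=128):
    """Return the first row offset at which `test` appears inside `train`, or None.

    Hypothetical helper: both inputs are reshaped to rows of `width` values,
    then every contiguous block of len(test) rows in train is compared to test.
    """
    train_rows = np.asarray(train).reshape(-1, width)
    test_rows = np.asarray(test).reshape(-1, width)
    n = len(test_rows)
    for start in range(len(train_rows) - n + 1):
        if np.allclose(train_rows[start:start + n], test_rows):
            return start
    return None


# Synthetic stand-in data (NOT the solar dataset).
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 128))
leaky_test = train[17:27].copy()         # test block copied out of train
clean_test = rng.normal(size=(10, 128))  # independent test block

leaky_offset = find_contained_offset(train, leaky_test)
clean_offset = find_contained_offset(train, clean_test)
print(leaky_offset, clean_offset)
```

A non-`None` result means the test rows occur verbatim inside the training rows, which is the leakage pattern described above. The linear scan is slow for large arrays, so this is only meant as a sanity check.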
Hello @abdulfatir, thank you for your observation. Unfortunately, you are right: I wasn't aware of the data leakage coming from the respective 'train' and 'test' groupers. I followed these GluonTS implementations, which led me to believe the approach was correct: https://github.com/jeffrey82221/gluonts_fund_price_forecast/blob/fed7c484c4dba663201f9cf96aa86ca98119b54c/reference/pytorch_ts_examples/multivariate.py and https://github.com/yantijin/ScoreGradPred/tree/8fdce6349f0d34c651837b6b5e3c4024f223f360.
I ran a new training with a new data pre-processing step that only uses data from one grouper, and also added extra channels to match the channel split that I had left out for evaluation, so all channels are now used for training (I will update the Jupyter notebook in this repo soon). We achieved 5.03e2 ± 1.06e1 after 200,000 iterations, which fortunately is still significant. Again, thank you for your observation, and apologies for the inconvenience.
Thanks for the update, @juanlopezcode! Just to clarify, the repos that you referred to appear to be doing things correctly. Anyway, it's good to see the new results. Thank you for sharing them. I would also suggest that you update the paper accordingly and also verify if such data splitting issues exist in other experiments. Cheers!
Hi!
Thank you for releasing your code!
I have some questions regarding data splitting for the solar dataset as implemented in this notebook. Here's the relevant portion of the code with some comments of mine.
Based on my comments in the code above, I have the following questions:
Looking forward to your clarifications.