Fluctuating and Non-decreasing Loss during CrossViVit Training on my own dataset

740402059 commented 8 months ago

I really appreciate your outstanding work! The work you have done on the fusion of image and time-series modalities is exactly the direction I have been researching recently. Therefore, I tested your model on my own dataset and compared it with some models I have used before, but I found that CrossViVit did not perform as well as expected. I would appreciate some advice on how to improve the forecasting performance of the model.

Similar to the dataset used in your work, my dataset consists of 15-minute satellite images with only one channel (image size = 96×96) and the corresponding photovoltaic power. The difference is that my data spans only 2.8 years, and I tested on the last 5 months. The model takes in the past 4 hours of data (step=16) and predicts the next 4 hours of photovoltaic power (step=16).
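For readers following along, the windowing described above can be sketched as below. This is a minimal illustration, not code from the CrossViVit repository: `make_windows` and the toy `pv_power` series are hypothetical names, and the only facts taken from the post are the 15-minute resolution and the 16-step input and output horizons.

```python
def make_windows(series, past_steps=16, future_steps=16):
    """Split a sequence into (past, future) pairs with a sliding window."""
    total = past_steps + future_steps
    windows = []
    # The range is empty when the series is shorter than one full window.
    for start in range(len(series) - total + 1):
        past = series[start:start + past_steps]
        future = series[start + past_steps:start + total]
        windows.append((past, future))
    return windows


# Toy PV power series (made-up values): 40 samples at 15-minute resolution.
pv_power = [float(i) for i in range(40)]
pairs = make_windows(pv_power)
first_past, first_future = pairs[0]
```

With 16 steps of 15 minutes each, both the input and the target cover 4 hours. The same indices would be used to pick the matching satellite frames for each window.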

I trained the model using the default parameters of CrossViVit as in your experiments, with the following differences:

Optical flow was not used.
Loss criterion = nn.MSELoss (the paper used L1 loss?).
AdamW optimizer with a learning rate of 0.001.
The batch_size was 16.
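One of the differences listed above is the loss criterion. The sketch below, using made-up numbers, shows how a single large error weighs on MSE compared with L1; it is only meant to illustrate why the two criteria can behave differently, not to diagnose this particular run. The helper names are hypothetical; in PyTorch the corresponding criteria are `nn.MSELoss` and `nn.L1Loss`.

```python
def mse(preds, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def l1(preds, targets):
    """Mean absolute error over paired predictions and targets."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def largest_error_share(preds, targets, power):
    """Fraction of the summed loss contributed by the single worst sample."""
    errors = [abs(p - t) ** power for p, t in zip(preds, targets)]
    return max(errors) / sum(errors)


# Toy batch (made-up values): three small errors and one large one.
targets = [0.0, 0.0, 0.0, 0.0]
preds = [0.1, 0.1, 0.1, 2.0]

mse_value = mse(preds, targets)
l1_value = l1(preds, targets)
share_mse = largest_error_share(preds, targets, power=2)
share_l1 = largest_error_share(preds, targets, power=1)
```

In this toy batch the worst sample accounts for a larger share of the MSE than of the L1 loss, so a few badly predicted samples can make the per-step MSE jump more than the L1 loss would.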

During the experiments, the train loss and valid loss kept fluctuating and did not decrease. I am not sure what the issue might be. Below are the train loss and valid loss from the wandb logs. I compared them with Perceiver-RNN, which is a model used in the OCF project: https://github.com/openclimatefix/predict_pv_yield (experiment/003*.py)

train_loss: https://api.wandb.ai/links/740402059/uji2orxi

train_step_loss: https://api.wandb.ai/links/740402059/lwz71qqh

valid_loss: https://api.wandb.ai/links/740402059/k7a2abvx

I believe that the cross-attention used in CrossViVit would perform better compared to the concatenation used in Perceiver-RNN, but I am not sure what might be causing the loss to not decrease.

gitbooo commented 8 months ago

Thank you for sharing your experiences with CrossViVit on your dataset. I strongly recommend utilizing a hyperparameter optimization framework to fine-tune the model parameters. Our repository recommends using the Orion library for hyperparameter optimization (Check this config file). Orion can systematically explore the hyperparameter space and identify optimal configurations for your specific dataset.
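To give a rough idea of what a hyperparameter search does, here is a minimal random-search sketch in plain Python. It does not use Orion or its API, and it is not the repository's config: `random_search`, `sample_log_uniform`, the search ranges, and `toy_objective` (a stand-in for "train the model and return the validation loss") are all assumptions made for illustration.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(objective, n_trials, seed=0):
    """Try n_trials random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "lr": sample_log_uniform(rng, 1e-5, 1e-2),  # assumed range
            "batch_size": rng.choice([8, 16, 32]),      # assumed choices
        }
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def toy_objective(params):
    # Hypothetical stand-in for a validation loss; its minimum is at lr = 1e-4.
    return (math.log10(params["lr"]) + 4.0) ** 2


best_score, best_params = random_search(toy_objective, n_trials=50)
```

In a real run the objective would train the model with the sampled configuration and return the validation loss. The learning rate is sampled on a log scale because plausible values span several orders of magnitude.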

740402059 commented 8 months ago

Thank you very much for your advice, I will test further and thank you again for your work!

meteoDaniel commented 4 months ago

Using PV output brings in much more complexity.

GHI time series differ in their characteristics only by location.

Besides location, PV power output time series differ mainly because of the orientation and tilt of the PV systems.

This might be confusing for the model.
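The effect of orientation and tilt can be illustrated with the standard angle-of-incidence formula for a tilted plane. This is a simplified sketch that considers the direct beam only; the function name, the azimuth convention (degrees clockwise from north), and the example sun position are assumptions chosen for illustration.

```python
import math


def cos_incidence(sun_zenith_deg, sun_azimuth_deg, tilt_deg, panel_azimuth_deg):
    """Cosine of the angle between the sun direction and the panel normal.

    Angles are in degrees; azimuths are assumed to be clockwise from north.
    Returns 0.0 when the sun is behind the panel.
    """
    zenith = math.radians(sun_zenith_deg)
    tilt = math.radians(tilt_deg)
    azimuth_diff = math.radians(sun_azimuth_deg - panel_azimuth_deg)
    value = (math.cos(zenith) * math.cos(tilt)
             + math.sin(zenith) * math.sin(tilt) * math.cos(azimuth_diff))
    return max(0.0, value)


# Same sky, same (made-up) morning sun position: zenith 60 degrees, due east.
flat = cos_incidence(60.0, 90.0, tilt_deg=0.0, panel_azimuth_deg=180.0)
east_facing = cos_incidence(60.0, 90.0, tilt_deg=30.0, panel_azimuth_deg=90.0)
west_facing = cos_incidence(60.0, 90.0, tilt_deg=30.0, panel_azimuth_deg=270.0)
```

Under these assumptions the east-facing panel receives more direct beam than the flat one, and the west-facing panel receives almost none, even though all three sit under the same satellite image.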

jaggbow commented 3 months ago

It would indeed be great to test with PV output. It's definitely more challenging and might well require capturing the variability with a generative model.

gitbooo / CrossViVit, issue #8