Closed 740402059 closed 8 months ago
In the initial stages of the project, we forecasted only the cloud albedo, so it was a purely spatio-temporal prediction task. The problem with albedo is that it is undefined at night, so we can't use a whole day of information to forecast the next day. Even with the forecasted information, a potential solution would be to forecast the solar irradiance for all pixels into the future and then perform some form of interpolation (either fixed or learned) to obtain the solar irradiance at a particular station, which could then serve as additional input alongside the time-series data to further correct for the true solar irradiance.
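For concreteness, here is a minimal numpy sketch of the fixed-interpolation variant of that staged idea: a forecasted per-pixel irradiance field is bilinearly interpolated at a station's (fractional) grid coordinates and concatenated with the station's time series. All shapes, coordinates, and names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def interpolate_at_station(grid, lat_frac, lon_frac):
    """Bilinearly interpolate a forecasted per-pixel field at a station
    located at fractional grid coordinates (lat_frac, lon_frac)."""
    i0, j0 = int(lat_frac), int(lon_frac)
    di, dj = lat_frac - i0, lon_frac - j0
    return ((1 - di) * (1 - dj) * grid[i0, j0]
            + (1 - di) * dj * grid[i0, j0 + 1]
            + di * (1 - dj) * grid[i0 + 1, j0]
            + di * dj * grid[i0 + 1, j0 + 1])

rng = np.random.default_rng(0)
forecast_grid = rng.random((32, 32))      # stand-in for a forecasted irradiance field
station_value = interpolate_at_station(forecast_grid, 10.25, 17.75)

ts_features = rng.random(24)              # stand-in for the station's past-24h series
augmented = np.concatenate([ts_features, [station_value]])  # extra feature for stage 2
```

A learned interpolation would replace `interpolate_at_station` with trainable weights over nearby pixels; the staged structure stays the same.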
It's a potential approach that we didn't explore further because we wanted a unified framework that does everything end-to-end and incorporates physical inductive biases into the architecture; that's why we opted for cross-attention.
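As a rough illustration of that kind of fusion (not the architecture from the paper), here is a minimal numpy sketch of single-head scaled dot-product cross-attention in which time-series tokens query satellite patch embeddings; all dimensions and names are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query token forms a
    convex combination of the value tokens, weighted by similarity."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (T_q, N_patches)
    weights = softmax(scores, axis=-1)       # rows sum to 1
    return weights @ values                  # (T_q, d)

rng = np.random.default_rng(0)
ts_tokens = rng.standard_normal((24, 64))      # hourly time-series embeddings
patch_tokens = rng.standard_normal((256, 64))  # 16x16 satellite patch embeddings
fused = cross_attention(ts_tokens, patch_tokens, patch_tokens)
```

In a real model the queries, keys, and values would come from learned projections and the fused tokens would feed the rest of the network; the sketch only shows the attention mixing itself.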
So the answer to your question is no, we didn't compare the two approaches; we leave that for future work. The main intent of our paper was to show that time-series models are insufficient when context is needed, and without a systematic comparison I can't say which one would be superior. Note that in this particular paper we didn't dive deeper into which approach is best suited for mixing modalities; in future work we plan to study each design choice and assess its impact.
Thank you very much for your response! Cross-attention is indeed a very good fusion method. Investigating which fusion method works best is also something I plan to research in the future.
I have a question about using information from satellite cloud images. Previous studies mostly established spatio-temporal models to forecast the satellite images first, and then used the forecasted values at the corresponding pixels as additional features, alongside the time-series data, for a second round of forecasting. This staged approach is easy to comprehend and yields decent results in my tests.
In this work, you directly built an end-to-end model to fuse the two types of data. I believe this approach can better exploit information from other pixels in the satellite images. However, I understand that training such a model is more challenging than the previous method.
Therefore, I would like to ask whether you have compared against the method of forecasting the satellite cloud images first, and whether direct forecasting is superior to staged forecasting. Also, what is your view of these two methods?