gitbooo / CrossViVit

This repository contains code for the paper "Improving day-ahead Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context"
https://arxiv.org/abs/2306.01112
MIT License

the data in PCCI_20082022_IZA seems to be broken #7

Open CharmsGraker opened 11 months ago

CharmsGraker commented 11 months ago

Thanks for the great work! I got an IndexError after I moved PCCI_20082022_IZA to the training split to match the paper. Is the data in PCCI_20082022_IZA broken, or do I need to re-download that part of the data? (I'm sure the problem is caused by PCCI_20082022_IZA, since everything works fine once I remove it from the training split.)

Here are the details of my configuration in forecast_datamodule.yaml:

stations:
  train: ["PCCI_20082022_IZA", "PCCI_20082022_CNR", "PCCI_20082022_PAL"]
  val: ["PCCI_20082022_PAY"]
  test: ["PCCI_20082022_CAB", "PCCI_20082022_TAM"]

danassou commented 11 months ago

Hi, thanks for your interest!

In principle there shouldn't be a problem with the PCCI_20082022_IZA data, since we also used it for training in the paper results. Could you please provide the details of the IndexError you get (setting HYDRA_FULL_ERROR=1 if it is not already set)? The full configuration would also be useful, in case you are not using one of the experiments we already provide.

CharmsGraker commented 11 months ago

Thanks for the reply! I found that it is caused by the cached channel indices in TSDataset. The get_channel_ids function in TSDataset looks like this:

if self._ts_channel_ids is None:
    if self.ts_channels is not None:
        self._ts_channel_ids = [
            i
            for i, k in enumerate(timeseries_tensor.info["timeseries_channels"])
            for c in self.ts_channels
            if c == k
        ]
    else:
        self._ts_channel_ids = [
            i
            for i, k in enumerate(timeseries_tensor.info["timeseries_channels"])
        ]
    self._ts_channel_ids = sorted(self._ts_channel_ids)

return self._ts_channel_ids

This code seems to cache the channel indices from a previous batch. However, the indices of the requested self.ts_channels features vary from station to station, so my IndexError always occurs when data from stations with different channel layouts are adjacent. Besides, since we specify the channel names of interest in self.ts_channels and gather them in that order, why use sorted to reorder their indices? After commenting out the outer if and the sorted statement, everything works again.
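The failure mode described in this comment can be reproduced outside the repository. The following is a minimal sketch, not the repository's code: the helper channel_ids, the two station channel lists, and the channel names are all hypothetical, and it assumes each station exposes its channel names as a list, as timeseries_tensor.info["timeseries_channels"] does in the excerpt. It resolves the indices for each station separately instead of caching them, and it keeps the order of the requested channels instead of sorting.

```python
from typing import List, Optional, Sequence


def channel_ids(
    available: Sequence[str], wanted: Optional[Sequence[str]] = None
) -> List[int]:
    """Return the positions of the `wanted` channel names inside `available`.

    Hypothetical helper: indices are recomputed for every station and follow
    the order of `wanted`, so nothing is cached across stations.
    """
    if wanted is None:
        # No selection requested: use every channel in its stored order.
        return list(range(len(available)))
    position = {name: i for i, name in enumerate(available)}
    missing = [name for name in wanted if name not in position]
    if missing:
        raise KeyError(f"channels not found: {missing}")
    return [position[name] for name in wanted]


# Two made-up stations that store overlapping channels in a different order.
station_a = ["GHI", "DNI", "DHI", "temperature"]
station_b = ["temperature", "GHI", "DNI"]
wanted = ["GHI", "temperature"]

ids_a = channel_ids(station_a, wanted)
ids_b = channel_ids(station_b, wanted)
print(ids_a, ids_b)
```

In this made-up case, reusing the indices computed for station_a on station_b would point past the end of its three channels, which is the kind of IndexError reported above. Sorting the indices for station_b would also swap the two requested channels.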

CharmsGraker commented 11 months ago

Sorry to trouble you again. To use the context images correctly, I have a question about the EUMETSAT scanned data:

jaggbow commented 11 months ago

The absolute pixel positions of the stations are fixed and do not change. The context channels are taken within a fixed spatial window that doesn't change either (the satellite is geostationary, so it always sees the same coordinates, with no offset).

To answer your previous question, we first want to thank you for pointing out that problem, which is a potential bug in the implementation. We're checking whether that bug affects the training on the stations we provided in the paper; otherwise, we will fix it and remove the lazy loading.

Thank you again for pointing out the bug!

jaggbow commented 11 months ago

Hi,

I just checked and I can confirm that this issue didn't affect the training (and paper results), since the stations that we used had the same channels in the same order.
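A check of this kind can be scripted. The sketch below is only an illustration under stated assumptions: the function name, the station names, and the channel lists are all hypothetical, and it assumes the channel names of each station have already been read into a dictionary. It reports every station whose channel list differs, in content or order, from the first station's.

```python
from typing import Dict, List, Sequence


def stations_with_different_channels(
    channels_by_station: Dict[str, Sequence[str]]
) -> List[str]:
    """List the stations whose channel list differs from the first station's."""
    names = list(channels_by_station)
    if not names:
        return []
    reference = list(channels_by_station[names[0]])
    # Comparing lists checks both the channel names and their order.
    return [n for n in names[1:] if list(channels_by_station[n]) != reference]


# Invented stations and channel lists, for illustration only.
channels_by_station = {
    "station_a": ["GHI", "DNI", "DHI"],
    "station_b": ["GHI", "DNI", "DHI"],
    "station_c": ["DNI", "GHI", "DHI"],
}
mismatched = stations_with_different_channels(channels_by_station)
print(mismatched)
```

An empty result would mean that indices cached from one station are also valid for all the others.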

CharmsGraker commented 11 months ago

Thank you for checking and for the explanation!

The pseudo-configuration below follows what is described in the paper (top of page 8):

stations:
  train: ["PCCI_20082022_IZA", "PCCI_20082022_CNR", "PCCI_20082022_PAL"]
  val: ["PCCI_20082022_PAY"]
  test: ["PCCI_20082022_CAB", "PCCI_20082022_TAM"]

jaggbow commented 11 months ago

Sorry, the config variables for the datamodules are indeed not up to date and are not the ones we used in the paper, so I'll push the correct ones! I just remembered that we override the datamodule config on the command line, so we forgot to update the file accordingly.

I have to double check with the other member of the team to make sure everything was running smoothly. I remember running into this caching problem at some point and removing it, but we'll double check everything else.