the data in PCCI_20082022_IZA seems to be broken

CharmsGraker commented 11 months ago

Thanks for the great work! I got an IndexError after I moved PCCI_20082022_IZA to the training split to match the paper. Is the data in PCCI_20082022_IZA broken, or do I need to re-download that part of the data? (I'm sure the problem is caused by PCCI_20082022_IZA, since everything runs fine once I remove it from the training split.)

Here is my configuration in forecast_datamodule.yaml:

stations:
  train: ["PCCI_20082022_IZA", "PCCI_20082022_CNR", "PCCI_20082022_PAL"]
  val: ["PCCI_20082022_PAY"]
  test: ["PCCI_20082022_CAB", "PCCI_20082022_TAM"]

danassou commented 11 months ago

Hi, thanks for your interest!

In theory there shouldn't be a problem with the PCCI_20082022_IZA data, since we also used it for training in the paper results. Could you please provide the details of the IndexError you get (setting HYDRA_FULL_ERROR=1 if it is not already set)? The full configuration would also be useful, in case you are not using one of the experiments we already provide.

CharmsGraker commented 11 months ago

Thanks for the reply! I found that it is caused by the cached channel indices in TSDataset. The get_channel_ids function in TSDataset looks like this:

if self._ts_channel_ids is None:
    if self.ts_channels is not None:
        self._ts_channel_ids = [
            i
            for i, k in enumerate(timeseries_tensor.info["timeseries_channels"])
            for c in self.ts_channels
            if c == k
        ]
    else:
        self._ts_channel_ids = [
            i
            for i, k in enumerate(timeseries_tensor.info["timeseries_channels"])
        ]
    self._ts_channel_ids = sorted(self._ts_channel_ids)

return self._ts_channel_ids

This code caches the channel indices computed for an earlier batch. However, the indices of the requested self.ts_channels features vary from station to station, so the IndexError occurs whenever data from stations with different channel layouts are adjacent. Besides, since we specify the channel names of interest in self.ts_channels and gather them sequentially, why use sorted to order their indices? After commenting out the outer if and the sorted statement, everything works again.
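The failure mode described above can be reproduced outside the repo with a minimal sketch. Everything here is an assumption made for illustration: the class names, the channel names, and the two station layouts are hypothetical, and the first class only mimics the quoted logic rather than copying TSDataset. The second class shows one possible fix, caching per channel layout and keeping the order requested in ts_channels; it is not necessarily how the maintainers resolved it.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class CachedChannelLookup:
    """Mimics the quoted logic: ids are computed once, then reused for every station."""

    def __init__(self, ts_channels: Optional[Sequence[str]] = None) -> None:
        self.ts_channels = ts_channels
        self._ts_channel_ids: Optional[List[int]] = None

    def get_channel_ids(self, station_channels: Sequence[str]) -> List[int]:
        if self._ts_channel_ids is None:
            wanted = self.ts_channels if self.ts_channels is not None else station_channels
            self._ts_channel_ids = sorted(
                i for i, k in enumerate(station_channels) if k in wanted
            )
        return self._ts_channel_ids


class PerStationChannelLookup:
    """Possible fix (an assumption): cache one id list per distinct channel layout."""

    def __init__(self, ts_channels: Optional[Sequence[str]] = None) -> None:
        self.ts_channels = ts_channels
        self._cache: Dict[Tuple[str, ...], List[int]] = {}

    def get_channel_ids(self, station_channels: Sequence[str]) -> List[int]:
        key = tuple(station_channels)
        if key not in self._cache:
            if self.ts_channels is None:
                self._cache[key] = list(range(len(key)))
            else:
                # Keep the order requested in ts_channels; a missing channel
                # raises ValueError here instead of failing later.
                self._cache[key] = [key.index(c) for c in self.ts_channels]
        return self._cache[key]


# Hypothetical layouts: the same two channels sit at different positions.
station_a = ["temperature", "pressure", "humidity", "GHI", "DNI"]
station_b = ["GHI", "DNI", "DHI"]
wanted = ["GHI", "DNI"]

cached = CachedChannelLookup(wanted)
ids_a = cached.get_channel_ids(station_a)
ids_b = cached.get_channel_ids(station_b)  # stale: still the ids of station_a

row_b = [0.1, 0.2, 0.3]  # one time step from station_b
stale_failed = False
try:
    [row_b[i] for i in ids_b]
except IndexError:
    stale_failed = True

fixed = PerStationChannelLookup(wanted)
```

With these layouts the cached version returns the ids of station_a for station_b and indexing fails, while the per-layout version resolves each station separately. Keeping the requested order also means the gathered channels line up across stations, which a sorted index list does not guarantee when layouts differ.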

CharmsGraker commented 11 months ago

Sorry to trouble you again. To use the context images correctly, I have two questions about the EUMETSAT satellite data:

Do the six stations always stay at the same absolute pixel positions in every context frame?
Could some context frames contain rotation, distortion (due to a different viewing angle), or an offset? Thank you!

jaggbow commented 11 months ago

The absolute pixel positions of the stations are fixed and do not change. The context channels are taken within a fixed spatial window that doesn't change either (the satellite would be geostationary, so that it sees the same coordinates all the time, no offset).

To answer your previous question: thank you for pointing out that problem, which is a potential bug in the implementation. We're checking whether the bug affects the training on the stations we provided in the paper; either way, we will fix it and remove the lazy loading.

Thank you again for pointing out the bug!

jaggbow commented 11 months ago

Hi,

I just checked and I can confirm that this issue didn't affect the training (and paper results), since the stations that we used had the same channels in the same order.

CharmsGraker commented 11 months ago

Thank you for checking and explaining!

The default YAML files in the shared code (e.g. forecast_datamodule.yaml, ts_context.yaml, ts_datamodule.yaml) differ slightly from the setup described in the paper.
So if the train split follows the pseudo-configuration below, taken from the paper, and the code in the repo is left unchanged, the cached channel indices will make training crash sooner or later, as I reported. Were the reported metrics based on the files shared in this repo, e.g. forecast_datamodule.yaml for the baselines and ts_context.yaml for cross_vivit? Sorry to bring this up, but it matters for a fair comparison in later studies.

The pseudo-configuration below follows the paper (top of page 8):

stations: 
      train: ["PCCI_20082022_IZA", "PCCI_20082022_CNR", "PCCI_20082022_PAL"] 
      val: ["PCCI_20082022_PAY"] 
      test: ["PCCI_20082022_CAB", "PCCI_20082022_TAM"]

The train split above (IZA, CNR, PAL) covers only the 2008-2016 period, and the observations from station PAY over the 2017-2019 period are used for validation.
For testing, station CAB is evaluated on the 2020-2022 period, whereas station TAM is evaluated on the 2017-2019 period.
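The split just described can also be written down as a small lookup, which makes it easy to check which split a given station and year belong to. This is only a sketch: SPLITS and split_of are hypothetical names that do not exist in the repo's config, and the year bounds are assumed to be inclusive.

```python
from typing import Dict, Optional, Tuple

# (first_year, last_year) per station, inclusive bounds assumed.
SPLITS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "train": {
        "PCCI_20082022_IZA": (2008, 2016),
        "PCCI_20082022_CNR": (2008, 2016),
        "PCCI_20082022_PAL": (2008, 2016),
    },
    "val": {"PCCI_20082022_PAY": (2017, 2019)},
    "test": {
        "PCCI_20082022_CAB": (2020, 2022),
        "PCCI_20082022_TAM": (2017, 2019),
    },
}


def split_of(station: str, year: int) -> Optional[str]:
    """Return the split a (station, year) pair falls in, or None if unused."""
    for split, stations in SPLITS.items():
        if station in stations:
            first, last = stations[station]
            if first <= year <= last:
                return split
    return None
```

For example, split_of("PCCI_20082022_TAM", 2018) gives "test", while the same year for CAB is not part of any split.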

jaggbow commented 11 months ago

Sorry, the config variables for the datamodules are indeed not up to date and are not the ones we used in the paper, so I'll push the correct ones! I just remembered that we overrode the datamodule config on the command line, which is why we forgot to update the files accordingly.

I have to double check with the other member of the team to make sure everything was running smoothly. I remember running into this caching problem at some point and removing it, but we'll double check everything else.

gitbooo / CrossViVit

the data in PCCI_20082022_IZA seems to be broken #7