NikosAlexandris / rekx

rekx (or reKX from XKer) : Kerchunk after Xarray
European Union Public License 1.2

Error combining daily into yearly references (processes run via GNU Parallel) #1

Open NikosAlexandris opened 1 year ago

NikosAlexandris commented 1 year ago

An error occurred while combining daily reference files into yearly ones:

❯ seq 1999 2022 | xargs -n 1 | parallel --jobs 2 --joblog "parallel.kerchunk.combine.log" pvgis-prototype series kerchunk combine /project/home/p200206/data/SID/reference_json /project/home/p200206/data/sarah3_sid_reference_{}.json --pattern SIDin{}*.json
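Since the jobs are logged with `--joblog`, the log can tell which years failed. Below is a minimal sketch, assuming GNU Parallel's tab-separated joblog layout with a header row containing `Exitval` and `Command`. `failed_jobs` is a hypothetical helper, not part of rekx or pvgis-prototype.

```python
import csv
import io


def failed_jobs(joblog_text):
    """Return (exit_value, command) pairs for jobs that exited non-zero.

    Assumes the GNU Parallel --joblog format: tab-separated columns
    Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command.
    """
    # QUOTE_NONE: commands may contain double quotes, which must not be
    # interpreted as CSV quoting.
    reader = csv.DictReader(
        io.StringIO(joblog_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    failed = []
    for row in reader:
        exit_value = int(row["Exitval"])
        if exit_value != 0:
            failed.append((exit_value, row["Command"]))
    return failed
```

For example, `failed_jobs(open("parallel.kerchunk.combine.log").read())` would list the commands that exited with an error. GNU Parallel's `--resume-failed` option, used with the same `--joblog` file, is one way to rerun only those jobs.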
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/tier2/users/u101014/pvgis-prototype-clone/pvgisprototype/api/series/ker │
│ chunk.py:193 in combine_kerchunk_references                                  │
│                                                                              │
│   190 │   │   │   concat_dims=['time'],                                      │
│   191 │   │   │   identical_dims=['lat', 'lon'],                             │
│   192 │   │   )                                                              │
│ ❱ 193 │   │   multifile_kerchunk = mzz.translate()                           │
│   194 │   │                                                                  │
│   195 │   │   combined_reference_filename = Path(combined_reference)         │
│   196 │   │   local_fs = fsspec.filesystem('file')                           │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │   combined_reference = PosixPath('/project/home/p200206/data/sarah3_sid… │ │
│ │                 mode = <DisplayMode.SILENT: 0>                           │ │
│ │      MultiZarrToZarr = <class 'kerchunk.combine.MultiZarrToZarr'>        │ │
│ │                  mzz = <kerchunk.combine.MultiZarrToZarr object at       │ │
│ │                        0x150cc3f54730>                                   │ │
│ │              pattern = 'SIDin2022*.json'                                 │ │
│ │ reference_file_paths = []                                                │ │
│ │     source_directory = PosixPath('/project/home/p200206/data/SID/refere… │ │
│ │              verbose = 0                                                 │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /mnt/tier2/users/u101014/pvgis-prototype-clone/.pvgis-prototype_virtual_envi │
│ ronment/lib/python3.10/site-packages/kerchunk/combine.py:496 in translate    │
│                                                                              │
│   493 │   │   file using ujson and fsspec instead of being returned.         │
│   494 │   │   """                                                            │
│   495 │   │   if 1 not in self.done:                                         │
│ ❱ 496 │   │   │   self.first_pass()                                          │
│   497 │   │   if 2 not in self.done:                                         │
│   498 │   │   │   self.store_coords()                                        │
│   499 │   │   if 3 not in self.done:                                         │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │        filename = None                                                   │ │
│ │            self = <kerchunk.combine.MultiZarrToZarr object at            │ │
│ │                   0x150cc3f54730>                                        │ │
│ │ storage_options = None                                                   │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /mnt/tier2/users/u101014/pvgis-prototype-clone/.pvgis-prototype_virtual_envi │
│ ronment/lib/python3.10/site-packages/kerchunk/combine.py:248 in first_pass   │
│                                                                              │
│   245 │   │   """Accumulate the set of concat coords values across all input │
│   246 │   │                                                                  │
│   247 │   │   coos = {c: set() for c in self.coo_map}                        │
│ ❱ 248 │   │   for i, fs in enumerate(self.fss):                              │
│   249 │   │   │   if self.preprocess:                                        │
│   250 │   │   │   │   self.preprocess(fs.references)                         │
│   251 │   │   │   │   # reset this to force references to update             │
│                                                                              │
│ ╭────────────────────────────── locals ──────────────────────────────╮       │
│ │ coos = {'time': set()}                                             │       │
│ │ self = <kerchunk.combine.MultiZarrToZarr object at 0x150cc3f54730> │       │
│ ╰────────────────────────────────────────────────────────────────────╯       │
│                                                                              │
│ /mnt/tier2/users/u101014/pvgis-prototype-clone/.pvgis-prototype_virtual_envi │
│ ronment/lib/python3.10/site-packages/kerchunk/combine.py:156 in fss          │
│                                                                              │
│   153 │   │   │   if self._indicts is not None:                              │
│   154 │   │   │   │   fo_list = self._indicts                                │
│   155 │   │   │   │   self._paths = self.path                                │
│ ❱ 156 │   │   │   elif isinstance(self.path[0], collections.abc.Mapping):    │
│   157 │   │   │   │   fo_list = self.path                                    │
│   158 │   │   │   │   self._paths = []                                       │
│   159 │   │   │   │   for path in self.path:                                 │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ collections = <module 'collections' from                                 │ │
│ │               '/apps/USE/easybuild/release/2022.1/software/Python/3.10.… │ │
│ │        self = <kerchunk.combine.MultiZarrToZarr object at                │ │
│ │               0x150cc3f54730>                                            │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
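The locals in the traceback show `reference_file_paths = []` for `pattern = 'SIDin2022*.json'`, so `MultiZarrToZarr` received no inputs, and its `fss` property fails at `self.path[0]` with a bare `IndexError`. A minimal sketch of a guard that fails earlier with a clearer message follows. `collect_reference_paths` is a hypothetical helper and is not how pvgis-prototype actually gathers its paths.

```python
from pathlib import Path


def collect_reference_paths(source_directory, pattern):
    """Return sorted reference files matching `pattern`; fail loudly if none.

    Hypothetical helper, not part of pvgis-prototype or kerchunk.
    """
    paths = sorted(str(p) for p in Path(source_directory).glob(pattern))
    if not paths:
        # An empty list passed on to MultiZarrToZarr would otherwise surface
        # as "IndexError: list index out of range" deep inside kerchunk.
        raise FileNotFoundError(
            f"No reference files match {pattern!r} in {source_directory}"
        )
    return paths
```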
NikosAlexandris commented 1 year ago

A bit more on this issue. For 2022:

❯ ls -l *2022*.json |wc -l
19

whereas for 2021, a year that was combined successfully:

❯ ls -l *2021*.json |wc -l
385
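Comparing per-year file counts is a quick way to spot incomplete years. A minimal sketch follows, assuming daily reference files named like `SIDin<year>*.json` (the pattern used in the command) and roughly one file per day; `reference_counts` is a hypothetical helper. Since 2021 yields 385 files with `*2021*.json`, the one-file-per-day threshold is only a rough heuristic.

```python
import calendar
from pathlib import Path


def reference_counts(directory, years, prefix="SIDin"):
    """Map each year to (file_count, looks_incomplete).

    Assumption: about one reference file per day, named '<prefix><year>*.json'.
    A year is flagged when it has fewer files than it has days.
    """
    report = {}
    for year in years:
        count = sum(1 for _ in Path(directory).glob(f"{prefix}{year}*.json"))
        days_in_year = 366 if calendar.isleap(year) else 365
        report[year] = (count, count < days_in_year)
    return report
```

For example, `reference_counts("/project/home/p200206/data/SID/reference_json", range(1999, 2023))` would list every year in the range with its count and a flag.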

I suspect something went wrong in the initial referencing step, i.e. when the daily reference files for 2022 were created.
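If the initial referencing step was interrupted, some daily reference files may also be truncated. A minimal sketch of a crude integrity check: it only verifies that each matching file parses as JSON and says nothing about whether the references themselves are correct. `unreadable_references` is a hypothetical helper.

```python
import json
from pathlib import Path


def unreadable_references(directory, pattern):
    """Return names of files matching `pattern` that do not parse as JSON.

    A reference file cut short by an interrupted job will usually fail here.
    """
    bad = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError):
            bad.append(path.name)
    return bad
```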