columncolab / EMC2

Earth Model Column Collaboratory
BSD 3-Clause "New" or "Revised" License

Error processing large datasets (long simulations), which should generally be manageable #32

Closed. isilber closed this issue 3 years ago.

isilber commented 4 years ago

The sub-column generator becomes "stuck" and keeps producing the following error:

    Traceback (most recent call last):
      File "/home/meteo/ixs34/.conda/envs/emc2/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
        self.run()
      File "/home/meteo/ixs34/.conda/envs/emc2/lib/python3.7/multiprocessing/process.py", line 99, in run
        self._target(*self._args, **self._kwargs)
      File "/home/meteo/ixs34/.conda/envs/emc2/lib/python3.7/multiprocessing/pool.py", line 110, in worker
        task = get()
      File "/home/meteo/ixs34/.conda/envs/emc2/lib/python3.7/multiprocessing/queues.py", line 354, in get
        return _ForkingPickler.loads(res)
    _pickle.UnpicklingError: invalid load key, '\xff'.

In the case I tried to process with EMC2, the model output file had a time dimension of 17,500 samples (with 10 subcolumns requested). A quick calculation shows that the output should generally be manageable (~84 MB for each double-precision field dimensioned subcolumn × level × time). I suspect that an internal option to run the parallel processing in chunks might do the trick.
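The ~84 MB estimate can be reproduced with simple arithmetic. The issue does not state the number of vertical levels; the sketch below assumes a hypothetical 60 levels, which is the level count that makes 10 subcolumns × 17,500 time samples × 8 bytes per double come out to 84 MB (using 1 MB = 10^6 bytes). The helper name `field_size_mb` is illustrative only and is not part of EMC2.

```python
BYTES_PER_DOUBLE = 8  # one float64 value


def field_size_mb(n_subcolumns, n_levels, n_times, bytes_per_value=BYTES_PER_DOUBLE):
    """Size of a (subcolumn, level, time) array in megabytes (10**6 bytes)."""
    return n_subcolumns * n_levels * n_times * bytes_per_value / 1e6


# 10 subcolumns and 17,500 time samples come from the issue;
# n_levels=60 is a hypothetical value chosen for illustration.
size = field_size_mb(n_subcolumns=10, n_levels=60, n_times=17_500)
print(f"{size:.1f} MB")  # 84.0 MB
```

A single field of this size is small by modern standards, which supports the point that the failure is not simply a matter of the output not fitting in memory.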

isilber commented 3 years ago

Processing the data in chunks ("chunking") addressed this issue.
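The thread does not show how chunking was implemented in EMC2. The following is a minimal, generic sketch of the idea, assuming the work can be split along the time dimension and each chunk processed independently: the time axis is cut into contiguous chunks, each chunk is handed to a worker function, and the results are concatenated in order. The names `time_chunks`, `process_in_chunks`, and `toy_subcolumn_step` are hypothetical and are not EMC2's API.

```python
from typing import Callable, List, Sequence


def time_chunks(n_times: int, chunk_size: int) -> List[range]:
    """Split time indices 0..n_times-1 into contiguous ranges of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [range(start, min(start + chunk_size, n_times))
            for start in range(0, n_times, chunk_size)]


def process_in_chunks(process_chunk: Callable[[Sequence[float]], List[float]],
                      series: Sequence[float],
                      chunk_size: int,
                      mapper: Callable = map) -> List[float]:
    """Apply process_chunk to each time chunk and concatenate the results in order.

    `mapper` defaults to the built-in map (serial); a parallel map such as
    multiprocessing.Pool().map can be passed instead.
    """
    chunks = [series[r.start:r.stop] for r in time_chunks(len(series), chunk_size)]
    results: List[float] = []
    for chunk_result in mapper(process_chunk, chunks):
        results.extend(chunk_result)
    return results


def toy_subcolumn_step(chunk: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for per-time-step sub-column generation.
    return [2.0 * x for x in chunk]


if __name__ == "__main__":
    data = [float(t) for t in range(17_500)]
    out = process_in_chunks(toy_subcolumn_step, data, chunk_size=2_000)
    print(len(time_chunks(17_500, 2_000)), len(out))  # 9 17500
```

To run the chunks in parallel, one could pass `mapper=pool.map` for a `multiprocessing.Pool` created under the `if __name__ == "__main__":` guard; the worker function must then be defined at module level so it can be pickled. A plausible benefit, though not confirmed in the thread, is that each task sent to a worker process carries a smaller pickled payload than one covering the whole simulation.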