caracal-pipeline / stimela

Stimela 2.0
GNU General Public License v2.0

Separator is not found error after running a step #137

Closed (landmanbester closed this issue 1 year ago)

landmanbester commented 1 year ago

After running one of the pfb-clean workers through Stimela, I get the following error:

# INFO      11:39:17 - GRID               | Computing image space data products
# INFO      11:42:33 - GRID               | Writing fits
# INFO      11:42:37 - GRID               | All done here.
Traceback (most recent call last):
  File "/usr/lib/python3.8/asyncio/streams.py", line 540, in readline
    line = await self.readuntil(sep)
  File "/usr/lib/python3.8/asyncio/streams.py", line 618, in readuntil
    raise exceptions.LimitOverrunError(
asyncio.exceptions.LimitOverrunError: Separator is not found, and chunk exceed the limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bester/.venv/skat/lib/python3.8/site-packages/stimela/utils/xrun_asyncio.py", line 118, in xrun
    results = loop.run_until_complete(job)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/bester/.venv/skat/lib/python3.8/site-packages/stimela/utils/xrun_asyncio.py", line 99, in stream_reader
    line = await stream.readline()
  File "/usr/lib/python3.8/asyncio/streams.py", line 549, in readline
    raise ValueError(e.args[0])
ValueError: Separator is not found, and chunk exceed the limit
2023-03-08 11:42:37 STIMELA.desol.grid_sol ERROR: grid threw exception: Separator is not found, and chunk exceed the limit after 0:03:26'

The top three lines are from the worker's log. When it prints 'All done here', the task has completed, and the output products appear to be as expected. Does anyone have any ideas about what could be going wrong here?
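
For readers hitting the same traceback: `asyncio.StreamReader.readline()` buffers data until it sees `b"\n"`. If more than the reader's limit (64 KiB by default) accumulates first, `readuntil()` raises `LimitOverrunError`, and `readline()` re-raises it as the `ValueError` shown above. The sketch below reproduces this without Stimela, assuming a child process that writes about 100 KB with no newline. `CHILD` and `read_lines` are hypothetical names for illustration only.

```python
import asyncio
import sys

# Child process: ~100 KB on stdout with no newline at all, standing in for
# a progress bar that keeps redrawing the same line.
CHILD = "import sys; sys.stdout.write('#' * 100_000); sys.stdout.flush()"


async def read_lines():
    """Forward the child's stdout line by line, as a log relay might."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD, stdout=asyncio.subprocess.PIPE
    )
    lines = []
    try:
        while True:
            # With the default 64 KiB StreamReader limit, readline() gives
            # up looking for b"\n" once the buffer passes the limit and
            # raises ValueError.
            line = await proc.stdout.readline()
            if not line:  # b"" means EOF
                break
            lines.append(line)
    finally:
        await proc.wait()
    return lines


error = None
try:
    asyncio.run(read_lines())
except ValueError as exc:
    error = exc
print(error)
```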

o-smirnov commented 1 year ago

I think I already tripped over this and I have a fix. I'll point you to a branch momentarily...

o-smirnov commented 1 year ago

Try the issue-137 branch. It looks like the default buffer settings in asyncio are unreasonably small.
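
`asyncio.create_subprocess_exec()` accepts a `limit` keyword that sets the buffer limit of the `StreamReader` objects wrapping the child's stdout and stderr. The following is a sketch of that kind of fix. It assumes a 1 MiB limit chosen purely for illustration, and it is not necessarily what the issue-137 branch does. `CHILD`, `LINE_LIMIT` and `read_lines` are hypothetical names.

```python
import asyncio
import sys

# A child that writes ~100 KB to stdout with no newline.
CHILD = "import sys; sys.stdout.write('#' * 100_000); sys.stdout.flush()"

# Assumption: 1 MiB is an arbitrary illustrative value.
LINE_LIMIT = 2**20


async def read_lines(limit=LINE_LIMIT):
    # `limit` is forwarded to the StreamReader for the pipe and caps how many
    # bytes readline() may buffer while looking for b"\n".
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        stdout=asyncio.subprocess.PIPE,
        limit=limit,
    )
    lines = []
    while True:
        line = await proc.stdout.readline()
        if not line:  # b"" means EOF
            break
        lines.append(line)
    await proc.wait()
    return lines


lines = asyncio.run(read_lines())
print(len(lines), len(lines[0]))  # prints: 1 100000
```

Raising the limit only moves the threshold. A long enough run of output without a newline will still trip it.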

landmanbester commented 1 year ago

Thanks, I think that's done it. Do you have an idea of what could be causing this in the first place? Could it be because I am invoking the dask progress bar?

o-smirnov commented 1 year ago

Very likely. Those progress bars produce a diarrhea of console output, which could have jammed up the buffers easily.
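
Progress bars such as dask's `ProgressBar` or tqdm typically redraw in place by writing `"\r"` rather than `"\n"`. A reader that splits only on newlines therefore sees one ever-growing line. An alternative to a larger limit is to read fixed-size chunks and treat both `\r` and `\n` as record terminators. The sketch below takes that approach. The child script and `read_records` are hypothetical, and this is not how Stimela's reader works.

```python
import asyncio
import re
import sys

# Hypothetical child mimicking a progress bar: 20000 updates, each starting
# with a carriage return; the only newline comes at the very end.
CHILD = (
    "import sys\n"
    "for i in range(20000):\n"
    "    sys.stdout.write(f'\\r[{i:5d}/20000] working...')\n"
    "sys.stdout.write('\\n')\n"
    "sys.stdout.flush()\n"
)


async def read_records(chunk_size=4096):
    """Collect the child's output as records split on carriage return or newline."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD, stdout=asyncio.subprocess.PIPE
    )
    records, pending = [], b""
    while True:
        chunk = await proc.stdout.read(chunk_size)  # up to chunk_size bytes
        if not chunk:  # EOF
            break
        pending += chunk
        # Everything before the last terminator is complete; keep the tail.
        *complete, pending = re.split(rb"[\r\n]", pending)
        records.extend(r for r in complete if r)  # drop empty pieces
    if pending:
        records.append(pending)
    await proc.wait()
    return records


records = asyncio.run(read_records())
print(len(records), records[-1])
```

This still buffers an unterminated record without bound, so a real reader would also want a cap on `pending`.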

This is why both CubiCal and DDFacet had "--Log-Boring" options...
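
The following hypothetical sketch shows a "boring" logging switch in the same spirit. When the flag is set, progress is emitted as ordinary newline-terminated lines instead of a carriage-return redraw. The flag name `--log-boring` and the helper `progress_line` are illustrative only and are not taken from CubiCal or DDFacet.

```python
import argparse
import sys


def progress_line(done, total, boring):
    """Format one progress update.

    Interactive mode redraws the same terminal line via a leading carriage
    return; "boring" mode writes an ordinary newline-terminated log line.
    """
    pct = 100 * done // total
    if boring:
        return f"progress: {done}/{total} ({pct}%)\n"
    return f"\r[{'#' * (pct // 5):<20}] {pct:3d}%"


parser = argparse.ArgumentParser()
# Hypothetical flag, named after the "--Log-Boring" idea only.
parser.add_argument(
    "--log-boring",
    action="store_true",
    help="emit plain log lines instead of a redrawing progress bar",
)
args = parser.parse_args(["--log-boring"])  # fixed argv for the demo

for done in (25, 50, 100):
    sys.stdout.write(progress_line(done, 100, args.log_boring))
sys.stdout.flush()
```

A common variation is to fall back to boring output automatically whenever `sys.stdout.isatty()` is false, i.e. when output is piped into a reader like the `stream_reader` in the traceback above.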

SpheMakh commented 1 year ago

Did the fix work?

o-smirnov commented 1 year ago

As far as I know yes.