cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

window_resize_rewalk: traceback #6325

Open oliver-sanders opened 3 months ago

oliver-sanders commented 3 months ago

Spotted in the wild:

INFO - Command "set_graph_window_extent" received.    
    set_graph_window_extent(n_edge_distance=2)   
CRITICAL - An uncaught error caused Cylc to shut down.    
    If you think this was an issue in Cylc, please report the following traceback to the developers.    
    https://github.com/cylc/cylc-flow/issues/new?assignees=&labels=bug&template=bug.md&title=;    
ERROR - 'bool' object has no attribute 'flow_nums'    
    Traceback (most recent call last):    
      File "cylc/flow/scheduler.py", line 652, in run_scheduler    
        await self._main_loop()    
      File "cylc/flow/scheduler.py", line 1557, in _main_loop    
        await self.update_data_structure()    
      File "cylc/flow/scheduler.py", line 1639, in update_data_structure
        self.data_store_mgr.update_data_structure()    
      File "cylc/flow/data_store_mgr.py", line 1717, in update_data_structure
        self.window_resize_rewalk()    
      File "cylc/flow/data_store_mgr.py", line 1788, in window_resize_rewalk
        deserialise_set(tproxy.flow_nums)
    AttributeError: 'bool' object has no attribute 'flow_nums'
CRITICAL - Workflow shutting down - 'bool' object has no attribute 'flow_nums'

There was no previous set_graph_window_extent command, so the window extent (n_edge_distance) must have been 1 before this one.

MetRonnie commented 3 months ago

https://github.com/cylc/cylc-flow/blob/c029d6be35c8f6f4961e6d16150dcd267d1e1aec/cylc/flow/data_store_mgr.py#L1781-L1789

A PbTaskProxy was not found in the store despite the task ID being added to self.all_task_pool

https://github.com/cylc/cylc-flow/blob/c029d6be35c8f6f4961e6d16150dcd267d1e1aec/cylc/flow/data_store_mgr.py#L2651-L2662

Do we have a copy of this workflow?

oliver-sanders commented 3 months ago

Yes, but it's non-trivial, will PM you.

MetRonnie commented 3 months ago

I had a quick go at reproducing using a copy of the workflow in sim mode; no luck.

A PbTaskProxy was not found in the store despite the task ID being added to self.all_task_pool

I'm not sure how this happens, or what should be done about it.
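For reference, the error message in the traceback is what Python produces when a lookup hands back a boolean in place of the expected object. The sketch below is not Cylc code: `FakeTaskProxy`, `fetch_node` and `rewalk` are hypothetical stand-ins, and it is only an assumption that the real fetch returns `False` for a missing node.

```python
from dataclasses import dataclass


@dataclass
class FakeTaskProxy:
    """Hypothetical stand-in for PbTaskProxy; only flow_nums is modelled."""
    flow_nums: str


def fetch_node(store, task_id):
    """Hypothetical lookup that returns False (rather than raising) on a miss."""
    return store.get(task_id, False)


def rewalk(store, all_task_pool):
    """Read flow_nums for every tracked ID, with no check for missing proxies."""
    return {
        tp_id: fetch_node(store, tp_id).flow_nums
        for tp_id in sorted(all_task_pool)
    }


# "1/b" is tracked in the pool set but has no proxy in the store.
store = {"1/a": FakeTaskProxy(flow_nums="[1]")}
all_task_pool = {"1/a", "1/b"}

try:
    rewalk(store, all_task_pool)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

This only reproduces the symptom. It says nothing about how the store and all_task_pool get out of step in the real scheduler.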

dwsutherland commented 2 months ago

I'm not entirely sure why it's happening, and without a way to reproduce it, it's hard to pinpoint. self.all_task_pool is populated by the task pool: https://github.com/cylc/cylc-flow/blob/caa0466ab5c8c8d0fd16214b656409d62b43d6f6/cylc/flow/task_pool.py#L235-L250 So this can only happen here if data_store_mgr.increment_graph_window doesn't create the proxy (which can only happen if it's already in the store).

Entries are removed by: https://github.com/cylc/cylc-flow/blob/caa0466ab5c8c8d0fd16214b656409d62b43d6f6/cylc/flow/task_pool.py#L815-L869 (if the try/except is triggered, then the entry shouldn't be removed from either the store or self.all_task_pool).

And the window resize happens before any pruning.

One thing we can say:

So we can put in a workaround if needed, but that doesn't properly "solve" the issue.
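One possible shape for such a workaround, assuming it amounts to skipping any pool ID whose proxy is missing from the store and logging it. This is a sketch only: `rewalk_with_guard`, the dict-based store and the logger name are hypothetical, not the Cylc implementation.

```python
import logging

LOG = logging.getLogger("window-resize-sketch")


def rewalk_with_guard(store, all_task_pool):
    """Collect flow_nums per task ID, skipping IDs with no proxy in the store.

    Returns (walked, missing) so the caller can see what was skipped.
    """
    walked = {}
    missing = []
    for tp_id in sorted(all_task_pool):
        tproxy = store.get(tp_id)
        if tproxy is None:
            # Assumed workaround: warn and carry on instead of crashing.
            LOG.warning("No task proxy in store for %s; skipping", tp_id)
            missing.append(tp_id)
            continue
        walked[tp_id] = tproxy["flow_nums"]
    return walked, missing


store = {"1/a": {"flow_nums": "[1]"}}
all_task_pool = {"1/a", "1/b"}
walked, missing = rewalk_with_guard(store, all_task_pool)
```

A guard like this would keep the scheduler running, but the mismatch between the store and all_task_pool would remain unexplained.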

dwsutherland commented 2 months ago

It cannot be caused by a reload, because all the data-store attributes are reset (including all_task_pool):

        # Reset attributes/data-store on reload:
        if reloaded:
            self.__init__(self.schd, self.n_edge_distance)
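To illustrate why re-running `__init__` rules out stale state, here is a minimal sketch of the pattern. `StoreManagerSketch` and its `initiate` method are hypothetical and model only the attributes mentioned in this thread.

```python
class StoreManagerSketch:
    """Hypothetical, stripped-down manager; not the real DataStoreMgr."""

    def __init__(self, schd, n_edge_distance=1):
        self.schd = schd
        self.n_edge_distance = n_edge_distance
        # Every attribute is rebuilt here, so re-running __init__ drops
        # anything accumulated since start-up.
        self.all_task_pool = set()

    def initiate(self, reloaded=False):
        # Reset attributes on reload, keeping the scheduler reference
        # and the current window extent.
        if reloaded:
            self.__init__(self.schd, self.n_edge_distance)


mgr = StoreManagerSketch(schd="scheduler", n_edge_distance=2)
mgr.all_task_pool.add("1/a")
mgr.initiate(reloaded=True)
```

After the reload call, all_task_pool is empty again while n_edge_distance keeps its value of 2, so no stale ID could survive a reload in this model.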