cylc / cylc-doc

Documentation (User Guide, Cheat Sheets, etc.) for the Cylc Workflow Engine.
https://cylc.github.io/cylc-doc/
GNU General Public License v3.0

troubleshooting: diagnosing incorrect task status #697

Closed: hjoliver closed this 2 months ago

hjoliver commented 3 months ago

Add to the new troubleshooting section once #638 is merged.


The Cylc UIs show the scheduler's current knowledge of task and job state. For active tasks, that involves interaction with the external world:

(Note the above assumes TCP job status messaging; otherwise the scheduler periodically polls for job status).

Tasks may get "stuck" in an incorrect state if anything blocks this external job status information. For instance, you may see a task that stays in the "submitted" state even though it actually ran and completed.

Polling the task (the scheduler queries the job runner and checks the job.status file) will return the correct status, but you may still need to determine what went wrong.
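As a rough illustration of the job.status check, here is a minimal Python sketch that reads such a file and guesses the job state from it. It assumes the file holds simple KEY=VALUE lines. The key names used (CYLC_JOB_INIT_TIME, CYLC_JOB_EXIT) and the function names are assumptions made for this example, not taken from this issue, so check them against a real job.status file from your own workflow. This is not how the scheduler itself does the check.

```python
from pathlib import Path


def parse_job_status(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blank or malformed lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


def infer_job_state(fields: dict[str, str]) -> str:
    """Guess the job state from parsed fields.

    The key names below are assumptions for illustration only.
    """
    exit_value = fields.get("CYLC_JOB_EXIT")
    if exit_value == "SUCCEEDED":
        return "succeeded"
    if exit_value:
        # Any other non-empty exit value is treated as a failure here.
        return "failed"
    if "CYLC_JOB_INIT_TIME" in fields:
        return "running"
    if fields:
        # The file has content but no sign that the job started.
        return "submitted"
    return "unknown"


def read_job_state(path: Path) -> str:
    """Return the inferred state, or "unknown" if the file is missing."""
    if not path.is_file():
        return "unknown"
    return infer_job_state(parse_job_status(path.read_text()))
```

If the state inferred from the file differs from what the UI shows (for example the file says "succeeded" while the UI still shows "submitted"), that suggests the job's status messages never reached the scheduler.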

Incorrect task status implies one of two things:

You can determine what happened by examining the job logs:

oliver-sanders commented 3 months ago

Closed by https://github.com/cylc/cylc-doc/pull/638?

If not, push a commit onto upstream/troubleshotting.

hjoliver commented 3 months ago

I didn't think it was covered very well, but maybe I didn't look closely enough. I'll re-check and tweak it if necessary...

oliver-sanders commented 2 months ago

Here's the troubleshooting entry for job status not updating:

https://github.com/cylc/cylc-doc/pull/638/files#diff-3109576eee7d4e82c35cf79b3678f427036bd7be93134e9cea4cc866a63f8919R110-R164

hjoliver commented 2 months ago

OK cool, that's good enough. I'll close this.