cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

GH Actions MacOS performance issues #6276

Status: Open. MetRonnie opened this issue 3 months ago.

MetRonnie commented 3 months ago

From Element:

@MetRonnie: While trying to sort out the macOS functional test failures, one suspicious thing I'm seeing in the workflow logs is that every cylc message seems to arrive almost exactly 10 s after the previous one (so, for example, every task state change appears to take 10 s). I don't know what to do with this information, however! It is using ZMQ for comms.

(This 10 s delay is what causes some tests to fail consistently: at that pace, the workflows cannot run to completion before the tests' automatic timeouts kick in.)
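To check whether the 10 s pattern holds across a whole run, the gaps between consecutive timestamped lines in a workflow log can be measured directly. Below is a minimal, hypothetical helper (not part of Cylc). It assumes each log line starts with an ISO 8601 timestamp ending in `Z` or a `+HH:MM` offset; that format is an assumption and should be checked against the real log first.

```python
import re
from datetime import datetime

# Assumed line shape: "<ISO 8601 timestamp with Z or +HH:MM offset> <message>".
# This is an illustrative assumption, not a documented Cylc log format.
TIMESTAMP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2}))\s"
)


def parse_timestamp(text):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def message_gaps(lines):
    """Yield (gap_seconds, line) for each timestamped line after the first."""
    previous = None
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if not match:
            continue  # skip continuation lines, tracebacks, etc.
        stamp = parse_timestamp(match.group(1))
        if previous is not None:
            yield (stamp - previous).total_seconds(), line.rstrip()
        previous = stamp


def suspicious_gaps(lines, expected=10.0, tolerance=0.5):
    """Return the lines that arrived roughly `expected` seconds after the previous one."""
    return [line for gap, line in message_gaps(lines) if abs(gap - expected) <= tolerance]
```

Passing it the lines of a scheduler log (for example via `open(path).readlines()`) would list the messages that followed their predecessor by about 10 s; `expected` and `tolerance` are arbitrary values chosen for this illustration.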

@oliver-sanders: The default comms timeout is 5 seconds, which may be related. Fixing flaky macOS tests is fairly low priority.

If we are not too concerned, I'm opening a PR to skip the 2 failing tests on macOS on GH Actions, as there is no point running them if they are going to fail 100% of the time.
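The skip condition amounts to "run everywhere except on macOS runners in GitHub Actions". The functional tests themselves are bash scripts, so the Python predicate below only illustrates that condition; its name is hypothetical. It relies on GitHub Actions setting `GITHUB_ACTIONS=true` in every job and on `platform.system()` returning `"Darwin"` on macOS.

```python
import os
import platform


def running_on_gh_actions_macos(environ=None, system=None):
    """Return True when running inside a GitHub Actions job on macOS.

    Hypothetical helper: `environ` and `system` default to the real
    environment and OS, but can be passed in explicitly for testing.
    """
    environ = os.environ if environ is None else environ
    system = platform.system() if system is None else system
    # GitHub Actions sets GITHUB_ACTIONS=true; macOS reports itself as "Darwin".
    return system == "Darwin" and environ.get("GITHUB_ACTIONS") == "true"
```

Keeping the check this narrow means the same tests still run on a developer's Mac, which matters given that they pass locally.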

oliver-sanders commented 2 months ago

I tried running the two tests on my Mac; both passed relatively quickly (by Cylc test battery standards):

$ ctb -v tests/functional/flow-triggers/11-wait-merge.t
ok 1 - 11-wait-merge-validate
ok 2 - 11-wait-merge-run
ok 3 - 11-wait-merge-order-no-wait
ok 4 - 11-wait-merge-order-no-wait.stdout-cmp-ok
ok    26610 ms ( 0.00 usr  0.00 sys +  4.38 cusr  1.59 csys =  5.97 CPU)
All tests successful.
Files=1, Tests=4, 29 wallclock secs ( 0.03 usr  0.01 sys +  4.38 cusr  1.59 csys =  6.01 CPU)
Result: PASS

$ ctb -v tests/functional/modes/04-simulation-runtime.t
ok 1 - 04-simulation-runtime-validate
ok 2 - 04-simulation-runtime-start
ok 3 - second-task-broadcast-too-long
ok 4 - cancel-second-task-broadcast
ok 5 - first-task-speed-up-broadcast
ok 6 - 04-simulation-runtime-unpause
ok 7 - log-grep-fail
ok    12701 ms ( 0.00 usr  0.00 sys +  4.46 cusr  1.41 csys =  5.87 CPU)
All tests successful.
Files=1, Tests=7, 14 wallclock secs ( 0.02 usr  0.01 sys +  4.46 cusr  1.41 csys =  5.90 CPU)
Result: PASS

I suspect this issue is more related to the GitHub Actions environment than to macOS per se?
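One way to compare these local runs with the GitHub Actions runs is to use the per-test timing lines above, which report wall-clock time alongside CPU time. If the extra time on CI is mostly idle (wall-clock far above CPU), that would hint at waiting, for example on comms, rather than slow computation. This is a rough heuristic, and the sketch below is a hypothetical helper that only handles the passing `ok … ms (… CPU)` line shape shown in this issue.

```python
import re

# Matches the per-test timing line shown above, e.g.
# "ok    26610 ms ( 0.00 usr  0.00 sys +  4.38 cusr  1.59 csys =  5.97 CPU)".
# Only the passing ("ok") form seen in this issue is handled.
SUMMARY_RE = re.compile(r"^ok\s+(\d+)\s+ms\b.*=\s*([\d.]+)\s+CPU\)")


def parse_timing(output):
    """Return (wall_seconds, cpu_seconds) from the first timing line in the output."""
    for line in output.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            wall_ms, cpu = match.groups()
            return int(wall_ms) / 1000, float(cpu)
    raise ValueError("no timing line found")


def idle_fraction(wall_seconds, cpu_seconds):
    """Share of wall-clock time not covered by the reported CPU time (never below 0)."""
    if wall_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return max(0.0, 1.0 - cpu_seconds / wall_seconds)
```

For the two local runs above this gives roughly 78% idle (5.97 s CPU of 26.61 s) and 54% idle (5.87 s CPU of 12.70 s); the same figures from a GitHub Actions log would show whether the slowdown there is mostly idle time.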