Open oliver-sanders opened 3 years ago
I guess we need a timeout for cylc commands executed by task jobs, to avoid hanging jobs unnecessarily. For user-run commands it doesn't matter so much.
The issue is, it's not actually a "comms timeout" - should we rename it, to just "timeout"?
Better check that there is a default timeout, especially for cylc message.
(doc says: for messaging see global config ...)
But ... it does apply only to commands that contact a scheduler.
Decision: get rid of the option, replace it with a network timeout configured in global.cylc. PT30S default?
~Current default for messaging - PT30S.~
Current default for messaging - PT5S
(note this is a round-trip timeout, i.e. the time between sending the request and receiving the response)
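To make "round-trip timeout" concrete, here is a minimal sketch using plain asyncio rather than Cylc's actual client code. The names `fake_scheduler_reply`, `request_with_timeout` and `run` are hypothetical, and the delays are invented for the demonstration.

```python
import asyncio


async def fake_scheduler_reply(delay: float) -> str:
    """Stand-in for sending a request and waiting for the scheduler's response."""
    await asyncio.sleep(delay)
    return "ok"


async def request_with_timeout(delay: float, timeout: float) -> str:
    """Return the reply, or raise TimeoutError if the round trip exceeds `timeout`."""
    try:
        # The clock covers the whole send-and-receive cycle, not just the send.
        return await asyncio.wait_for(fake_scheduler_reply(delay), timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(f"no response within {timeout}s") from None


def run(delay: float, timeout: float) -> str:
    """Synchronous entry point, as a command-line tool would use it."""
    return asyncio.run(request_with_timeout(delay, timeout))
```

With a timeout like this, a task job whose scheduler never answers fails after a bounded wait instead of hanging.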
Decision:
1) Remove the `--comms-timeout` option (no use case).
2) Replace it with a timeout configured in global.cylc, default `PT30S`.

Note that the Cylc 7 `[task messaging]connection timeout` setting was removed in #3402.
Slight problem with the above decision...
The new `comms timeout` global configuration will be loaded from the global config on the platform from which the request was made. This means that, to take effect everywhere, it must be set on every platform.
The alternative is to add a field to the contact file so that the value can be passed through to remote platforms.
Options:
1) Require `comms timeout` to be set on all platforms.
2) Write `comms timeout` into the contact file.

BTW: We are currently doing (3) with a hardcoded default of `PT5S`. Haven't seen any issues yet.
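For illustration only, the options could be combined into a single lookup order. This is a sketch under assumptions: the contact-file key `COMMS_TIMEOUT` is hypothetical (not a real Cylc contact-file field), the precedence shown was not agreed in this thread, and the duration parser handles only the simple `PT<n>S`, `PT<n>M` and `PT<n>H` forms.

```python
import re
from typing import Mapping, Optional

DEFAULT_TIMEOUT = "PT5S"  # hardcoded fallback, value quoted in this thread
CONTACT_KEY = "COMMS_TIMEOUT"  # hypothetical key, for illustration only


def parse_seconds(duration: str) -> float:
    """Parse a simple ISO 8601 duration (PT<n>S, PT<n>M or PT<n>H) into seconds."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)([HMS])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return float(value) * {"H": 3600, "M": 60, "S": 1}[unit]


def resolve_timeout(
    contact: Mapping[str, str], local_config: Optional[str] = None
) -> float:
    """Pick the contact-file value first, then local config, then the default."""
    duration = contact.get(CONTACT_KEY) or local_config or DEFAULT_TIMEOUT
    return parse_seconds(duration)
```

Reading the contact file first would let the scheduler's value reach remote platforms without configuring each one, while the hardcoded default keeps the simple case working.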
IMO a hard coded timeout is fine, as well as simpler, so long as it is long enough to handle reasonable latencies.
Discussed in VC 18/11/21. Decision: remove this from rc1 and track use cases before proceeding with this issue.
(Also from the VC: if we do come back to this, having to configure it on remote platforms is not unreasonable given the functionality it affects.)
The `--comms-timeout` option from Cylc 7 is still present in Cylc 8; however, it doesn't do quite what it says on the tin. It is now a global timeout implemented in the Python layer, however:
Current implementation:
https://github.com/cylc/cylc-flow/blob/07c7f5c9c785f097066ffdf8d4d5630b5914d734/cylc/flow/network/client.py#L171-L188
Questions:
1) Reconsider the case for timeouts and determine how they should look in the future