Open irudkin opened 7 years ago
This issue is related to issue #8. New text:
To improve a system during development, or to maintain its safety integrity while in commission, it is advisable that critical systems (systems that risk injury or death) operate alongside a separate system that monitors the status of the critical system. Normally known as a Watchdog (not a Watchdog timer, which resets the system), it is a system that can cross-check with other systems if necessary and report both its own status and that of the system it is monitoring through a separate API. Critical systems need to accommodate this kind of reporting in the safety critical API. Other considerations are mentioned below.
A critical system ideally should not be able to enter a non-recoverable state. In such a state it would likely be unable to return a status condition for the user to act on. This guideline suggests the implementers should document:
The client would then be able to develop tests for those states and, where applicable, verify that their recovery processes work.
New text to be entered:
A system should facilitate the communication of all its operational states to its client or monitoring system efficiently and effectively to aid its development, testing regime and deployment. A system could communicate state using:
* The interface’s or API functions’ parameters
* Drive signals on connected hardware
* Write diagnostics to shared memory
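As an illustration of the shared memory option, here is a minimal sketch in C. Everything in it is an assumption made for the example: the `diag_record` layout, the `diag_write` and `diag_read` names, and the simple XOR-based checksum are hypothetical, and an ordinary struct stands in for the shared region. A real deployment would also need the memory ordering and access guarantees of its platform, which this sketch does not address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical diagnostics record placed in a shared region. */
typedef struct {
    uint32_t sequence;    /* incremented on every write */
    uint32_t state_code;  /* implementation-defined state identifier */
    uint32_t error_code;  /* implementation-defined error identifier */
    uint32_t checksum;    /* lets the reader reject a damaged record */
} diag_record;

/* Simple illustrative checksum; an all-zero record never validates. */
static uint32_t diag_checksum(const diag_record *r)
{
    return ~(r->sequence ^ r->state_code ^ r->error_code);
}

/* Called by the critical system to publish its current state. */
void diag_write(diag_record *shared, uint32_t state_code, uint32_t error_code)
{
    shared->sequence += 1u;
    shared->state_code = state_code;
    shared->error_code = error_code;
    shared->checksum = diag_checksum(shared);
}

/* Called by the client or monitor; returns false if the record is invalid. */
bool diag_read(const diag_record *shared, diag_record *out)
{
    diag_record copy = *shared;
    if (copy.checksum != diag_checksum(&copy)) {
        return false;
    }
    *out = copy;
    return true;
}
```

The sequence number lets a reader notice that a new record has been published, and the checksum lets it reject a region that was never written or has been corrupted.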
The safety criticality level of the system will likely dictate how it is partitioned from other components, which in turn will influence how an SC system communicates its state.
It is important that all states and error conditions are described discretely and not grouped into a general state or error condition. The system should be expected to provide information about itself at any time, in a deterministic and timely manner, in whatever operational mode it is currently in. For example:
* normal running behaviour
* a safety mode, reduced functionality ‘limp home’ mode
* indeterminate unrecoverable state
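The idea of discrete states and errors that can be queried in any mode could be sketched in C as follows. The `sc_state` and `sc_error` enumerators, the transition rules in `sc_report_error`, and all function names are hypothetical, chosen only to mirror the modes listed above; they are not taken from any existing SC API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operational states, one enumerator per mode. */
typedef enum {
    SC_STATE_NORMAL = 0,     /* normal running behaviour */
    SC_STATE_LIMP_HOME,      /* safety mode, reduced functionality */
    SC_STATE_UNRECOVERABLE   /* indeterminate unrecoverable state */
} sc_state;

/* Hypothetical errors, kept discrete rather than one generic failure. */
typedef enum {
    SC_ERR_NONE = 0,
    SC_ERR_SENSOR_TIMEOUT,
    SC_ERR_MEMORY_FAULT
} sc_error;

typedef struct {
    sc_state state;
    sc_error last_error;
} sc_status;

typedef struct {
    sc_status status;
} sc_system;

void sc_init(sc_system *sys)
{
    sys->status.state = SC_STATE_NORMAL;
    sys->status.last_error = SC_ERR_NONE;
}

/* Assumed transition rules for the sketch; unrecoverable is sticky. */
void sc_report_error(sc_system *sys, sc_error err)
{
    if (err == SC_ERR_NONE || sys->status.state == SC_STATE_UNRECOVERABLE) {
        return;
    }
    sys->status.last_error = err;
    if (err == SC_ERR_MEMORY_FAULT) {
        sys->status.state = SC_STATE_UNRECOVERABLE;
    } else {
        sys->status.state = SC_STATE_LIMP_HOME;
    }
}

/* Fixed, bounded work in every state: copies the status to the caller. */
bool sc_get_status(const sc_system *sys, sc_status *out)
{
    if (sys == NULL || out == NULL) {
        return false;
    }
    *out = sys->status;
    return true;
}
```

Because `sc_get_status` does the same small amount of work regardless of mode, a client can still obtain a status after the system has entered its unrecoverable state.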
For highly safety critical systems, a separate Watchdog component is likely to be deployed to oversee those systems and monitor their behaviour for abnormal events, signalling higher level systems to take appropriate corrective action.
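A minimal sketch of such a monitoring component in C, assuming the monitored system publishes a heartbeat counter and a fault flag. The `monitored_status` record, the `wd_` names, and the missed-poll threshold are all hypothetical. Note that this component only reports a verdict for a higher level system to act on; it does not reset anything itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status published by the monitored system. */
typedef struct {
    uint32_t heartbeat;  /* incremented by the monitored system each cycle */
    bool     fault;      /* set when the monitored system detects a fault */
} monitored_status;

typedef enum {
    WD_OK = 0,     /* heartbeat advanced since the last poll */
    WD_SUSPECT,    /* heartbeat stalled, still within tolerance */
    WD_ESCALATE    /* fault reported or stall exceeded tolerance */
} wd_verdict;

typedef struct {
    uint32_t last_heartbeat;
    uint32_t missed_polls;
    uint32_t max_missed;  /* stalled polls tolerated before escalating */
} watchdog;

void wd_init(watchdog *wd, uint32_t initial_heartbeat, uint32_t max_missed)
{
    wd->last_heartbeat = initial_heartbeat;
    wd->missed_polls = 0u;
    wd->max_missed = max_missed;
}

/* Called periodically; the verdict is passed on to a higher level system. */
wd_verdict wd_poll(watchdog *wd, const monitored_status *s)
{
    if (s->fault) {
        return WD_ESCALATE;
    }
    if (s->heartbeat != wd->last_heartbeat) {
        wd->last_heartbeat = s->heartbeat;
        wd->missed_polls = 0u;
        return WD_OK;
    }
    if (wd->missed_polls < UINT32_MAX) {
        wd->missed_polls++;  /* saturate rather than wrap around */
    }
    return (wd->missed_polls > wd->max_missed) ? WD_ESCALATE : WD_SUSPECT;
}
```

With `max_missed` set to 2, the first two stalled polls return `WD_SUSPECT` and the third returns `WD_ESCALATE`.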
A safety critical system’s API should be documented including:
* normal behavioural states
* all error states
* all safety mode limiting behavioural states, how they are entered and whether they are recoverable
* all non deterministic non recoverable states and how they are entered.
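One possible way to make this documentation checkable is to ship it alongside the API as a table that client tests can iterate over. The sketch below is only an illustration: the `state_doc` structure, the example state names and their entry conditions are invented for the example and do not describe any real implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical categories matching the documentation list above. */
typedef enum {
    DOC_NORMAL = 0,
    DOC_ERROR,
    DOC_SAFETY_MODE,
    DOC_NON_RECOVERABLE
} doc_kind;

typedef struct {
    const char *name;          /* state identifier */
    doc_kind    kind;          /* which category the state belongs to */
    const char *entered_when;  /* how the state is entered */
    bool        recoverable;   /* whether the system can leave the state */
} state_doc;

/* Invented example entries, for illustration only. */
static const state_doc STATE_DOCS[] = {
    { "RUNNING",        DOC_NORMAL,          "initialisation succeeded",          true  },
    { "SENSOR_TIMEOUT", DOC_ERROR,           "a sensor read missed its deadline", true  },
    { "LIMP_HOME",      DOC_SAFETY_MODE,     "repeated sensor timeouts",          true  },
    { "HALTED",         DOC_NON_RECOVERABLE, "memory integrity check failed",     false }
};

#define STATE_DOC_COUNT (sizeof STATE_DOCS / sizeof STATE_DOCS[0])

/* True if every entry names its state and says how it is entered. */
bool docs_complete(const state_doc *docs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (docs[i].name == NULL || docs[i].name[0] == '\0') {
            return false;
        }
        if (docs[i].entered_when == NULL || docs[i].entered_when[0] == '\0') {
            return false;
        }
    }
    return true;
}

/* Number of documented states the system cannot recover from. */
size_t count_non_recoverable(const state_doc *docs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!docs[i].recoverable) {
            count++;
        }
    }
    return count;
}
```

A client test suite could walk such a table to confirm that every state has a documented entry condition, and could flag any non-recoverable entries for review.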
Ideally a safety critical system should not have any non-deterministic, non-recoverable operational states.
New text entered. Need review.
Moved from issue #8 (Bugzilla 16059) as discussed in the meeting of 30/01/2017. Deemed better as a new, separate issue. To be re-worded, removing list item 3 of comment 3. Re-edit to mention watchdog or cross-checking verification systems to allow for reporting of status. This issue is related to issue #8.
Comment 3 illya@codeplay.com 2016-11-15 03:49:39 PST
Discuss: Ideally there should not be any non-recoverable error conditions. As a guideline for implementers to consider: if there are non-recoverable states, the client (developers using the SC API) should be made aware of the following:
This would allow the client to develop tests for those states and, where applicable, verify that their recovery processes work.
Comment 4 illya@codeplay.com 2016-11-22 00:36:27 PST
From the discussion, the intention of comment 3 was not clear. The implementer should provide as much documentation as possible on the reasons for the undefined state behaviour represented by the returned error code. This is very much implementation specific, so nothing more can be added to the guidelines apart from a strong recommendation. Users who have done their due diligence should be asking such questions anyway, especially if the implementation is a black box: what are the side effects?