fynxer closed this issue 6 years ago
For AMD there is still no watchdog implemented. I am working on one, but it's Linux-only.
See #97.
On Linux, the ideal and fairly easy-to-add solution would be to implement systemd's sd_notify interface; see "systemd for Administrators, Part XV":
First of all, to make software watchdog-supervisable it needs to be patched to send out "I am alive" signals in regular intervals in its event loop. Patching this is relatively easy. First, a daemon needs to read the WATCHDOG_USEC= environment variable. If it is set, it will contain the watchdog interval in usec formatted as ASCII text string, as it is configured for the service. The daemon should then issue sd_notify("WATCHDOG=1") calls every half of that interval. A daemon patched this way should transparently support watchdog functionality by checking whether the environment variable is set and honouring the value it is set to.
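To make the quoted steps concrete, here is a minimal sketch of the notification protocol in Python, using only the standard library. It is not ethminer code: ethminer is C++ and would more likely call libsystemd's sd_notify() directly. The helper names `watchdog_interval_seconds` and `sd_notify` below are my own, and the sketch assumes systemd passes the notification socket path in NOTIFY_SOCKET.

```python
import os
import socket


def watchdog_interval_seconds(env=os.environ):
    """Return half of WATCHDOG_USEC in seconds, or None if no watchdog is configured."""
    raw = env.get("WATCHDOG_USEC")
    if not raw:
        return None
    try:
        usec = int(raw)  # the interval arrives as an ASCII decimal string
    except ValueError:
        return None
    if usec <= 0:
        return None
    # Notify at half the configured interval, as the quoted text recommends.
    return usec / 2 / 1_000_000


def sd_notify(message, env=os.environ):
    """Send one datagram to the notification socket; return False if not supervised."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading '@' denotes an abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), path)
    return True
```

In the event loop you would call `sd_notify("WATCHDOG=1")` once per returned interval, and only while the miner is actually making progress. On the systemd side the service unit needs `WatchdogSec=` set, plus a suitable `Restart=` policy, for a missed keep-alive to trigger a restart.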
On Windows you can easily script a solution with Perl and ps. The scripts are quite small and restart ethminer once a Perl log parser recognizes a (CUDA) error or a ps check finds the process dead.
Maybe have a look here. It auto-restarts ethminer if there has been no job for 5 minutes, and auto-restarts the system if "CUDA ERROR" is detected. https://bitcointalk.org/index.php?topic=2195527.0
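The log-watching logic described in the comments above can be sketched roughly as follows. This is not the linked script, just an illustration of its two rules. The class name `LogWatch` and the `job_marker` value are hypothetical: I am not assuming any particular ethminer log format, so you would have to supply the substring that marks a new job in your own logs. Only the "CUDA ERROR" text and the 5-minute limit come from this thread.

```python
from dataclasses import dataclass


@dataclass
class LogWatch:
    job_marker: str                   # hypothetical: substring that marks a "new job" log line
    error_marker: str = "CUDA ERROR"  # text mentioned in this thread
    stall_seconds: float = 300.0      # 5 minutes without a job
    last_job_at: float = 0.0          # timestamp of the last job line seen

    def feed(self, line, now):
        """Process one log line; return "reboot" on a CUDA error, else None."""
        if self.error_marker in line:
            return "reboot"
        if self.job_marker in line:
            self.last_job_at = now    # record job activity
        return None

    def check(self, now):
        """Return "restart" if no job line has been seen for stall_seconds."""
        if now - self.last_job_at >= self.stall_seconds:
            return "restart"
        return None
```

A caller would feed each new log line with `time.time()` as `now`, call `check()` periodically, and map "restart" and "reboot" to whatever commands suit the platform. Those actions are deliberately left out of the sketch.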
Since #757 (which added the --exit parameter to exit whenever an error occurs), you can use a watchdog.
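With --exit in place, the simplest watchdog is a loop that starts the miner again every time it quits. A minimal sketch, assuming ethminer is on the PATH and that you add your usual pool options to the command line. The `supervise` function and its parameters are my own illustration, not part of ethminer.

```python
import subprocess
import time


def supervise(command, max_runs=None, delay_seconds=10.0,
              run=subprocess.call, sleep=time.sleep):
    """Run command repeatedly, restarting after each exit; return the exit codes."""
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        code = run(command)  # blocks until the process exits
        exit_codes.append(code)
        if max_runs is None or len(exit_codes) < max_runs:
            print(f"process exited with code {code}; restarting in {delay_seconds}s")
            sleep(delay_seconds)  # short pause so a crash loop does not spin
    return exit_codes


if __name__ == "__main__":
    # Assumption: append your own pool and device options after --exit.
    supervise(["ethminer", "--exit"])
```

With `max_runs=None` the loop never ends, which is what you want for an unattended rig. The `max_runs`, `run` and `sleep` parameters exist mainly so the loop can be tried out without a real miner.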
Here is my ETHminerWatchDogDmW for Windows 7/8/10 [32/64] and Linux (any distribution, version, or architecture) (#735).
Please check it and send feedback. Thank you!
In case it is of interest to anyone, I created one for Linux with AMD video cards. It is probably not too hard to adjust for CUDA crashes if someone wants to contribute. https://github.com/th0ma7/th0ma7
When something happens to ethminer, we need a watchdog that can alert and/or restart the miner.
Thanks to all of you working on and developing ethminer. I really appreciate it.