buckyos / CYFS

CYFS (short for CYberFileSystem) is a next-generation technology for building a real Web3 by upgrading the basic protocols of the Web (TCP/IP + DNS + HTTP). https://www.cyfs.com/, cyfs://cyfs/index_en.html.
BSD 2-Clause "Simplified" License

The `ping` with `sn` is stopped? #250

Open lurenpluto opened 1 year ago

lurenpluto commented 1 year ago

Discussed in https://github.com/buckyos/CYFS/discussions/249

Originally posted by **streetycat** May 4, 2023

I found that my `OOD` has gone offline. After diagnosis, I found that no `Ping` packets are being sent to the `sn`, and I don't know why the `ping` was aborted. The gateway has been running for 6 days. The following is the earliest log I can find: [gateway_bdt_1727_r00023.log](https://github.com/buckyos/CYFS/files/11394774/gateway_bdt_1727_r00023.log)

The following is the latest log: [gateway_bdt_1727_rCURRENT.log](https://github.com/buckyos/CYFS/files/11394797/gateway_bdt_1727_rCURRENT.log)

lurenpluto commented 1 year ago

Here are a few suggestions on how to post logs:

  1. Since the logs are relatively large, we suggest compressing them with zip before uploading, which also saves upload time.
  2. For the gateway process, the logs are split into a main log and a bdt log. Even when the problem is in the bdt module, please attach the main log as well; at a minimum it contains version-related information, which makes diagnosis easier.

Thanks for providing further logs to help with the diagnosis, @streetycat.

lurenpluto commented 1 year ago

Given the complexity of the SN ping mechanism and of network conditions, consider adding an SN ping liveness-detection mechanism, similar to the existing process-stuck detection and task-deadlock detection. If the SN ping has not been updated for a period of time, the ping is considered stuck, and the gateway can be restarted so that the whole gateway process does not remain in a "fake dead" state.
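
To make the suggestion above concrete, here is a minimal sketch of such a watchdog. It is not taken from the CYFS codebase: the names `SnPingWatchdog`, `WatchdogAction`, `record_ping`, and `check` are hypothetical, and the timeout value is left to the caller. The idea is only that the ping path records a timestamp on every successful SN ping response, and a periodic checker compares that timestamp against a timeout.

```rust
use std::time::{Duration, Instant};

/// Result of a liveness check (hypothetical type, not part of CYFS).
#[derive(Debug, PartialEq, Eq)]
pub enum WatchdogAction {
    /// An SN ping response was seen within the timeout.
    Healthy,
    /// No SN ping response within the timeout; the caller may restart the gateway.
    Restart,
}

/// Tracks when the last successful SN ping response was observed.
/// Hypothetical sketch; the real gateway would wire this into its ping loop.
pub struct SnPingWatchdog {
    last_ok: Instant,
    timeout: Duration,
}

impl SnPingWatchdog {
    /// `now` is passed in explicitly so the logic can be tested without sleeping.
    pub fn new(timeout: Duration, now: Instant) -> Self {
        Self { last_ok: now, timeout }
    }

    /// Call this whenever an SN ping response is received.
    pub fn record_ping(&mut self, now: Instant) {
        self.last_ok = now;
    }

    /// Call this periodically, for example from the same place that runs
    /// the existing stuck/deadlock checks.
    pub fn check(&self, now: Instant) -> WatchdogAction {
        // saturating_duration_since yields zero if `now` is earlier than `last_ok`.
        if now.saturating_duration_since(self.last_ok) > self.timeout {
            WatchdogAction::Restart
        } else {
            WatchdogAction::Healthy
        }
    }
}
```

In use, the ping loop would call `record_ping(Instant::now())` on each response, and a timer would call `check(Instant::now())` and trigger a restart on `WatchdogAction::Restart`. Taking `now` as a parameter instead of reading the clock internally is a design choice for testability only.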