StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

FE/BE/CN startup scripts should automatically detect whether startup succeeded #48369

Open zhuxt2015 opened 4 months ago

zhuxt2015 commented 4 months ago

Enhancement

Currently, after running the process startup shell script, users still have to check the log file manually for a specific log line to confirm that the process started successfully. For example, for the FE process they check whether fe.log contains "2022-08-10 16:12:29,911 INFO (UNKNOWN x.x.x.x_9010_1660119137253(-1)|1) [FeServer.start():52] thrift server started with port 9020." This is inconvenient for users.

Solution: the startup script automatically checks whether the specified log content appears in the log, with a 30-second timeout. If the specified log line is not detected within 30 seconds, the script prompts the user to check the log contents manually.
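A minimal sketch of the proposed check, written in Python for illustration rather than as part of the actual startup scripts. The function name `wait_for_log_line`, its parameters, and the `start_offset` idea (skipping lines left over from an earlier run) are assumptions, not existing StarRocks code; only the 30-second timeout and the "thrift server started" log line come from the proposal.

```python
import re
import time


def wait_for_log_line(log_path, pattern, timeout=30.0, interval=1.0, start_offset=0):
    """Poll log_path until a line matching pattern appears or timeout expires.

    start_offset is a byte offset (hypothetical parameter): pass the file size
    recorded before launching the process to ignore lines from earlier runs.
    Returns True if a matching line was found, False on timeout.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(log_path, "rb") as f:
                f.seek(start_offset)
                for raw in f:
                    # Decode leniently so a partial or odd byte sequence cannot abort the scan.
                    if regex.search(raw.decode("utf-8", errors="replace")):
                        return True
        except FileNotFoundError:
            # The process may not have created the log file yet; keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller could run `wait_for_log_line("fe.log", r"thrift server started with port \d+")` after launching FE and, on a `False` result, print a message asking the user to inspect the log manually. The log path here is a placeholder.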

kevincai commented 4 months ago

Checking log lines is not a reliable way to determine whether the service is running, because 1) INFO logging can be turned off completely, 2) the log directory and the log filename can both be changed, and 3) the log can even go to the console instead of a log file if the --logconsole option is provided.

The script should use another way to determine whether the service is online, e.g. checking <fe>:8030/api/health, which is used for k8s service readiness probing.
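A rough sketch of what polling the health endpoint could look like, again in Python for illustration. The function name `wait_for_health`, its parameters, and the rule that any 2xx response counts as healthy are assumptions; the thread only states that `<fe>:8030/api/health` is used for readiness probing, and the 30-second limit is borrowed from the original proposal.

```python
import http.client
import time
import urllib.request


def wait_for_health(url, timeout=30.0, interval=1.0, request_timeout=2.0):
    """Poll url until it answers with a 2xx status or timeout expires.

    Returns True once the endpoint responds successfully, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, HTTP error status, or a malformed reply:
            # the service is not ready yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For an FE node the call might be `wait_for_health("http://<fe>:8030/api/health")`, with `<fe>` replaced by the node's address. Unlike the log-based check, this does not depend on the log level, the log location, or the --logconsole option.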