ccfos / nightingale

An all-in-one observability solution which aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, traces in a beautiful web UI.
https://flashcat.cloud/docs/
Apache License 2.0
9.42k stars, 1.38k forks

Task-related issues #1986

Closed rayn316 closed 2 months ago

rayn316 commented 2 months ago

What would you like to be added:

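  1. The ability to view the logs of a task while it is still running
  2. When a task times out, the script it started keeps running and is not terminated
  3. Entering a nonexistent hostname before running a task should produce an error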

Why is this needed:

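  1. Running tasks sometimes hang, but the logs cannot be viewed until the task has finished, which makes troubleshooting difficult. Will it become possible to view logs while a task is still executing?
  2. This is odd: logically, a timed-out task should be forcibly killed, but in practice the script keeps running, and I have to log in to the server and kill it manually.
  3. Entering a nonexistent hostname still creates the task, but the task only fails after waiting for the timeout? The same happens with kill: the overall task status never seems to show as finished.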

rayn316 commented 2 months ago

Sometimes killing a task causes categraf to crash and exit:

Jun 11 18:07:55 categraf[1960524]: 2024/06/11 18:07:55 heartbeat.go:64: I! assigned tasks: [14095]
Jun 11 18:07:56 categraf[1960524]: 2024/06/11 18:07:56 heartbeat.go:64: I! assigned tasks: [14095]
Jun 11 18:07:57 categraf[1960524]: 2024/06/11 18:07:57 heartbeat.go:64: I! assigned tasks: [14095]
Jun 11 18:07:58 categraf[1960524]: 2024/06/11 18:07:58 heartbeat.go:64: I! assigned tasks: [14095]
Jun 11 18:07:59 categraf[1960524]: 2024/06/11 18:07:59 heartbeat.go:64: I! assigned tasks: [14095]
Jun 11 18:08:00 categraf[1960524]: 2024/06/11 18:08:00 heartbeat.go:64: I! assigned tasks: [14095]
Jun 11 18:08:00 categraf[1960524]: 2024/06/11 18:08:00 task.go:343: D! begin kill process of task[14095]
Jun 11 18:08:00 categraf[1960524]: panic: runtime error: invalid memory address or nil pointer dereference
Jun 11 18:08:00 categraf[1960524]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xa0 pc=0xbedb3d]
Jun 11 18:08:00 categraf[1960524]: goroutine 6733953 [running]:
Jun 11 18:08:00 categraf[1960524]: flashcat.cloud/categraf/ibex.CmdKill(...)
Jun 11 18:08:00 categraf[1960524]:         /home/runner/work/categraf/categraf/ibex/cmd_nix.go:16
Jun 11 18:08:00 categraf[1960524]: flashcat.cloud/categraf/ibex.killProcess(0xc00250c0d0)
Jun 11 18:08:00 categraf[1960524]:         /home/runner/work/categraf/categraf/ibex/task.go:345 +0x11d
Jun 11 18:08:00 categraf[1960524]: created by flashcat.cloud/categraf/ibex.(*Task).kill in goroutine 401
Jun 11 18:08:00 categraf[1960524]:         /home/runner/work/categraf/categraf/ibex/task.go:299 +0x4f
Jun 11 18:08:00 systemd[1]: categraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

UlricQin commented 2 months ago

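Thanks for the feedback. Please split these into separate issues, though: later PRs will be linked to their corresponding issue, and mixing everything together makes it hard to manage.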

rayn316 commented 2 months ago

I have split these into multiple issues.