Closed rayn316 closed 2 weeks ago
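I don't know what happened, but after this error appeared, the affected categraf instances would no longer start.
I then found that the categraf directory /usr/local/categra
was gone. It looks like the directory was deleted; I'm not sure whether the kill error caused this.
What does the script execute? Does it delete /usr/local/categraf? What is the task content?
The script does not delete /usr/local/categraf; something else may have done that.
Sometimes categraf reports the error below, so uploading the task result keeps failing and the task stays in the running state in Nightingale.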
Jun 13 12:00:06 categraf[1280617]: 2024/06/13 12:00:06 heartbeat.go:48: E! error from server: Error 1366: Incorrect string value: '\x82\xE9\x94\x99\xE8\xAF...' for column 'stdout' at row 1
Jun 13 12:00:08 categraf[1280617]: 2024/06/13 12:00:08 heartbeat.go:48: E! error from server: Error 1366: Incorrect string value: '\x82\xE9\x94\x99\xE8\xAF...' for column 'stdout' at row 1
Jun 13 12:00:09 categraf[1280617]: 2024/06/13 12:00:09 heartbeat.go:48: E! error from server: Error 1366: Incorrect string value: '\x82\xE9\x94\x99\xE8\xAF...' for column 'stdout' at row 1
Jun 13 12:00:11 categraf[1280617]: 2024/06/13 12:00:11 heartbeat.go:48: E! error from server: Error 1366: Incorrect string value: '\x82\xE9\x94\x99\xE8\xAF...' for column 'stdout' at row 1
Jun 13 12:00:13 categraf[1280617]: 2024/06/13 12:00:13 heartbeat.go:48: E! error from server: Error 1366: Incorrect string value: '\x82\xE9\x94\x99\xE8\xAF...' for column 'stdout' at row 1
Jun 13 12:00:14 categraf[1280617]: 2024/06/13 12:00:14 heartbeat.go:48: E! error from server: Error 1366: Incorrect string value: '\x82\xE9\x94\x99\xE8\xAF...' for column 'stdout' at row 1
It only recovers after I manually reset the task's output file and then restart categraf.
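This error looks like a wrong charset on the database tables. Check the hundred tables that store task results: latin1 would cause this problem, and they can be changed to utf8mb4.
The query shows that all of them are utf8mb4_0900_ai_ci: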
mysql> SELECT table_name, table_collation
-> FROM information_schema.TABLES
-> WHERE table_schema = 'n9e_v6';
+-----------------------+--------------------+
| TABLE_NAME | TABLE_COLLATION |
+-----------------------+--------------------+
| alert_aggr_view | utf8mb4_0900_ai_ci |
| alert_cur_event | utf8mb4_0900_ai_ci |
| alert_his_event | utf8mb4_0900_ai_ci |
| alert_mute | utf8mb4_0900_ai_ci |
| alert_rule | utf8mb4_0900_ai_ci |
| alert_subscribe | utf8mb4_0900_ai_ci |
| alerting_engines | utf8mb4_0900_ai_ci |
| board | utf8mb4_0900_ai_ci |
| board_busigroup | utf8mb4_0900_ai_ci |
| board_payload | utf8mb4_0900_ai_ci |
| builtin_cate | utf8mb4_0900_ai_ci |
| builtin_components | utf8mb4_0900_ai_ci |
| builtin_metrics | utf8mb4_0900_ai_ci |
| builtin_payloads | utf8mb4_0900_ai_ci |
| busi_group | utf8mb4_0900_ai_ci |
| busi_group_member | utf8mb4_0900_ai_ci |
| chart | utf8mb4_0900_ai_ci |
| chart_group | utf8mb4_0900_ai_ci |
| chart_share | utf8mb4_0900_ai_ci |
| configs | utf8mb4_0900_ai_ci |
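It still happens. Sometimes a task times out, and when I check categraf on the server it keeps reporting this error:
Jun 18 10:49:20 categraf[1232721]: 2024/06/18 10:49:20 heartbeat.go:48: E! error from server: Error 1366: Incorrect string value: '\xB6\xE8\xBF\x9F: ...' for column 'stdout' at row 1
Then I have no other ideas. As far as I know this error is a character set problem, and asking GPT gives a similar answer:
Or perhaps the content your script outputs cannot be parsed as UTF-8.
The output text looks the same as the logs from other nodes that were written successfully; I can't see any special characters.
Could categraf skip or ignore strings that cannot be written, for example by replacing them with a placeholder such as []?
Otherwise, every time a special character shows up we have to hunt for a character set problem and restart categraf manually, which isn't workable.
This is server-side logic: the server is responsible for writing to the database. This fault tolerance can be added in ibex later.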
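After downloading the output and inserting it into the database manually, I got an error saying it exceeded the size limit. The log may be larger than what the text type allows; changing the column to longtext might stop the write failures.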
mysql> SHOW COLUMNS FROM task_host_0;
+--------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-----------------+------+-----+---------+----------------+
| ii | bigint unsigned | NO | PRI | NULL | auto_increment |
| id | bigint unsigned | NO | MUL | NULL | |
| host | varchar(128) | NO | | NULL | |
| status | varchar(32) | NO | | NULL | |
| stdout | text | YES | | NULL | |
| stderr | text | YES | | NULL | |
+--------+-----------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
I submitted the change in https://github.com/ccfos/nightingale/pull/2027, but I couldn't find where the default n9e.sql file is. Please update it when you have time.
Your config.toml
Relevant logs
System info
categraf v0.3.69
Steps to reproduce
...
Expected behavior
The service runs normally, and categraf keeps running after a kill is executed.
Actual behavior
categraf crashes after a kill is executed.
Additional info
No response