flashcatcloud / categraf

one-stop telemetry collector for nightingale
https://flashcat.cloud/docs/
MIT License

Errors reported continuously after adding mongo collection #1074

Open molixiaoge opened 1 month ago

molixiaoge commented 1 month ago

Relevant config.toml

[[instances]]
log_level = "info"
labels = { instance="mongo-21:3000" }
mongodb_uri = "mongodb://192.168.110.21:3000"
username = "categraf"
password = "categraf"
collect_all = true

Logs from categraf

dy@105:~$ systemctl  status categraf
● categraf.service - Opensource telemetry collector
     Loaded: loaded (/etc/systemd/system/categraf.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-10-15 19:47:49 CST; 8min ago
   Main PID: 698 (categraf)
      Tasks: 11 (limit: 4557)
     Memory: 129.7M
        CPU: 8.433s
     CGroup: /system.slice/categraf.service
             └─698 /opt/categraf-v0.3.80-linux-amd64/categraf -configs /opt/categraf-v0.3.80-linux-amd64/conf

Oct 15 19:55:50 105 categraf[698]: time="2024-10-15T19:55:50+08:00" level=error msg="cannot get $collstats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { aggr>
Oct 15 19:56:05 105 categraf[698]: time="2024-10-15T19:56:05+08:00" level=error msg="cannot get $indexStats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { agg>
Oct 15 19:56:05 105 categraf[698]: time="2024-10-15T19:56:05+08:00" level=warning msg="cannot create metrics for oplog: mongo: no documents in result"
Oct 15 19:56:05 105 categraf[698]: time="2024-10-15T19:56:05+08:00" level=error msg="cannot get $collstats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { aggr>
Oct 15 19:56:20 105 categraf[698]: time="2024-10-15T19:56:20+08:00" level=error msg="cannot get $indexStats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { agg>
Oct 15 19:56:20 105 categraf[698]: time="2024-10-15T19:56:20+08:00" level=warning msg="cannot create metrics for oplog: mongo: no documents in result"
Oct 15 19:56:20 105 categraf[698]: time="2024-10-15T19:56:20+08:00" level=error msg="cannot get $collstats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { aggr>
Oct 15 19:56:35 105 categraf[698]: time="2024-10-15T19:56:35+08:00" level=error msg="cannot get $indexStats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { agg>
Oct 15 19:56:35 105 categraf[698]: time="2024-10-15T19:56:35+08:00" level=warning msg="cannot create metrics for oplog: mongo: no documents in result"
Oct 15 19:56:35 105 categraf[698]: time="2024-10-15T19:56:35+08:00" level=error msg="cannot get $collstats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { aggr>

System info

categraf v0.3.80,docker 26.1.2,ubuntu 22.04

Docker

Client: Docker Engine - Community Version: 26.1.2 API version: 1.45 Go version: go1.21.10 Git commit: 211e74b Built: Wed May 8 13:59:59 2024 OS/Arch: linux/amd64 Context: default

Steps to reproduce

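1. Add the mongo connection info to categraf.
2. Restart with systemctl restart categraf.
3. Check the logs with systemctl status categraf.
4. Errors are reported continuously, even after switching to mongo's root user.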

Expected behavior

With a user that has root privileges, collection should not report errors.

Actual behavior

Errors are reported continuously:

level=warning msg="cannot create metrics for oplog: mongo: no documents in result

level=error msg="cannot get $collstats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command

Additional info

No response

kongfei605 commented 1 month ago

Take a look at the authorization section of https://flashcat.cloud/docs/content/flashcat-monitor/categraf/plugin/mongodb/

molixiaoge commented 1 month ago

Running db.getUser("categraf");

returns

{
    "_id" : "admin.categraf",
    "userId" : UUID("f1c459ac-d03f-45d9-a2f3-51ab938c81ef"),
    "user" : "categraf",
    "db" : "admin",
    "roles" : [
        {
            "role" : "clusterMonitor",
            "db" : "admin"
        },
        {
            "role" : "read",
            "db" : "local"
        }
    ],
    "mechanisms" : [ "SCRAM-SHA-1", "SCRAM-SHA-256" ]
}

I looked at the link you sent. It says that servers = ["mongodb://user:pass@mongodb://127.0.0.1:27017/?authSource=admin"] is the latest configuration.

The original configuration was

mongodb_uri = "mongodb://192.168.110.21:3000"
username = "categraf"
password = "categraf"

After switching to servers = ["mongodb://categraf:categraf@mongodb://mongodb://192.168.110.21:3000/?authSource=admin"] the errors went away.

Does the latest version no longer support mongodb_uri together with username and password?

molixiaoge commented 1 month ago

I misspoke: after switching to the configuration above, nothing is collected at all.

kongfei605 commented 1 month ago

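Does the latest version no longer support mongodb_uri together with username and password?

It does. When username and password are configured separately, they are simply concatenated into the server string as well.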
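
As a rough illustration of what concatenating the credentials into the server string could look like, the sketch below folds a separately configured username and password into a bare mongodb_uri. The `build_mongo_uri` helper and its `auth_source` default are assumptions made for this example; this is not categraf's actual code, and any path or query already present on the input is ignored.

```python
from urllib.parse import quote, urlsplit


def build_mongo_uri(mongodb_uri, username, password, auth_source="admin"):
    """Hypothetical helper: fold separate credentials into a MongoDB URI."""
    parts = urlsplit(mongodb_uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if "@" in parts.netloc:
        # Credentials are already embedded, so return the URI unchanged.
        return mongodb_uri
    # Percent-encode so characters such as '@' or ':' in a password stay valid.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return f"{parts.scheme}://{userinfo}@{parts.netloc}/?authSource={auth_source}"


print(build_mongo_uri("mongodb://192.168.110.21:3000", "categraf", "categraf"))
```

Under these assumptions, the split form and the inline form from this thread end up as the same connection string, which is why either style should be usable.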

kongfei605 commented 1 month ago

Change servers = ["mongodb://categraf:categraf@mongodb://mongodb://192.168.110.21:3000/?authSource=admin"]

=> servers = ["mongodb://categraf:categraf@mongodb://192.168.110.21:3000/?authSource=admin"]

mongodb:// is duplicated; it was written twice.
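
A repeated scheme prefix like this is easy to miss by eye. The following is a minimal sketch of a check that counts the scheme prefixes in a connection string and keeps only the leading one. The helper names `scheme_count` and `strip_repeated_schemes` are hypothetical, and the sketch assumes the literal text `mongodb://` never appears legitimately inside the credentials.

```python
import re

# Matches both the plain and the SRV form of the MongoDB scheme prefix.
SCHEME = re.compile(r"mongodb(?:\+srv)?://")


def scheme_count(uri):
    """Count how many times a MongoDB scheme prefix appears in the URI."""
    return len(SCHEME.findall(uri))


def strip_repeated_schemes(uri):
    """Keep the leading scheme and drop any later repetitions of it."""
    match = SCHEME.match(uri)
    if match is None:
        raise ValueError("URI does not start with a MongoDB scheme")
    head, rest = uri[: match.end()], uri[match.end():]
    return head + SCHEME.sub("", rest)


bad = "mongodb://categraf:categraf@mongodb://mongodb://192.168.110.21:3000/?authSource=admin"
print(scheme_count(bad))
print(strip_repeated_schemes(bad))
```

A well-formed URI should yield a count of exactly one, with the host following the `@` directly.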

molixiaoge commented 1 month ago

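After changing the configuration to mongodb://categraf:categraf@192.168.110.21:3000/?authSource=admin, no data is collected at all, so I am still using the mongodb_uri approach. The detailed logs show the following three errors: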

Oct 16 16:51:21 105 categraf[61028]: time="2024-10-16T16:51:21+08:00" level=error msg="cannot get $collstats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { aggregate: \"system.profile\", pipeline: [ { $collStats: { latencyStats: { histograms: false }, storageStats: { scale: 1 } } }, { $project: { storageStats.wiredTiger: 0, storageStats.indexDetails: 0 } } ], cursor: {}, lsid: { id: UUID(\"3993669a-aee8-4cf7-a631-78bb4effedd3\") }, $db: \"rider2\" }"
Oct 16 16:51:36 105 categraf[61028]: time="2024-10-16T16:51:36+08:00" level=error msg="cannot get $indexStats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { aggregate: \"system.profile\", pipeline: [ { $indexStats: {} } ], cursor: {}, lsid: { id: UUID(\"4f4b63e2-54ea-4ed2-b9f4-284fab75153d\") }, $db: \"rider2\" }"
Oct 16 16:51:36 105 categraf[61028]: time="2024-10-16T16:51:36+08:00" level=warning msg="cannot create metrics for oplog: mongo: no documents in result"

After granting broader privileges with db.grantRolesToUser("categraf", [ { role: "readWriteAnyDatabase", db: "admin" } ])

the errors dropped to two:

Oct 16 17:44:59 105 categraf[698]: time="2024-10-16T17:44:59+08:00" level=warning msg="cannot create metrics for oplog: mongo: no documents in result"
Oct 16 17:45:14 105 categraf[698]: time="2024-10-16T17:45:14+08:00" level=error msg="cannot get $indexStats cursor for collection rider2.system.profile: (Unauthorized) not authorized on rider2 to execute command { aggregate: \"system.profile\", pipeline: [ { $indexStats: {} } ], cursor: {}, lsid: { id: UUID(\"33b7dbcc-9963-4374-af4d-cb382646b334\") }, $db: \"rider2\" }"

Searching around, I found https://github.com/percona/mongodb_exporter/issues/784. mongodb_exporter seems to have a similar compatibility problem. Does the code need to be synced with upstream?
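
Because the same few messages repeat every collection interval, it can help to collapse the log into distinct problems before comparing runs. Below is a small sketch that groups categraf log lines by level and message prefix. The `summarize` helper is hypothetical, and the sample lines are shortened versions of the ones quoted in this thread rather than full log output.

```python
import re
from collections import Counter

# Captures the level and the start of msg, up to the first ':' or '"'.
LINE = re.compile(r'level=(\w+) msg="([^:"]+)')


def summarize(lines):
    """Count log lines per (level, message prefix)."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


# Shortened sample lines, abbreviated after "(Unauthorized)".
sample = [
    'level=error msg="cannot get $collstats cursor for collection rider2.system.profile: (Unauthorized)"',
    'level=error msg="cannot get $indexStats cursor for collection rider2.system.profile: (Unauthorized)"',
    'level=warning msg="cannot create metrics for oplog: mongo: no documents in result"',
    'level=error msg="cannot get $indexStats cursor for collection rider2.system.profile: (Unauthorized)"',
]

for (level, message), n in summarize(sample).items():
    print(level, n, message)
```

Running the same summary before and after a role change makes it easier to see which of the distinct messages actually disappeared.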

kongfei605 commented 1 month ago

Which MongoDB version are you using?

molixiaoge commented 1 month ago

mongo:5.0.0

kongfei605 commented 1 month ago

I tested with MongoDB 5.0.26 + categraf v0.3.80, using these two configurations:

mongodb_uri = "mongodb://categraf:categraf@127.0.0.1:27017/?authSource=admin"

mongodb_uri = "mongodb://127.0.0.1:27017"
username = "categraf"
password = "categraf"

Both work without any problem.

kongfei605 commented 1 month ago
image

Here is another screenshot.

molixiaoge commented 1 month ago

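Started a brand-new, empty mongo:5.0.0 [image]

Imported some collections from the mongo instance that shows the errors [image]

This is getting strange. Could something be wrong with this particular database?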

kongfei605 commented 1 month ago

Doesn't this warning just say that there are no documents? What errors do the logs show for a DB that contains data?