flashcatcloud / categraf

one-stop telemetry collector for nightingale
https://flashcat.cloud/docs/
MIT License

PostgreSQL monitoring: the collected up metric is abnormal #1016

Closed liu01100115 closed 3 weeks ago

liu01100115 commented 1 month ago

Relevant config.toml

# Read metrics from one or many postgresql servers
# # collect interval
# interval = 15

[[instances]]
  ## specify address via a url matching:
  ##   postgres://[pqgotest[:password]]@localhost[/dbname]?sslmode=[disable|verify-ca|verify-full]
  ## or a simple string:
  ##   host=localhost user=pqgotest password=... sslmode=... dbname=app_production
  ##
  ## All connection parameters are optional.
  ##
  ## Without the dbname parameter, the driver will default to a database
  ## with the same name as the user. This dbname is just for instantiating a
  ## connection with the server and doesn't restrict the databases we are trying
  ## to grab metrics for.
  ##
  address = "host=10.2.137.230 port=5432 user=categraf password=categraf database=postgres sslmode=disable"

  ## A custom name for the database that will be used as the "server" tag in the
  ## measurement output. If not specified, a default one generated from
  ## the connection address is used.
  # outputaddress = "db01"

  ## connection configuration.
  ## maxlifetime - specify the maximum lifetime of a connection.
  ## default is forever (0s)
  # max_lifetime = "0s"

  ## A  list of databases to explicitly ignore.  If not specified, metrics for all
  ## databases are gathered.  Do NOT use with the 'databases' option.
  # ignored_databases = ["postgres", "template0", "template1"]

  ## A list of databases to pull metrics about. If not specified, metrics for all
  ## databases are gathered.  Do NOT use with the 'ignored_databases' option.
  # databases = ["app_production", "testing"]

  ## Whether to use prepared statements when connecting to the database.
  ## This should be set to false when connecting through a PgBouncer instance
  ## with pool_mode set to transaction.
  #prepared_statements = true
  # [[instances.metrics]]
  # mesurement = "sessions"
  # label_fields = [ "status", "type" ]
  # metric_fields = [ "value" ]
  # timeout = "3s"
  # request = '''
  # SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type
  # '''

Logs from categraf

2024/07/25 00:40:46 main.go:149: I! runner.binarydir: /usr/local/categraf
2024/07/25 00:40:46 main.go:150: I! runner.hostname: test-10-2-137-230
2024/07/25 00:40:46 main.go:151: I! runner.fd_limits: (soft=65535, hard=65535)
2024/07/25 00:40:46 main.go:152: I! runner.vm_limits: (soft=unlimited, hard=unlimited)
2024/07/25 00:40:46 provider_manager.go:60: I! use input provider: [local]
2024/07/25 00:40:46 prometheus_agent.go:19: I! prometheus scraping disabled!
2024/07/25 00:40:46 agent.go:38: I! agent starting
2024/07/25 00:40:46 metrics_agent.go:317: I! input: local.postgresql started
2024/07/25 00:40:46 agent.go:46: I! [*agent.MetricsAgent] started
2024/07/25 00:40:46 agent.go:46: I! [*agent.IbexAgent] started
2024/07/25 00:40:46 agent.go:49: I! agent started
2024/07/25 00:40:46 heartbeat.go:19: I! ibex agent start rolling request Server.Report.
2024/07/25 00:40:46 postgresql.go:181: E! failed to execute Query : failed to connect to `host=10.2.137.230 user=categraf database=postgres`: dial error (dial tcp 10.2.137.230:5432: connect: connection refused)
00:40:46 postgresql_up agent_hostname=10.2.137.230 region=beijing-qh server=host=10.2.137.230 port=5432 database=postgres  1
2024/07/25 00:40:47 cli.go:84: I! choose server: 10.20.18.5:20090, duration: 1ms
2024/07/25 00:41:01 postgresql.go:181: E! failed to execute Query : failed to connect to `host=10.2.137.230 user=categraf database=postgres`: dial error (dial tcp 10.2.137.230:5432: connect: connection refused)
00:41:01 postgresql_up agent_hostname=10.2.137.230 region=beijing-qh server=host=10.2.137.230 port=5432 database=postgres  1
^C2024/07/25 00:41:03 heartbeat.go:87: I! ibex agent received signal: interrupt
2024/07/25 00:41:03 main.go:131: I! received signal: interrupt
2024/07/25 00:41:03 agent.go:53: I! agent stopping
2024/07/25 00:41:03 agent.go:61: I! [*agent.MetricsAgent] stopped
2024/07/25 00:41:03 agent.go:61: I! [*agent.IbexAgent] stopped
2024/07/25 00:41:03 agent.go:64: I! agent stopped
2024/07/25 00:41:03 main.go:144: I! exited

System info

Linux test-10-2-137-230 3.10.0-1160.108.1.el7.x86_64 #1 SMP Thu Jan 25 16:17:31 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Docker

No response

Steps to reproduce

1. The collector does not correctly capture the error when it calls sql.Open.

Expected behavior

1. The collector does not correctly capture the error when it calls sql.Open.

Actual behavior

1. The collector does not correctly capture the error when it calls sql.Open.

Additional info

No response

kongfei605 commented 1 month ago

dial tcp 10.2.137.230:5432: connect: connection refused means the machine running categraf cannot reach 10.2.137.230 over the network.

You can test this with nc -v 10.2.137.230 5432.