apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0

[Feature] Support Nginx monitoring. #11534

Closed: weixiang1862 closed this issue 10 months ago

weixiang1862 commented 10 months ago

Search before asking

Description

Nginx is a popular HTTP and reverse proxy server. I already started on this work last weekend; please assign this task to me. I will sync more details and progress here.

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

Code of Conduct

wu-sheng commented 10 months ago

Are you using both metrics and log analysis?

weixiang1862 commented 10 months ago

Just metrics analysis, exposed by nginx-lua-prometheus, but it needs some extra programming in nginx.conf:

http {
    resolver local=on ipv6=off;

    lua_shared_dict prometheus_metrics 10M;
    # lua_package_path "/path/to/nginx-lua-prometheus/?.lua;;";

    # initialize the Prometheus library and metric objects once per worker
    init_worker_by_lua_block {
      prometheus = require("prometheus").init("prometheus_metrics")

      metric_bytes = prometheus:counter(
        "nginx_http_size_bytes", "Total size of HTTP", {"type", "route"})
      metric_requests = prometheus:counter(
        "nginx_http_requests_total", "Number of HTTP requests", {"status", "route"})
      metric_latency = prometheus:histogram(
        "nginx_http_latency", "HTTP request latency", {"route"})
      metric_connections = prometheus:gauge(
        "nginx_http_connections", "Number of HTTP connections", {"state"})
    }

    server {
        listen 8080;

        location /test {
          default_type application/json;
          return 200  '{"code": 200, "message": "success"}';

          log_by_lua_block {
            metric_bytes:inc(tonumber(ngx.var.request_length), {"request", "/test/**"})
            metric_bytes:inc(tonumber(ngx.var.bytes_sent), {"response", "/test/**"})
            metric_requests:inc(1, {ngx.var.status, "/test/**"})
            metric_latency:observe(tonumber(ngx.var.request_time), {"/test/**"})
          }
        }

        location /test_404 {
          default_type application/json;
          return 404  '{"code": 404, "message": "not found"}';

          log_by_lua_block {
            metric_bytes:inc(tonumber(ngx.var.request_length), {"request", "/test_404/**"})
            metric_bytes:inc(tonumber(ngx.var.bytes_sent), {"response", "/test_404/**"})
            metric_requests:inc(1, {ngx.var.status, "/test_404/**"})
            metric_latency:observe(tonumber(ngx.var.request_time), {"/test_404/**"})
          }
        }

        location /test_500 {
          default_type application/json;
          return 500  '{"code": 500, "message": "internal error"}';

          log_by_lua_block {
            metric_bytes:inc(tonumber(ngx.var.request_length), {"request", "/test_500/**"})
            metric_bytes:inc(tonumber(ngx.var.bytes_sent), {"response", "/test_500/**"})
            metric_requests:inc(1, {ngx.var.status, "/test_500/**"})
            metric_latency:observe(tonumber(ngx.var.request_time), {"/test_500/**"})
          }
        }
    }

    # separate server exposing the collected metrics for scraping
    server {
      listen 9145;
      location /metrics {
        content_by_lua_block {
          metric_connections:set(ngx.var.connections_reading, {"reading"})
          metric_connections:set(ngx.var.connections_waiting, {"waiting"})
          metric_connections:set(ngx.var.connections_writing, {"writing"})
          prometheus:collect()
        }
      }
    }
}

I compared it with nginx-prometheus-exporter and nginx-vts-exporter; this Lua library can provide more metrics, and users can freely define their own endpoint grouping rules.
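
For example (just a sketch; the group_route helper and its URI patterns are made up for illustration), the grouping rule could be centralized in one Lua function defined next to the metrics and reused from every log_by_lua_block:

http {
    lua_shared_dict prometheus_metrics 10M;

    init_worker_by_lua_block {
      prometheus = require("prometheus").init("prometheus_metrics")
      metric_requests = prometheus:counter(
        "nginx_http_requests_total", "Number of HTTP requests", {"status", "route"})

      -- hypothetical helper: map a raw URI onto a user-defined route group
      function group_route(uri)
        if uri:find("^/user/") then
          return "/user/**"
        end
        return "/**"
      end
    }

    log_by_lua_block {
      -- one grouping rule, applied to every request handled by this http block
      metric_requests:inc(1, {ngx.var.status, group_route(ngx.var.uri)})
    }
}

The grouping then lives in a single place, whether the user keeps it coarse or makes it endpoint-specific.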

weixiang1862 commented 10 months ago

nginx-prometheus-exporter works well with NGINX Plus, but with NGINX OSS it exposes only a limited set of metrics.

nginx-vts-exporter requires rebuilding nginx with the vts module. Although nginx-lua-prometheus also requires an extra Lua module, I think it is more widely used.

wu-sheng commented 10 months ago

Thanks for the explanation. I just wanted to ask about nginx-prometheus-exporter.


So, you are proposing to use nginx-vts-exporter? About logs, I was trying to ask whether we should check for typical error logs that indicate something, so we could cover them in LAL and on the dashboard.

weixiang1862 commented 10 months ago

I propose to use nginx-lua-prometheus.

About error log analysis, I need to do more research; once I have some clues I will share them here.

wu-sheng commented 10 months ago

> I propose to use nginx-lua-prometheus.

About this: it requires adding metrics through Lua scripts for every route rule, right?

weixiang1862 commented 10 months ago

> About this: it requires adding metrics through Lua scripts for every route rule, right?

The metrics can be added in either the http or the location section, depending on how precisely the user wants to monitor endpoints: maybe just a coarse /** group in the http section, or finer groups such as user/**, user/1, user/2 in the location sections.
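
Roughly, as a sketch (the route labels are only placeholders, and metric_requests is the counter from the config above), the same log_by_lua_block can sit at the http level for one coarse group, or inside individual location blocks for finer groups:

    # coarse: one group for all endpoints, attached at the http level
    log_by_lua_block {
      metric_requests:inc(1, {ngx.var.status, "/**"})
    }

    # fine: a dedicated group per endpoint, attached inside a location (within a server block)
    location /user {
      log_by_lua_block {
        metric_requests:inc(1, {ngx.var.status, "/user/**"})
      }
    }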

wu-sheng commented 10 months ago

> The metrics can be added in either the http or the location section, depending on how precisely the user wants to monitor endpoints...

Yes, that is what I meant. OK, then we need good documentation to explain this.

weixiang1862 commented 10 months ago

> Yes, that is what I meant. OK, then we need good documentation to explain this.

Ok, thanks.

weixiang1862 commented 10 months ago

An Error Log Count metric can be extracted from the nginx error.log; this may be meaningful to users.

(screenshot attached)
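
As a rough sketch of the idea (the actual extraction would be expressed in the OAP's LAL rules rather than Lua, and the pattern assumes the default error.log layout with a bracketed [level] field):

    -- pull the severity token ("error", "crit", ...) out of one error.log line
    local function error_level(line)
      return line:match("%[(%a+)%]")
    end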

wu-sheng commented 10 months ago

Yes, monitoring error-level log output totally makes sense.