Seravo / milliseconds

Python library to parse Nginx logs
GNU General Public License v3.0

Research option to replace Milliseconds with an Nginx module #5

Open ottok opened 4 years ago

ottok commented 4 years ago

Milliseconds is based on parsing Nginx access logs to print out statistics from them. The main downside of this approach is that the access log must be written before it can be parsed, and the time frame must be decided in advance.

An alternative approach would be to extend some existing Nginx stats module stub to compute these same stats on the fly with Nginx internal counters, and serve them on demand from some Nginx stats module endpoint (or dump them to a file when asked to). This way the stats would be as real-time as we want, without the overhead of dumping lots of lines into a log and then reading them again to compute stats.

We could perhaps fork the module http://nginx.org/en/docs/http/ngx_http_stub_status_module.html and extend it, or research whether some similar module for Nginx already exists (such as https://github.com/vozlt/nginx-module-vts or https://github.com/dedok/nginx-stat-module).
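For reference, the stock stub_status module already exposes a few global counters as plain text. A minimal Python sketch of parsing that output follows; the sample text mirrors the format shown in the Nginx documentation, and the function name `parse_stub_status` is hypothetical. The output carries no request times or per-status counts, which is what an extended module would need to add.

```python
import re

# Sample output in the format documented for ngx_http_stub_status_module.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Parse stub_status plain-text output into a dict of integers."""
    stats = {}
    m = re.search(r"Active connections:\s*(\d+)", text)
    if m:
        stats["active"] = int(m.group(1))
    # The third line holds three bare counters: accepts, handled, requests.
    m = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    if m:
        stats["accepts"], stats["handled"], stats["requests"] = map(int, m.groups())
    for key in ("Reading", "Writing", "Waiting"):
        m = re.search(key + r":\s*(\d+)", text)
        if m:
            stats[key.lower()] = int(m.group(1))
    return stats
```

Calling `parse_stub_status(SAMPLE)` returns a dict with the keys `active`, `accepts`, `handled`, `requests`, `reading`, `writing` and `waiting`.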

ottok commented 4 years ago

Stats could be shown by previous minute? The previous minute value would update once a minute, and that value would be fetched and stored in monitoring (and alerting). Internally it naturally also needs to know about the ongoing minute, but that value is not shown externally until the minute is full and the value is comparable with the previous value.
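The "previous minute" idea can be modelled roughly as below. This is only a Python sketch of the bookkeeping, not module code (an Nginx module would be written in C); the class name `MinuteStats` and its fields are hypothetical, and it tracks just a request count and an average request time.

```python
class MinuteStats:
    """Accumulate per-minute request stats; expose only the last completed minute."""

    def __init__(self):
        self._current_minute = None   # minute index (epoch seconds // 60) being filled
        self._current_count = 0       # requests seen so far in the ongoing minute
        self._current_time_sum = 0.0  # sum of request times in the ongoing minute
        self.previous = None          # published stats of the last completed minute

    def record(self, timestamp, request_time):
        """Register one request; publish the old minute when a new one starts."""
        minute = int(timestamp // 60)
        if self._current_minute is None:
            self._current_minute = minute
        elif minute > self._current_minute:
            self._publish()
            self._current_minute = minute
        self._current_count += 1
        self._current_time_sum += request_time

    def _publish(self):
        """Move the ongoing minute's totals into `previous` and reset the counters."""
        count = self._current_count
        self.previous = {
            "minute": self._current_minute,
            "requests": count,
            "avg_request_time": self._current_time_sum / count if count else 0.0,
        }
        self._current_count = 0
        self._current_time_sum = 0.0
```

In this sketch the published value only rolls over when a request arrives in a later minute, so a fully idle minute is not reported as zero; a real implementation would likely need a timer for that.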

heikkiorsila commented 4 years ago

A note from a phone conversation that happened today: Do a time and risk estimation first.

heikkiorsila commented 4 years ago

Did some proof-of-concept testing:

All in all, nginx-stat-module seems promising but it needs some modifications for calculating statistics. It exports a simple textual timeseries format:

wordpress,location=nginx,parameter=bytes_sent,interval=10 value=336.947 1599480003
wordpress,location=nginx,parameter=body_bytes_sent,interval=10 value=114.947 1599480003
wordpress,location=nginx,parameter=request_length,interval=10 value=578.000 1599480003
wordpress,location=nginx,parameter=rps,interval=10 value=5.700 1599480003
wordpress,location=nginx,parameter=keepalive_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_2xx_rps,interval=10 value=5.700 1599480003
wordpress,location=nginx,parameter=response_3xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_4xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_5xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p1 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p5 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p10 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p50 value=0.010 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p90 value=0.187 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p95 value=0.185 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p99 value=0.185 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p100 value=0.181 1599480003
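The lines above look like `measurement,tag=value,... field=value timestamp`, which resembles InfluxDB line protocol. A small Python sketch of splitting one such line, assuming exactly that three-part layout with no escaped spaces or commas; the function name `parse_stat_line` is hypothetical:

```python
def parse_stat_line(line):
    """Split one 'measurement,tag=v,... field=x timestamp' line into a dict."""
    head, field, timestamp = line.split()
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    name, value = field.split("=", 1)
    return {
        "measurement": measurement,
        "tags": tags,
        name: float(value),
        "timestamp": int(timestamp),
    }


sample = "wordpress,location=nginx,parameter=request_time,percentile=p50 value=0.010 1599480003"
parsed = parse_stat_line(sample)
```

Here `parsed["tags"]["percentile"]` is `"p50"` and `parsed["value"]` is the float `0.01`.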

nginx-stat-module sends these statistics out over UDP to a configured server. This server should be on localhost for maximum reliability. UDP can be a problem when the server is overloaded (monitoring should be the last thing that fails), though that is perhaps unlikely. Some reliability might be gained by remembering a short period of statistics (needs a code change) and sending the stats over a local Unix/TCP socket (needs a code change).
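A receiver for such a setup could be as small as the Python sketch below. It listens on localhost and keeps only a bounded window of recent lines in memory. The function names are hypothetical, and whether nginx-stat-module packs one or several lines into a datagram is an assumption here; the code handles both.

```python
import socket
from collections import deque


def open_stats_socket(host="127.0.0.1", port=0):
    """Bind a UDP socket on localhost; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_stats(sock, history, max_datagrams=1, timeout=1.0):
    """Read up to max_datagrams datagrams and append their lines to history."""
    sock.settimeout(timeout)
    for _ in range(max_datagrams):
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break  # UDP gives no delivery guarantee; a lost datagram is simply missing
        history.extend(data.decode("utf-8", "replace").splitlines())
    return history
```

Passing a `deque(maxlen=N)` as `history` gives the "short period of statistics" memory: old lines fall off automatically once `N` lines have been stored.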

ottok commented 4 years ago

> nginx-stat-module sends these statistics out with udp to a configured server

Such an architecture would create excess services for us to maintain. I was under the impression that Nginx has some simple stats module one can simply query using HTTP and dump to a file or something.
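With an HTTP status endpoint, the "query and dump to a file" workflow needs no extra service, only a periodic job. A rough Python sketch, assuming a status location is configured at the hypothetical URL `http://127.0.0.1/nginx_status`; the function names are made up for illustration:

```python
import time
import urllib.request


def fetch_status(url, timeout=2.0):
    """Fetch the status page body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")


def dump_status(path, url="http://127.0.0.1/nginx_status",
                fetch=fetch_status, now=time.time):
    """Append one timestamped snapshot of the status page to a file."""
    body = fetch(url)
    with open(path, "a", encoding="utf-8") as out:
        # A '# <epoch seconds>' header line separates snapshots.
        out.write(f"# {int(now())}\n{body.rstrip()}\n")
    return body
```

The `fetch` and `now` parameters exist only so the function can be exercised without a running Nginx; a cron job or systemd timer could call `dump_status("/some/path")` once a minute.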

ottok commented 3 years ago

Status: