Uninett / nav

Network Administration Visualized
GNU General Public License v3.0

Integrate DHCP lease statistics #2369

Open lunkwill42 opened 2 years ago

lunkwill42 commented 2 years ago

The CNaaS team wants to be able to integrate DHCP statistics into NAV.

Overview

A DHCP server can be made to summarize statistics for its networks and address ranges, such as the current number of leases versus the maximum number of leases for each range. A third-party script could gather these metrics and send them to NAV's Graphite server. However, since these metrics do not originate from NAV, NAV has no way to interpret or graph them. At best, you can add threshold rules on these "foreign" metrics.

We have identified three goals for a minimum viable feature:

Examples

For ISC DHCP, a command-line utility that summarizes information about each configured DHCP pool exists: dhcpd-pools. The command can output either human-readable tables on stdout or JSON data, which is excellent for a script to parse and push to Graphite.
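As an illustration of the "push to Graphite" step, here is a minimal Python sketch using Carbon's plaintext protocol, where each line has the form `<path> <value> <timestamp>` and is sent over TCP. Port 2003 is Carbon's customary plaintext port, but check your own setup. The function names, the default host and the example metric paths are hypothetical; none of this is an existing NAV or dhcpd-pools API.

```python
import socket
import time


def format_plaintext(metrics, timestamp=None):
    """Render {metric_path: value} pairs as Carbon plaintext-protocol lines.

    Each line is "<path> <value> <unix timestamp>", terminated by a newline.
    Paths are sorted only to make the output deterministic.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return "".join(
        f"{path} {value} {timestamp}\n" for path, value in sorted(metrics.items())
    )


def send_to_carbon(metrics, host="localhost", port=2003, timestamp=None):
    """Open a TCP connection to a Carbon plaintext listener and send all metrics."""
    payload = format_plaintext(metrics, timestamp).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A hypothetical call from an integration script would be `send_to_carbon({"nav.dhcp.vlan511.cur": 39}, host="graphite.example.org")`. Sending one batch per run over a single connection keeps the script simple; Carbon does not require any handshake on the plaintext port.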

Using the output of a dhcpd-pools command as an example (IP ranges have been anonymized):

Ranges:
shared net name     first ip           last ip     max   cur    percent  touch   t+c  t+c perc
vlan511             w.x.y.z       - w.x.y.z        239    39     16.318      0    39    16.318
vlan1120            w.x.y.z       - w.x.y.z        239    76     31.799    163   239   100.000
vlan1121            w.x.y.z       - w.x.y.z        239     0      0.000      8     8     3.347
vlan1100            w.x.y.z       - w.x.y.z        239    77     32.218    144   221    92.469
vlan1100            w.x.y.z       - w.x.y.z        254   117     46.063    137   254   100.000
vlan1100            w.x.y.z       - w.x.y.z        254    76     29.921    177   253    99.606
vlan1100            w.x.y.z       - w.x.y.z        254   119     46.850    133   252    99.213
vlan1160            w.x.y.z       - w.x.y.z         14     0      0.000      0     0     0.000
vlan1170            w.x.y.z       - w.x.y.z         27    26     96.296      0    26    96.296

Shared networks:
name                   max   cur     percent  touch    t+c  t+c perc
vlan511                239    39     16.318       0     39    16.318
vlan1100              1001   389     38.861     591    980    97.902
vlan1120               239    76     31.799     163    239   100.000
vlan1121               239     0      0.000       8      8     3.347
vlan1160                14     0      0.000       0      0     0.000
vlan1170                27    26     96.296       0     26    96.296

Sum of all ranges:
name                   max   cur     percent  touch    t+c  t+c perc
All networks          1759   530     30.131     762   1292    73.451

What we want to submit to Graphite are the max, cur and touch numbers for each network listed under Shared networks. The networks/pools are named after the VLAN they belong to (which is a matter of policy, not a requirement).
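A minimal sketch of extracting the max, cur and touch numbers from the plain-text report above, assuming the column layout shown there (name, max, cur, percent, touch, t+c, t+c perc). The function names and the `.max`/`.cur`/`.touch` leaf names in the resulting paths are hypothetical; a real integration script would more likely parse the JSON output, whose schema is not assumed here.

```python
# The "Shared networks" and "Sum of all ranges" sections from the report above.
SAMPLE = """\
Shared networks:
name                   max   cur     percent  touch    t+c  t+c perc
vlan511                239    39     16.318       0     39    16.318
vlan1100              1001   389     38.861     591    980    97.902
vlan1120               239    76     31.799     163    239   100.000
vlan1121               239     0      0.000       8      8     3.347
vlan1160                14     0      0.000       0      0     0.000
vlan1170                27    26     96.296       0     26    96.296

Sum of all ranges:
name                   max   cur     percent  touch    t+c  t+c perc
All networks          1759   530     30.131     762   1292    73.451
"""


def parse_shared_networks(text):
    """Return {network_name: {"max": int, "cur": int, "touch": int}}.

    Only the "Shared networks:" section is read; it ends at the first blank line.
    """
    stats = {}
    lines = iter(text.splitlines())
    for line in lines:
        if line.strip() == "Shared networks:":
            next(lines, None)  # skip the column-header row
            break
    else:
        return stats  # section not present in this report
    for line in lines:
        if not line.strip():
            break
        # The last six columns are numeric, so split from the right; this
        # keeps the name intact even if it were to contain spaces.
        name, max_, cur, _percent, touch, _tc, _tc_percent = line.rsplit(None, 6)
        stats[name] = {"max": int(max_), "cur": int(cur), "touch": int(touch)}
    return stats


def to_metrics(stats, prefix="nav.dhcp"):
    """Flatten parsed stats into {metric_path: value} pairs (hypothetical leaf names)."""
    return {
        f"{prefix}.{name}.{counter}": value
        for name, counters in stats.items()
        for counter, value in counters.items()
    }
```

With this sample, `to_metrics(parse_shared_networks(SAMPLE))` yields 18 paths, e.g. `nav.dhcp.vlan1100.cur` with the value 389, while the "All networks" summary row is ignored.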

For this example, we might want to submit metrics like:

The actual IP ranges are of less importance in an MVP. As long as NAV can parse a VLAN name from the path level directly below nav.dhcp, it can create DHCP utilization graphs on the VLAN details page: when viewing the details for VLAN 1100, NAV could find the DHCP metrics in nav.dhcp that match this VLAN and draw a graph from them.

The network names can also be something like vlan1100_some_description or some_description_vlan1100; these should still be matched to VLAN 1100 in NAV.
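One way NAV could recognize the VLAN in names like these is a regular expression that looks for a `vlan<number>` token delimited by underscores or by the ends of the name. The sketch also shows how the node directly below `nav.dhcp` could be matched against a given VLAN when filtering a list of metric paths. All names are hypothetical, and the underscore delimiter is an assumption based on the examples above.

```python
import re

# "vlan<digits>" must be bounded by "_" or the start/end of the name, so
# "vlan1100_some_description" and "some_description_vlan1100" both match,
# while "vlan11000" cannot be read as VLAN 1100.
_VLAN_TOKEN = re.compile(r"(?:^|_)vlan(\d+)(?:_|$)", re.IGNORECASE)


def vlan_from_pool_name(name):
    """Return the VLAN number encoded in a DHCP pool name, or None."""
    match = _VLAN_TOKEN.search(name)
    if not match:
        return None
    vlan = int(match.group(1))
    # Usable 802.1Q VLAN IDs are 1-4094.
    return vlan if 1 <= vlan <= 4094 else None


def metrics_for_vlan(metric_paths, vlan, prefix="nav.dhcp"):
    """Select metric paths whose node directly below `prefix` names `vlan`."""
    depth = prefix.count(".") + 1  # index of the node right after the prefix
    selected = []
    for path in metric_paths:
        nodes = path.split(".")
        if path.startswith(prefix + ".") and len(nodes) > depth:
            if vlan_from_pool_name(nodes[depth]) == vlan:
                selected.append(path)
    return selected
```

If a location level is added to the path, including it in `prefix` (e.g. `nav.dhcp.trondheim`) keeps the VLAN node at the depth this sketch expects.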


lunkwill42 commented 2 years ago

An extra level in the metric path for location may also be needed. This could in reality be any prefix configured into the integration script, something like:

* `nav.dhcp.trondheim.vlan511`

* `nav.dhcp.oslo.vlan511`

NAV separates broadcast domains that share a common VLAN tag by using a netident attribute (which it parses from the router port description). The most feasible way for NAV to assign DHCP pools to the correct broadcast domains is to make the netident that NAV knows part of the DHCP pool name, so that it is subsequently encoded into the Graphite metric path.

E.g. nav.dhcp.vlan511.somenetident to group multiple pools by VLAN number, or nav.dhcp.somenetident.vlan511 to group by netident.

Another question is how to handle the situation where a DHCP pool name contains no netident, just the VLAN tag: what metric path should be used then?

lunkwill42 commented 2 years ago

An IRL discussion landed us on nav.dhcp.vlanXXX.netidentYYY as the preferred prefix for DHCP pool stats. We should probably also support the simple case where no VLAN tags are reused, so that DHCP pool names consisting of just vlanXXX are logged directly under nav.dhcp.vlanXXX.
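A sketch of the agreed naming scheme, assuming the integration script already knows the VLAN number and, where applicable, the netident of each pool. The function name and parameters are hypothetical. A configurable prefix also covers the optional location level mentioned earlier (e.g. `nav.dhcp.trondheim`).

```python
def pool_metric_path(vlan, netident=None, prefix="nav.dhcp"):
    """Build the Graphite path for one DHCP pool.

    Returns <prefix>.vlan<N>.<netident> when a netident is known, and
    <prefix>.vlan<N> for the simple case where VLAN tags are not reused.
    """
    path = f"{prefix}.vlan{vlan}"
    if netident:
        # A dot would silently add an extra level to the Graphite path.
        if "." in netident:
            raise ValueError(f"netident must not contain '.': {netident!r}")
        path += f".{netident}"
    return path
```

For example, `pool_metric_path(1100, "somenetident")` gives `nav.dhcp.vlan1100.somenetident`, and `pool_metric_path(511)` gives `nav.dhcp.vlan511`. Placing the VLAN level first keeps all pools sharing a VLAN tag grouped together, which matches the preferred layout.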

Actions:

* [x]  Verify that a comma `,` is a valid part of a Graphite path name (since many netidents as parsed by NAV from the NTNU convention will contain one or two commas) @lunkwill42

lunkwill42 commented 1 year ago

Commas do not seem to be valid, or at least they will interfere with how Graphite render commands are interpreted if used in metric names (render targets use commas to separate function arguments). We will need to escape these commas somehow; the standard for most special characters so far has been to either strip them or replace them with underscores.
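A minimal sketch of the "replace with underscores" option, assuming a conservative whitelist of ASCII letters, digits, underscore and hyphen for each path node (a choice made for this example, not a documented Graphite rule). Passing an empty replacement gives the "strip" variant instead. The function name is hypothetical.

```python
import re

# Any character outside this conservative set is considered unsafe in a
# single Graphite path node; this covers commas as well as dots and spaces.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def escape_path_node(node, replacement="_"):
    """Make one metric-path node safe by replacing (or stripping) unsafe characters."""
    return _UNSAFE.sub(replacement, node)
```

Here `escape_path_node("foo,bar")` returns `foo_bar`, and `escape_path_node("foo,bar", "")` returns `foobar`. With either approach, two distinct netidents can collapse into the same node name after escaping (e.g. `a,b` and `a_b`), so NAV would need to apply the same escaping when looking up a netident's metrics.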