czerwonk / junos_exporter

Exporter for devices running JunOS to use with https://prometheus.io/
MIT License

Add "show chassis fpc" metrics #126

Closed AKYD closed 3 years ago

AKYD commented 3 years ago

Hello,

This patch improves the fpc module in several ways:

1. Add show chassis fpc metrics + update tests

`show chassis fpc detail` and `show chassis fpc` display different information (e.g. only the non-detail command shows the heap utilization percentage):

Slot 0 information:
  State                               Online    
  Total CPU DRAM                 2048 MB
  Total SRAM                        0 MB
  Total SDRAM                       0 MB
  Temperature                      39 degrees C / 102 degrees F
  Start time                          2020-08-21 13:07:05 UTC
  Uptime                              215 days, 19 hours, 25 minutes, 38 seconds
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Online             0      3          0        2      2      2    2048       43         59

Since some of the fields are common I've added the new possible fields to the FPC struct.

I've moved the fpc detail handling into the CollectFPCDetail function and reused the CollectFPC name for the non-detail command, so the naming stays aligned.

Also, the check from here doesn't work, since `show chassis fpc detail` only returns FPCs that are online. I've moved the up-state check to the non-detail function.

Improved tests: added more sample outputs and a check on the routing engine field to make sure everything is parsed correctly.

2. Add multi routing engine support + add test

I've used the same approach as the storage module, which works OK with minor adjustments :+1:

There is a bug in the way this works in the storage module, and the test misses it because there is no validation of the re-name field (see below). I had to change the unmarshal struct a bit to make it work, but it could probably use some refinement.

The two cases are tested in the playground

Using the sample output from the storage module, we can see that the filesystems from two routing engines are combined under a single routing engine, which leads to a metric collision:

* collected metric "junos_storage_used_percent" { label:<name:"device" value:"host_app_disk" > label:<name:"mountpoint" value:"/.mount/var/install_disk" > label:<name:"re_name" value:"" > label:<name:"target" value:"127.0.0.1:20222" > gauge:<value:0 > } was collected before with the same name and label values source="log.go:172"

Unmarshal+marshal using the original struct from the storage module (note the empty re-name and the duplicated filesystem):

<MultiRoutingEngineResults>
  <multi-routing-engine-results>
    <re-name></re-name>
    <multi-routing-engine-item>
      <system-storage-information>
        <filesystem>
          <filesystem-name>/dev/gpt/junos</filesystem-name>
          <total-blocks>2796512</total-blocks>
          <used-blocks>1667792</used-blocks>
          <available-blocks>905000</available-blocks>
          <used-percent> 65</used-percent>
          <mounted-on>/.mount</mounted-on>
        </filesystem>
        <filesystem>
          <filesystem-name>/dev/gpt/junos</filesystem-name>
          <total-blocks>2796512</total-blocks>
          <used-blocks>1667792</used-blocks>
          <available-blocks>905000</available-blocks>
          <used-percent> 65</used-percent>
          <mounted-on>/.mount</mounted-on>
        </filesystem>
      </system-storage-information>
    </multi-routing-engine-item>
  </multi-routing-engine-results>
</MultiRoutingEngineResults>

Unmarshal+marshal using the new struct (which then needs to be split per routing engine):

<rpc-reply>
  <multi-routing-engine-results>
    <multi-routing-engine-item>
      <re-name>fpc0</re-name>
      <system-storage-information>
        <filesystem>
          <filesystem-name>/dev/gpt/junos</filesystem-name>
          <total-blocks format="1.3G"></total-blocks>
          <used-blocks format="814M"></used-blocks>
          <available-blocks format="442M"></available-blocks>
          <used-percent> 65</used-percent>
          <mounted-on>/.mount</mounted-on>
        </filesystem>
      </system-storage-information>
    </multi-routing-engine-item>
    <multi-routing-engine-item>
      <re-name>fpc1</re-name>
      <system-storage-information>
        <filesystem>
          <filesystem-name>/dev/gpt/junos1</filesystem-name>
          <total-blocks format="1.1G"></total-blocks>
          <used-blocks format="810M"></used-blocks>
          <available-blocks format="440M"></available-blocks>
          <used-percent> 65</used-percent>
          <mounted-on>/.mount</mounted-on>
        </filesystem>
      </system-storage-information>
    </multi-routing-engine-item>
  </multi-routing-engine-results>
</rpc-reply>

3. Fixed a bug in the memory calculation. The value returned is in MiB, but the metric name says we're exporting bytes, so I had to multiply by 1024*1024 to convert.

:exclamation: I've tested the code on MX and SRX (multi-routing-engine) and it works OK, and the tests show everything is properly parsed.