NLnetLabs / rtrtr

An RPKI Data Proxy
https://nlnetlabs.nl/projects/routing/rtrtr/
BSD 3-Clause "New" or "Revised" License

slurm file processing not working if an entry contains a wrong network address #127

Open DonOtuseGH opened 1 week ago

DonOtuseGH commented 1 week ago

Hello,

We are running some RTRTR instances on Kubernetes clusters using a custom image.

RTRTR usually runs against our own Routinator instance, but to demonstrate the issue, the following config can be used as well:

rtrtr.conf:

log_level = "debug"
log_target = "stderr"
http-listen = ["0.0.0.0:8323"]
[units.json]
type = "json"
uri = "https://console.rpki-client.org/vrps.json"
refresh = 60
[units.slurm]
type = "slurm"
source = "json"
files = [ "/home/rtrtr/slurm.json" ]
[targets.rtr]
type = "rtr"
listen = [ "0.0.0.0:3323" ]
unit = "slurm"
client-metrics = true
[targets.http]
type = "http"
path = "/json"
format = "json"
unit = "slurm"

We found that if the slurm file contains an invalid network address in the prefix value of a prefixAssertions entry, RTRTR does not start correctly: it does not process the slurm file at all, and it does not log an error message.

slurm.json with an invalid entry (10.10.10.164/27 is not a valid network address because host bits are set; it should be 10.10.10.160/27):

{
  "slurmVersion": 1,
  "validationOutputFilters": {
    "prefixFilters": [],
    "bgpsecFilters": []
  },
  "locallyAddedAssertions": {
    "prefixAssertions": [
      {
        "asn": 64546,
        "prefix": "192.168.255.0/24",
        "maxPrefixLength": 24,
        "comment": "RTR Health Check"
      },
      {
        "asn": 65535,
        "prefix": "10.10.10.164/27",
        "maxPrefixLength": 32
      }
    ],
    "bgpsecAssertions": []
  }
}
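
To illustrate why 10.10.10.164/27 counts as an invalid network address, here is a minimal Python sketch using the standard ipaddress module. It is not part of RTRTR, and the function names has_host_bits and canonical_prefix are made up for this example. It only shows that the address has bits set beyond the /27 prefix length and what the aligned prefix would be.

```python
import ipaddress


def has_host_bits(prefix: str) -> bool:
    """Return True if the address part is not aligned to the prefix length."""
    iface = ipaddress.ip_interface(prefix)
    # The network address is the input address with all host bits cleared.
    return iface.ip != iface.network.network_address


def canonical_prefix(prefix: str) -> str:
    """Return the prefix with all host bits cleared."""
    return str(ipaddress.ip_interface(prefix).network)


print(has_host_bits("10.10.10.164/27"))     # True
print(has_host_bits("10.10.10.160/27"))     # False
print(canonical_prefix("10.10.10.164/27"))  # 10.10.10.160/27
```

With a /27, only the top three bits of the last octet belong to the network, so 164 (binary 10100100) has to be rounded down to 160 (binary 10100000).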

The rtrtr log shows nothing about the issue: no slurm file processing, no target information...

[DEBUG] HTTP server listening on 0.0.0.0:8323
[DEBUG] Target http: link status: healthy
[DEBUG] starting new connection: https://console.rpki-client.org/
[DEBUG] RTR: Got reset query.
[DEBUG] Unit json: successfully updated.
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] Unit json: successfully updated.
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] Unit json: update without changes.
...

The local target isn't working (as expected, given the missing log entries above):

$ rtrclient -e -t csv -o /dev/stdout tcp 127.0.0.1 3323 2>/dev/null | wc -l
===> times out

Of course, everything works fine once we correct the prefix to a valid network address:

slurm.json with valid entries:

{
  "slurmVersion": 1,
  "validationOutputFilters": {
    "prefixFilters": [],
    "bgpsecFilters": []
  },
  "locallyAddedAssertions": {
    "prefixAssertions": [
      {
        "asn": 64546,
        "prefix": "192.168.255.0/24",
        "maxPrefixLength": 24,
        "comment": "RTR Health Check"
      },
      {
        "asn": 65535,
        "prefix": "10.10.10.160/27",
        "maxPrefixLength": 32
      }
    ],
    "bgpsecAssertions": []
  }
}

rtrtr log looks as expected:

[DEBUG] HTTP server listening on 0.0.0.0:8323
[DEBUG] Target http: link status: healthy
[DEBUG] starting new connection: https://console.rpki-client.org/
[DEBUG] Updated Slurm file /home/rtrtr/slurm.json
[DEBUG] Unit json: successfully updated.
[DEBUG] Unit slurm: file /home/rtrtr/slurm.json: added 2, removed 0.
[DEBUG] Target rtr: Got update (615244 entries)
[DEBUG] Target http: Got update (615244 entries)
[DEBUG] Target http: link status: healthy
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] Unit json: successfully updated.
[DEBUG] Unit slurm: file /home/rtrtr/slurm.json: added 2, removed 0.
[DEBUG] Target rtr: Got update (615246 entries)
[DEBUG] Target http: Got update (615246 entries)
[DEBUG] Target http: link status: healthy
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
[DEBUG] RTR: Got reset query.
...

local target gives the correct count of VRPs:

$ rtrclient -e -t csv -o /dev/stdout tcp 127.0.0.1 3323 2>/dev/null | wc -l
615248

DonOtuseGH commented 1 week ago

What we would expect

It would be great to have an error message in the log indicating that something went wrong while processing the slurm file. It would also be helpful to list the invalid entries in the log. This would simplify troubleshooting considerably, especially if the slurm file contains several hundred locallyAddedAssertions ;-)
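
As a stopgap until RTRTR reports such errors itself, a pre-flight check along the following lines could be run before deploying a slurm file. This is only a sketch in Python, independent of RTRTR: the function name find_invalid_prefixes is hypothetical, and it checks only the prefix values in prefixFilters and prefixAssertions, not the rest of the slurm format.

```python
import ipaddress
import json


def find_invalid_prefixes(slurm: dict) -> list:
    """Return one message per prefix that is not a valid network address."""
    problems = []
    sections = (
        ("validationOutputFilters", "prefixFilters"),
        ("locallyAddedAssertions", "prefixAssertions"),
    )
    for outer, inner in sections:
        entries = slurm.get(outer, {}).get(inner, [])
        for index, entry in enumerate(entries):
            prefix = entry.get("prefix")
            if prefix is None:
                continue  # a prefixFilters entry may carry only an asn
            try:
                # strict=True rejects prefixes with host bits set
                ipaddress.ip_network(prefix, strict=True)
            except ValueError as err:
                problems.append(f"{inner}[{index}]: {prefix}: {err}")
    return problems


# Example using the two assertions from this report.
sample = json.loads("""
{
  "slurmVersion": 1,
  "validationOutputFilters": {"prefixFilters": [], "bgpsecFilters": []},
  "locallyAddedAssertions": {
    "prefixAssertions": [
      {"asn": 64546, "prefix": "192.168.255.0/24", "maxPrefixLength": 24},
      {"asn": 65535, "prefix": "10.10.10.164/27", "maxPrefixLength": 32}
    ],
    "bgpsecAssertions": []
  }
}
""")

for problem in find_invalid_prefixes(sample):
    print(problem)
```

For the sample above, this reports only the second prefixAssertions entry (index 1). For a real file, the parsed content of /home/rtrtr/slurm.json could be passed to the function instead of the inline sample.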