cloudflare / gortr

The RPKI-to-Router server used at Cloudflare
https://rpki.cloudflare.com
BSD 3-Clause "New" or "Revised" License
309 stars 39 forks source link

rtrdump: serial missing from metadata if zero #83

Closed lukastribus closed 1 year ago

lukastribus commented 3 years ago

Hello,

setup is:

VRP's: https://rpki.cloudflare.com/rpki.json Server: GoRTR v0.14.6 Client: RTRdump 0.11.0 (really is v0.14.6 but the version string was not updated)

When GoRTR was just started, and the serial is zero:

systemd[1]: Started GoRTR RPKI to router server.
gortr[53493]: time="2020-10-24T23:00:53+02:00" level=info msg="New update (188879 uniques, 188879 total prefixes). 0 bytes. Updating sha256 hash  -> dbbbbd2701d7d695435842bf91729e4dad2cee533d8611092530de20cf15d3ce"
gortr[53493]: time="2020-10-24T23:00:54+02:00" level=info msg="Updated added, new serial 0"
gortr[53493]: time="2020-10-24T23:00:54+02:00" level=info msg="GoRTR Server started (sessionID:62387, refresh:3600, retry:600, expire:7200)"
gortr[53493]: time="2020-10-24T23:01:01+02:00" level=info msg="Accepted tcp connection from 10.0.0.120:36358 (1/0)"
gortr[53493]: time="2020-10-24T23:01:03+02:00" level=info msg="Disconnecting client 10.0.0.120:36358 (v1) / Serial: 0"
lukas@ubuntu20:~$ jq '.["metadata"]' output.json
{
  "counts": 188879,
  "generated": 1603573263,
  "valid": 1603576863
}
lukas@ubuntu20:~$

I would expect to see "serial": 0 in JSON, instead of omitting the serial.

Only when the serial bumps to non-zero does the JSON metadata output from the rtrdump contain it:

gortr[53493]: time="2020-10-24T23:10:54+02:00" level=info msg="HTTP 304 Not modified for https://rpki.cloudflare.com/rpki.json"
gortr[53493]: time="2020-10-24T23:20:49+02:00" level=info msg="New update (188878 uniques, 188878 total prefixes). 0 bytes. Updating sha256 hash dbbbbd2701d7d695435842bf91729e4dad2cee533d8611092530de20cf15d3ce -> 4ac2429ec0831c8877e58b8f35d2f4e491e562463d6882f4f5a8cbec5dc53073"
gortr[53493]: time="2020-10-24T23:20:51+02:00" level=info msg="Updated added, new serial 1"
gortr[53493]: time="2020-10-24T23:21:21+02:00" level=info msg="Accepted tcp connection from 10.0.0.120:36542 (1/0)"
gortr[53493]: time="2020-10-24T23:21:23+02:00" level=info msg="Disconnecting client 10.0.0.120:36542 (v1) / Serial: 0
lukas@ubuntu20:~$ jq '.["metadata"]' output.json
{
  "counts": 188878,
  "generated": 1603574483,
  "valid": 1603578083,
  "serial": 1
}
lukas@ubuntu20:~$

This is making monitoring serial updates more difficult.

Could you make rtrdump emit "serial": 0 in these cases instead if omitting it?

lspgn commented 3 years ago

Thank you for raising the issue. I'll have a look at making the version number dynamic. What happens here is Golang has an omitempty setting which, when the value is zero, it is considered as default. I may be able to remove this flag. What are you using to monitor serial updates?

lukastribus commented 3 years ago

I'm currently writing and testing a script that handles rtrdump and jq, compares serial and data with a previous run, and returns with different exit codes, infos on stdout and errors on stderr.

I should then be able to use the script both in custom situations (cronjob triggered local health-check and sending results to something like healthcheck.io) and also as a nagios plugin.

lspgn commented 3 years ago

Following #82, I am considering making something like this (that would output to Prometheus or run as one-off with diff). Your solution seems quite nice. Although I don't think jq is able to handle default values: I will take a look at removing the omitempty then.

lukastribus commented 3 years ago

I'm accessing '.["metadata"]["serial"] | numbers', if my script doesn't see anything (in case of the serial being omitted), then the script currently assumes "serial": 0 :

lukas@rpki-bz:~$ jq '.["metadata"]["serial"] | numbers' output.json
1
lukas@rpki-bz:~$ jq '.["metadata"]["serial"] | numbers' output-noserial.json
lukas@rpki-bz:~$

I don't understand what you mean by saying jq may not be being able to handle default values?

lspgn commented 3 years ago

I don't understand what you mean by saying jq may not be being able to handle default values?

`jq '.["metadata"]["serial"] | numbers' does not return zero but empty: you have to assume zero in the subsequent script, correct?

lukastribus commented 3 years ago

Yes, I have to assume zero in this case, so I'm unable to tell the difference between incomplete JSON and "just" missing serial.

If rtrdump would output "serial": 0 then I don't have to assume zero and can handle invalid/incomplete JSON also.

lspgn commented 3 years ago

Ah to handle the case where the rtr session would break and rtrdump outputs a partial file?

lukastribus commented 3 years ago

Well I didn't think about possible failure scenarios too much to be honest, I'm just trying to make it as robust as possible, so that's why I dislike the idea that I have to assume zero when the parameter is missing. I think if 0 is a proper serial number valid as per the RTR specification, it should appear in the JSON explicitly.

In reality, jq behaves differently with a truncated JSON file, so is already detected today:

lukas@rpki-bz:~$ jq '.["metadata"]["serial"] | numbers' output-noserial.json
lukas@rpki-bz:~$ echo $?
0
lukas@rpki-bz:~$ jq '.["metadata"]["serial"] | numbers' output-truncated.json
parse error: Unfinished string at EOF at line 1, column 11451471
lukas@rpki-bz:~$ echo $?
4
lukas@rpki-bz:~$

I just feel like I would be more robust to remove the second part (although even today a truncated json would lead to jq return non-zero, and the die() error function would stop it all):

# read serial from JSON
RTR_SERIAL=$($JQ '.["metadata"]["serial"] | numbers' "$RTRCHK_RTRDUMP_FILE") || die

# if not found, assume zero
[ -z "$RTR_SERIAL" ] && RTR_SERIAL=0