NICMx / FORT-validator

RPKI cache validator
MIT License

Some ROAs are not processed/sent via RTR or dumped to CSV #36

Closed: afpd closed this issue 4 years ago

afpd commented 4 years ago

The ROA file rsync://rpki.ripe.net/repository/DEFAULT/87/f7c4a2-b606-4292-b9f7-fa3c4ef5edf6/1/5WObKiUO8AYk4h7mlYN_9Mxhv8s.roa is never announced via RTR or dumped to CSV. Routinator 3000 processes the same ROA without issues and produces "81.27.160.0/20-24 55002".

The same prefix with a different origin AS, from the file rsync://rpki.ripe.net/repository/DEFAULT/87/f7c4a2-b606-4292-b9f7-fa3c4ef5edf6/1/hHDR3YRXyWPkC3ahQoAACw39Gto.roa, is processed by FORT-validator without issues and gives "81.27.160.0/20-20 12611".

pcarana commented 4 years ago

Hi there! Is the issue still happening?

I've run a quick test on my local machine, and I was able to find both prefixes in the CSV file and also announced via RTR. Here's an informational log (fort_info.log) from one of the tests; the following image is just a basic search of the CSV output (the modified date is in my current timezone):

[Screenshot from 2020-06-20 13-47-58: search of the CSV output]

So, at least for now, the ROAs seem to be OK. Maybe it was a temporary issue?

afpd commented 4 years ago

fort --version
fort 1.2.1

Before restart:
fort 11310 27.8 5.9 3526956 492532 ? Ssl Jun12 3182:47 /usr/bin/fort --configuration-file /etc/fort/config.json

In the logs I found:

Jun 16 07:06:31 rpki-v2 fort[11310]: ERR: /etc/tals/apnic.tal: TAL file does not point to a certificate. (Expected .cer, got 'https://tal.apnic.net/apnic.tal')
Jun 16 07:10:09 rpki-v2 fort[11310]: WRN: Validation from TAL '/etc/tals/apnic.tal' yielded error, discarding any other validation results.

When I tried to restart, I got

Jun 20 20:48:48 rpki-v2 fort[4332]: WRN: Validation from TAL '/etc/tals/apnic.tal' yielded error, discarding any other validation results.
Jun 20 20:48:48 rpki-v2 fort[4332]: ERR: First validation wasn't successful.

and the TAL indeed looked like this:

cat /etc/tals/apnic.tal
https://tal.apnic.net/apnic.tal
rsync://rpki.apnic.net/repository/apnic-rpki-root-iana-origin.cer

whereas the first line should be https://tal.apnic.net/apnic.cer, or the file should list only rsync://rpki.apnic.net/repository/apnic-rpki-root-iana-origin.cer.
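
The root cause here is that the first entry in the TAL's URI section does not point to a certificate. A minimal pre-flight check an operator could run against a TAL before (re)starting the validator, sketched in Python under the assumption of the RFC 8630 layout (optional `#` comment lines, one URI per line, a blank line, then the base64 public key). The `.cer` test mirrors FORT's "Expected .cer" error message; `parse_tal_uris` and `non_cer_uris` are hypothetical helpers, not part of FORT:

```python
from __future__ import annotations


def parse_tal_uris(tal_text: str) -> list[str]:
    """Return the URI section of a TAL (assumed RFC 8630 layout)."""
    uris: list[str] = []
    for raw in tal_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # skip '#' comment lines
        if not line:
            if uris:
                break  # blank line after the URIs starts the key section
            continue  # tolerate leading blank lines (assumption)
        uris.append(line)
    return uris


def non_cer_uris(tal_text: str) -> list[str]:
    """URIs that do not point at a .cer (trust anchor certificate) file."""
    return [u for u in parse_tal_uris(tal_text) if not u.lower().endswith(".cer")]


# The broken TAL from this thread; the public key is not reproduced here.
broken_tal = (
    "https://tal.apnic.net/apnic.tal\n"
    "rsync://rpki.apnic.net/repository/apnic-rpki-root-iana-origin.cer\n"
    "\n"
    "<base64 public key omitted>\n"
)
print(non_cer_uris(broken_tal))  # ['https://tal.apnic.net/apnic.tal']
```

A non-empty result flags a TAL that a validator expecting `.cer` URIs may reject, before it silently stops refreshing data.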

The TAL comes from the Debian package "rpki-trust-anchors 20200610-1", and fort-validator was initially started with it on June 12th.

After restarting with the corrected TAL:
fort 4332 93.7 0.3 644712 27680 ? Ssl 20:43 0:40 /usr/bin/fort --configuration-file /etc/fort/config.json

grep 81.27.160.0 fort-roas.csv
AS55002,81.27.160.0/20,24
AS12611,81.27.160.0/20,20
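
A `grep` for the literal string `81.27.160.0` only finds entries written with that exact network address, so it would miss a ROA for a shorter covering prefix (say, a /16). A hedged Python sketch that instead lists every entry whose prefix covers a given prefix, assuming the three columns visible in the output above (ASN, prefix, max length); `covering_vrps` is a hypothetical helper, and any other details of the CSV layout are not confirmed here:

```python
from __future__ import annotations

import csv
import io
import ipaddress


def covering_vrps(csv_text: str, query: str) -> list[tuple[str, str, int]]:
    """Rows whose prefix covers `query` (assumed columns: ASN, prefix, max length)."""
    target = ipaddress.ip_network(query, strict=False)
    hits: list[tuple[str, str, int]] = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # blank or malformed line
        asn, prefix, max_len = (field.strip() for field in row)
        try:
            net = ipaddress.ip_network(prefix)
            max_length = int(max_len)
        except ValueError:
            continue  # e.g. a header line, if the file has one
        # subnet_of() requires both networks to be the same IP version.
        if net.version == target.version and target.subnet_of(net):
            hits.append((asn, prefix, max_length))
    return hits


roas_csv = "AS55002,81.27.160.0/20,24\nAS12611,81.27.160.0/20,20\n"
print(covering_vrps(roas_csv, "81.27.161.0/24"))
# [('AS55002', '81.27.160.0/20', 24), ('AS12611', '81.27.160.0/20', 20)]
```

This only answers "which entries cover this prefix"; whether a given announcement is valid also depends on each entry's max length and origin AS.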

At this point I'm surprised fort-validator kept working since June 12 if the TAL was incorrect. If it was incorrect, shouldn't fort-validator stop listening on the port? It looks like it just stopped refreshing the ROA data. After the restart, however, it simply quit after "ERR: First validation wasn't successful", which is expected. I'm not sure what the behavior should be if a TAL becomes broken/wrong/corrupted during operation.

pcarana commented 4 years ago

Well, there are two points from your last comment:

  1. The second TAL URI was mistakenly ignored (in this case, an rsync URI), when it should have been used to sync the trust anchor certificate. This issue will be fixed in the upcoming version: 1.3.0
  2. What should happen when a TAL can't be processed.
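
Point 1 amounts to trying each URI listed in the TAL in order, instead of giving up when the first one is unusable. A minimal Python sketch of that fallback, not FORT's actual code: `fetch_trust_anchor` is hypothetical, and `fetch` stands in for whatever rsync or HTTPS download the validator really performs:

```python
from __future__ import annotations

from typing import Callable


def fetch_trust_anchor(uris: list[str], fetch: Callable[[str], bytes]) -> tuple[str, bytes]:
    """Try each TAL URI in order; return the first one that yields a certificate."""
    errors: list[str] = []
    for uri in uris:
        if not uri.lower().endswith(".cer"):
            errors.append(f"{uri}: does not point to a .cer")
            continue  # skip this URI instead of failing the whole TAL
        try:
            return uri, fetch(uri)
        except OSError as exc:
            errors.append(f"{uri}: {exc}")
    # Only fail once every URI has been tried.
    raise RuntimeError("no usable TAL URI: " + "; ".join(errors))


def fake_fetch(uri: str) -> bytes:
    # Hypothetical stand-in for a real download: only rsync URIs "succeed".
    if uri.startswith("rsync://"):
        return b"certificate bytes"
    raise OSError("download failed")


uris = [
    "https://tal.apnic.net/apnic.tal",
    "rsync://rpki.apnic.net/repository/apnic-rpki-root-iana-origin.cer",
]
print(fetch_trust_anchor(uris, fake_fetch)[0])
# rsync://rpki.apnic.net/repository/apnic-rpki-root-iana-origin.cer
```

With this ordering, the broken `apnic.tal` from this thread would still have reached its trust anchor certificate through the rsync URI.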

Just as you saw during the restart: if the first validation isn't successful, FORT validator stops its execution; there's no point in keeping it alive if it doesn't have the full ROA database.

Assuming that the first validation was successful and one of the TALs has an error (maybe it was updated), the scenario is just like the one you experienced: FORT validator logs the error, discards that validation run's results, and leaves the ROA DB unmodified.

Why does FORT validator behave this way? Instead of dropping the whole ROA DB, updating it with "partial data" (i.e. data from 4 of 5 TALs, since one had an error), or even stopping execution, it keeps the last valid DB. This avoids a potentially large RTR update in which some previously valid ROAs would be discarded (those related to the TAL that couldn't be synced). In most cases the error is likely temporary, and yes: we leave some responsibility to the operator to check the error logs.
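
The policy in the last three paragraphs can be summarized as a small decision function. A hedged Python sketch under stated assumptions: each TAL's run is reduced to a success flag plus its VRPs, and `TalRun`, `next_roa_db` and `FirstValidationFailed` are hypothetical names, not FORT internals. The second TAL's entry uses documentation values (AS64496, 192.0.2.0/24):

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# (ASN, prefix, max length), mirroring the CSV columns seen in this thread.
Vrp = Tuple[str, str, int]


@dataclass
class TalRun:
    """Outcome of validating one TAL in a cycle (hypothetical structure)."""
    tal: str
    ok: bool
    vrps: FrozenSet[Vrp] = frozenset()


class FirstValidationFailed(Exception):
    """No previous DB exists to fall back on, so the validator should exit."""


def next_roa_db(current: Optional[FrozenSet[Vrp]], runs: List[TalRun]) -> FrozenSet[Vrp]:
    failed = [run.tal for run in runs if not run.ok]
    if failed:
        if current is None:
            # First cycle: nothing valid to serve via RTR, so stop.
            raise FirstValidationFailed(f"first validation wasn't successful: {failed}")
        # Later cycle: log it, discard this run, keep serving the last valid DB.
        logging.warning("validation from %s yielded errors; keeping previous ROA DB", failed)
        return current
    # Every TAL succeeded: the union of all their VRPs becomes the new DB.
    return frozenset().union(*(run.vrps for run in runs))


ripe = TalRun("ripe.tal", True, frozenset({("AS55002", "81.27.160.0/20", 24)}))
apnic_ok = TalRun("apnic.tal", True, frozenset({("AS64496", "192.0.2.0/24", 24)}))
apnic_broken = TalRun("apnic.tal", False)

db = next_roa_db(None, [ripe, apnic_ok])      # first cycle succeeds
kept = next_roa_db(db, [ripe, apnic_broken])  # logs a warning, returns db unchanged
```

The design choice is all-or-nothing per cycle: a partial union is never published, which avoids withdrawing the failed TAL's ROAs over RTR.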

There is currently a downside with the logs: FORT validator doesn't "categorize" validation logs (those resulting from RPKI object validation) separately from operation logs (those of interest to the operator). So an important error, such as the one you experienced, gets lost in a bunch of warnings and errors. The good news is that this message categorization will be available in v1.3.0 (hopefully released soon).
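
The categorization described above amounts to routing two kinds of messages to separate destinations. A generic Python `logging` sketch of the idea, not FORT's configuration: the logger names `fort.operation` and `fort.validation` are hypothetical, the validation message is a placeholder, and only the operation message is copied from this thread:

```python
import io
import logging
from typing import TextIO


def category_logger(name: str, stream: TextIO) -> logging.Logger:
    """Create a logger whose records go only to `stream` (hypothetical category)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False  # keep categories from mixing via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


operation_out = io.StringIO()
validation_out = io.StringIO()
operation = category_logger("fort.operation", operation_out)
validation = category_logger("fort.validation", validation_out)

validation.warning("placeholder: warning about an individual RPKI object")
operation.error("/etc/tals/apnic.tal: TAL file does not point to a certificate.")
print(operation_out.getvalue(), end="")
# ERROR: /etc/tals/apnic.tal: TAL file does not point to a certificate.
```

Keeping the operational stream separate means the TAL error is no longer buried among per-object validation warnings.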

Summary: we prefer to keep the last valid DB, and warn the operator about the error so that an action (if needed) can be taken.

pcarana commented 4 years ago

Closing: both points above are handled in v1.3.0.