NLnetLabs / routinator

An RPKI Validator and RTR server written in Rust
https://nlnetlabs.nl/projects/routing/routinator/
BSD 3-Clause "New" or "Revised" License

RRDP throws error with krill test environment TAL #669

Closed cli0 closed 2 years ago

cli0 commented 2 years ago

Hello,

I set up a Krill test environment and added the custom TAL to the tals/ folder for Routinator. I read that the standard Routinator build with rustls throws an error for self-signed certificates (CAUsedAsEndEntity), so I compiled Routinator with native-tls instead and ran it:

$ cargo build --release --features socks,native-tls
$ ./target/release/routinator -vv vrps

However, this build throws a similar error as well:

rsyncing from rsync://krill1.com/ta/.
rsync://krill1.com/ta/: Running command Command { std: "rsync" "--contimeout=10" "--max-size=20000000" "-rltz" "--delete" "rsync://krill1.com/ta/" "/home/clio/.rpki-cache/repository/rsync/krill1.com/ta/", kill_on_drop: false }
Found valid trust anchor rsync://krill1.com/ta/ta.cer. Processing.
RRDP https://krill1.com/rrdp/notification.xml: error sending request for url (https://krill1.com/rrdp/notification.xml): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1924: (self signed certificate)
RRDP https://krill1.com/rrdp/notification.xml: Update failed and there is no current copy.
rsyncing from rsync://krill1.com/repo/.
rsync://krill1.com/repo/: Running command Command { std: "rsync" "--contimeout=10" "--max-size=20000000" "-rltz" "--delete" "rsync://krill1.com/repo/" "/home/clio/.rpki-cache/repository/rsync/krill1.com/repo/", kill_on_drop: false }
RRDP https://pubd.com/rrdp/notification.xml: error sending request for url (https://pubd.com/rrdp/notification.xml): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1924: (self signed certificate)
RRDP https://pubd.com/rrdp/notification.xml: Update failed and there is no current copy.
rsyncing from rsync://pubd.com/repo/.
rsync://pubd.com/repo/: Running command Command { std: "rsync" "--contimeout=10" "--max-size=20000000" "-rltz" "--delete" "rsync://pubd.com/repo/" "/home/clio/.rpki-cache/repository/rsync/pubd.com/repo/", kill_on_drop: false }
ASN,IP Prefix,Max Length,Trust Anchor
AS65000,10.0.0.0/24,24,ta
AS65000,10.0.1.0/24,24,ta

Is there any way to make RRDP work in Routinator for a Krill test environment with an obviously self-signed certificate?

OS is Ubuntu 18.04, Krill 0.9.2 and Routinator 0.10.2

ximon18 commented 2 years ago

Hi @cli0,

I think you can tell Routinator to accept the Krill certificate by copying it to the Routinator host and then passing the --rrdp-root-cert <PATH> argument so that Routinator accepts it.

You could also use a real certificate for Krill (or for NGINX or a similar proxy in front of Krill) rather than a self-signed one.

Ximon

cli0 commented 2 years ago

Hello Ximon, thank you for the fast reply.

I tested the --rrdp-root-cert configuration as follows but I still get an error.

$ ./routinator --rrdp-root-cert "/var/lib/krill/data/ta/ta.cer" -vv vrps
Cannot decode rrdp-root-cert file '/var/lib/krill/data/ta/ta.cer': builder error: error:0909006C:PEM routines:get_name:no start line:../crypto/pem/pem_lib.c:745:Expecting: CERTIFICATE'
Fatal error. Exiting.

The certificate I pointed to is the cert Krill creates for itself when it is booted in testbed mode. As for getting a real certificate, I am setting up a small scale RPKI infrastructure in my local network and none of the components are internet facing.

ximon18 commented 2 years ago

Hi @cli0,

Try using the $data_dir/ssl/cert.pem file from Krill.
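
As an aside, a hedged guess at the earlier error: "no start line ... Expecting: CERTIFICATE" usually means the file is DER-encoded rather than PEM, and RPKI .cer objects are DER. A quick way to check (paths taken from your output above):

```shell
# A PEM certificate is text and starts with "-----BEGIN CERTIFICATE-----"; DER is binary:
head -c 27 /var/lib/krill/data/ta/ta.cer
# This succeeds only if the file really is a DER-encoded X.509 certificate:
openssl x509 -inform DER -in /var/lib/krill/data/ta/ta.cer -noout -subject
# If you ever need a PEM copy of a DER certificate:
openssl x509 -inform DER -in /var/lib/krill/data/ta/ta.cer -out /tmp/ta.pem
```

Note that even a PEM-converted ta.cer would not help here: it is the RPKI trust anchor certificate, not the TLS certificate that Krill's HTTPS endpoint presents.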

Ximon

cli0 commented 2 years ago

Hello,

With $data_dir/ssl/cert.pem the fatal error no longer appears, but I am back to the original error from my initial post: certificate verify failed:../ssl/statem/statem_clnt.c:1924: (self signed certificate). My architecture is the following: TA -> testbed (repository on localhost behind an nginx proxy as krill1.com) -> child CA of the testbed, with its repository in another container running Krill in testbed mode under the repository URI pubd.com.

ximon18 commented 2 years ago

Hi @cli0,

I'll investigate this for you tomorrow when I'm in front of a laptop instead of on my mobile phone.

It's definitely possible. For example, the Krill end-to-end test uses Routinator to talk to a Krill instance that uses a self-signed cert (in that case signed by our own CA, but still not a real CA).

You can see more about that here:

https://github.com/NLnetLabs/rpki-deploy/blob/main/terraform/krill-e2e-test/lib/docker/relyingparties/routinator/entrypoint.sh

And here:

https://github.com/NLnetLabs/rpki-deploy/blob/main/terraform/krill-e2e-test/lib/docker/nginx/README.txt

Ximon

ximon18 commented 2 years ago

Hi @cli0,

If I follow the procedure at https://github.com/NLnetLabs/rpki-deploy/blob/main/terraform/krill-e2e-test/lib/docker/nginx/README.txt to create the files certbundle.pem, rootCA.crt and krill.key then I can use those to make it work. Here's a full worked example:

$ ISSUER="/C=NL/L=Amsterdam/O=NLnet Labs"
$ SUBJECT="/C=NL/L=Amsterdam/O=NLnet Labs/CN=own.server.com"
$ SAN="DNS:own.server.com"
$ openssl req -new -newkey rsa:4096 -keyout issuer.key -x509 -out issuer.crt  -days 3650 -nodes -subj "$ISSUER"
$ openssl req -new -out subject.csr -newkey rsa:4096 -keyout subject.key -days 3650 -nodes -subj "$SUBJECT"
$ echo "subjectAltName=$SAN" > subject.ext
$ openssl x509 -in subject.csr -req -out subject.crt -extfile subject.ext -CA issuer.crt -CAkey issuer.key -CAcreateserial -days 3650
$ cp issuer.crt rootCA.crt
$ cp subject.key krill.key
$ cat subject.crt issuer.crt > certbundle.pem
$ rm issuer.* subject.*
$ sudo mkdir -p /var/lib/krill/data/ssl/
$ sudo cp krill.key /var/lib/krill/data/ssl/key.pem
$ sudo cp certbundle.pem /var/lib/krill/data/ssl/cert.pem
$ sudo chown -R krill: /var/lib/krill
$ cat << EOF | sudo tee -a /etc/hosts
127.0.0.1        own.server.com
EOF
$ sudo apt install -y krill routinator
$ cat << EOF | sudo tee -a /etc/krill.conf
[testbed]
rrdp_base_uri = "https://own.server.com:3000/rrdp/"
rsync_jail = "rsync://own.server.com/repo/"
ta_uri = "https://own.server.com:3000/ta/ta.cer"
ta_aia = "rsync://own.server.com/ta/ta.cer"
EOF
$ sudo -u krill /usr/bin/krill -c /etc/krill.conf

Then in another terminal:

$ routinator-init --tal nlnetlabs-testbed
$ sudo rm /var/lib/routinator/tals/nlnetlabs-testbed.tal
$ curl --insecure https://localhost:3000/ta/ta.tal --output /tmp/ta.tal
$ sudo mv /tmp/ta.tal /var/lib/routinator/tals/
$ sudo -u routinator routinator -c /etc/routinator/routinator.conf -vvvv --allow-dubious-hosts --disable-rsync --rrdp-root-cert rootCA.crt vrps
Found valid trust anchor https://own.server.com:3000/ta/ta.cer. Processing.
New session. Need to get snapshot.
RRDP https://own.server.com:3000/rrdp/notification.xml: updating from snapshot.
RRDP https://own.server.com:3000/rrdp/notification.xml: snapshot update completed.
ASN,IP Prefix,Max Length,Trust Anchor

Ximon

timbru commented 2 years ago

If you go through nginx, then how did you set up SSL for nginx?

My guess is that if you make Routinator trust Krill's certificate but nginx serves a different one, Routinator is still going to reject it. So you should either make Routinator trust the nginx cert, or make nginx use the self-signed certificate and key generated by Krill by doing something like:

  # Enable HTTPS using Krill's self-signed certificate and key
  listen 443 ssl;
  ssl_certificate /var/lib/krill/data/ssl/cert.pem;
  ssl_certificate_key /var/lib/krill/data/ssl/key.pem;

(check the paths of course)
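
Once nginx is reloaded, a quick way to verify which certificate it actually serves (hostname taken from the logs earlier in this thread; adjust to your setup):

```shell
# The fetch succeeds only if nginx presents a chain that validates against cert.pem:
curl --cacert /var/lib/krill/data/ssl/cert.pem https://krill1.com/rrdp/notification.xml
# Or inspect the served certificate directly:
openssl s_client -connect krill1.com:443 -servername krill1.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer
```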

Hope this helps,

Tim

cli0 commented 2 years ago

Hello Ximon and Tim, your suggestions worked. Thank you :) The self-signed certificates are now accepted by the rustls build of Routinator. However, I am encountering a new issue with rsync:

Dec  6 12:45:12 clio routinator[13985]: rsync://server.com/repo/ca1/1/47761E6E7DFBDCABA6C98707848C7114A5421AC9.mft: failed to validate
Dec  6 12:45:12 clio routinator[13985]: rsync://server.com/repo/ca1/1/47761E6E7DFBDCABA6C98707848C7114A5421AC9.mft: No valid manifest found.
Dec  6 12:45:12 clio routinator[13985]: CA for rsync://server.com/repo/ca1/1 rejected, resources marked as unsafe:
Dec  6 12:45:12 clio routinator[13985]:    10.0.0.0/8
Dec  6 12:45:12 clio routinator[13985]:    2001:db8::/32
Dec  6 12:45:12 clio routinator[13985]:    AS65000

After these certificate-related changes, rsync doesn't seem to work anymore. I tested the path using rsync --list-only and the files are there, on the expected path and reachable via rsync, but validation fails. How do I fix rsync validation here?

Another thing I noticed is that after a while Routinator stops caring about the content of the config file. For example, if I try to disable rsync using disable-rsync = true in the configuration file, it gets ignored. If I want to limit validation-threads = 1, it gets ignored as well. Both options used to work, but after a computer reboot and a few days later they no longer do (weird).

timbru commented 2 years ago

Hi cli0,

Can you check the timestamps on those files?

This feels like the manifest (47761E6E7DFBDCABA6C98707848C7114A5421AC9.mft) being rejected because it's expired. That could be because Krill isn't actually running, or because you are serving a copy of the files generated by Krill and those files are not being updated.
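
If it helps: a manifest is a DER-encoded RPKI signed object, so a text editor can't open it, but openssl's ASN.1 dumper can show the embedded times (roughly: the GENERALIZEDTIME fields are the manifest's thisUpdate/nextUpdate, the UTCTIME fields the EE certificate's validity window). Using the filename from your logs:

```shell
# Dump the ASN.1 structure of the manifest and pick out the time fields:
openssl asn1parse -inform DER \
  -in 47761E6E7DFBDCABA6C98707848C7114A5421AC9.mft \
  | grep -E 'UTCTIME|GENERALIZEDTIME'
```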

cli0 commented 2 years ago

Hello Tim,

I checked the timestamps (via ls -lash, because I can't seem to open a .mft) and all the files are recently generated. Krill was running without errors as well. I deleted the manifest and restarted Krill just to make sure the files were re-generated. So either the signature got messed up during the certificate switcheroo earlier or... I don't know.

Another thing I noticed: I have two publication servers, and this issue only happens with the one that is not on the same host as Routinator (as you can see in the logs). When I disable rsync, the publication server on the Routinator host is queried via RRDP correctly and then it stops. The external publication server still receives the additional rsync query you see in the logs, despite rsync being clearly disabled in the config.

It was all working fine 3 days ago with the exact same setup. Neither Krill nor Routinator were updated to the x.x.3 version.

timbru commented 2 years ago

Krill does not expect files on disk to be deleted outside of its own control. If you really want to force Krill to regenerate the manifest, the easiest way is to change your ROAs, e.g. add one, then delete it.
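
For example, something like this (ROA update syntax as in the Krill 0.9 docs; the CA name and prefix here are placeholders):

```
# Add a throwaway ROA and remove it again to force republication:
krillc roas update --ca ca --add "10.0.99.0/24 => 65000"
krillc roas update --ca ca --remove "10.0.99.0/24 => 65000"
```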

I am still finishing up the blog post I am writing about our own testbed set up, but I can give you a link to a preview here: https://blog.nlnetlabs.nl/p/e128e35f-9bc5-4302-8b0c-b376f420893e/

cli0 commented 2 years ago

I did as you suggested: added more ROAs, deleted some in all CAs, and ran a new validation cycle. Rsync is disabled in the configuration file. The ROA for the Krill instance running on the same host as Routinator was accepted as safe and only RRDP was used to query it. The ROAs from the secondary publication server on another host were all flagged as unsafe, and for some reason rsync was used to connect to them in addition to RRDP, despite it being disabled in Routinator. I still don't understand why rsync is being invoked for only one publication server instance and which validation step is failing here.

Dec  7 15:04:55 trantor routinator[13011]: Starting a validation run.
Dec  7 15:04:55 trantor routinator[13011]: Found valid trust anchor https://krill.com/ta/ta.cer. Processing.
Dec  7 15:04:55 trantor routinator[13011]: https://krill.com/rrdp/notification.xml: Serials: us 60, them 63.
Dec  7 15:04:55 trantor routinator[13011]: RRDP https://krill.com/rrdp/notification.xml: Delta update step (1/3).
Dec  7 15:04:55 trantor routinator[13011]: RRDP https://krill.com/rrdp/notification.xml: Delta update step (2/3).
Dec  7 15:04:55 trantor routinator[13011]: RRDP https://krill.com/rrdp/notification.xml: Delta update step (3/3).
Dec  7 15:04:55 trantor routinator[13011]: RRDP https://krill.com/rrdp/notification.xml: Delta update completed.
Dec  7 15:04:55 trantor routinator[13011]: https://server.com/rrdp/notification.xml: Serials: us 2, them 2.
Dec  7 15:04:55 trantor routinator[13011]: RRDP https://server.com/rrdp/notification.xml: Delta update completed.
Dec  7 15:04:55 trantor routinator[13011]: rsync://server.com/repo/ca1/1/47761E6E7DFBDCABA6C98707848C7114A5421AC9.mft: failed to validate
Dec  7 15:04:55 trantor routinator[13011]: rsync://server.com/repo/ca1/1/47761E6E7DFBDCABA6C98707848C7114A5421AC9.mft: No valid manifest found.
Dec  7 15:04:55 trantor routinator[13011]: CA for rsync://server.com/repo/ca1/1 rejected, resources marked as unsafe:
Dec  7 15:04:55 trantor routinator[13011]:    10.0.0.0/8
Dec  7 15:04:55 trantor routinator[13011]:    2001:db8::/32
Dec  7 15:04:55 trantor routinator[13011]:    AS65000
Dec  7 15:04:55 trantor routinator[13011]: Filtering potentially unsafe VRP (10.0.5.0/24-24, AS65000)
Dec  7 15:04:55 trantor routinator[13011]: Filtering potentially unsafe VRP (10.0.6.0/24-24, AS65000)
Dec  7 15:04:55 trantor routinator[13011]: Filtering potentially unsafe VRP (10.0.2.0/24-24, AS65000)
Dec  7 15:04:55 trantor routinator[13011]: Filtering potentially unsafe VRP (10.0.3.0/24-24, AS65000)
Dec  7 15:04:55 trantor routinator[13011]: Filtering potentially unsafe VRP (10.0.7.0/24-24, AS65000)
Dec  7 15:04:55 trantor routinator[13011]: Validation completed.
Dec  7 15:04:55 trantor routinator[13011]: Summary at 2021-12-07 14:04:55.115320118 UTC
Dec  7 15:04:55 trantor routinator[13011]: ta: 6 verified ROAs, 6 verified VRPs, 5 unsafe VRPs, 6 final VRPs.
Dec  7 15:04:55 trantor routinator[13011]: total: 6 verified ROAs, 6 verified VRPs, 5 unsafe VRPs, 6 final VRPs.
Dec  7 15:04:55 trantor routinator[13011]: New serial is 0.

Routinator configuration file:

repository-dir = "/var/lib/routinator/rpki-cache"
tal-dir = "/var/lib/routinator/tals"
rtr-listen = ["127.0.0.1:3323"]
http-listen = ["127.0.0.1:8323"]
validation-threads = 2
disable-rsync = true
allow-dubious-hosts = true
log-level = "debug"
rrdp-root-certs = ["/home/clio/local/rootCA.crt",
"/home/clio/server/rootCA.crt"]

ximon18 commented 2 years ago

Good morning @cli0,

Regarding:

Another thing I noticed is that after a while routinator stops caring about the content in the config file.

And:

for some reason rsync was used to connect with them in addition to RRDP, despite it being disabled in routinator.

@cli0: Could you please share how you are running Routinator? For example are you starting it as a systemd service and leaving it running and during this period it "stops caring" or are you manually invoking it periodically or invoking it from cron or something else? If using systemd please share the systemd unit and other relevant systemd files for the Routinator service. If invoking from the command line / cron please share the command being invoked. And do I understand correctly that this "stops caring" issue is with Routinator 0.10.2 on Ubuntu 18.04?

@partim: Are you aware of any issues with Routinator not respecting its configuration file?

Thanks,

Ximon

cli0 commented 2 years ago

Hello Ximon, I am running Routinator using systemd. I sometimes restart it to force a validation run and see what happens, but always through systemd. This is what the service file looks like:

[Unit]
Description=Routinator 3000
Documentation=man:routinator(1)
After=network.target

[Service]
ExecStart=/usr/bin/routinator --config=/etc/routinator/routinator.conf --syslog server
User=routinator

[Install]
WantedBy=multi-user.target

commands:

$ sudo service routinator restart
$ sudo service routinator stop

Right now rsync is disabled in the Routinator config, but when I make a validation run I get:

Dec  9 21:09:09 trantor routinator[2807]: Starting a validation run.
Dec  9 21:09:09 trantor routinator[2807]: Found valid trust anchor https://testbed.com/ta/ta.cer. Processing.
Dec  9 21:09:09 trantor routinator[2807]: RRDP https://testbed.com/rrdp/notification.xml: Not modified.
Dec  9 21:09:09 trantor routinator[2807]: RRDP https://pubserver.com/rrdp/notification.xml: Not modified.
Dec  9 21:09:09 trantor routinator[2807]: rsync://pubserver.com/repo/newca/2/8BB5B731D74677695B9E249B25D68C0110A071AF.mft: failed to validate
Dec  9 21:09:09 trantor routinator[2807]: rsync://pubserver.com/repo/newca/2/8BB5B731D74677695B9E249B25D68C0110A071AF.mft: No valid manifest found.
Dec  9 21:09:09 trantor routinator[2807]: CA for rsync://pubserver.com/repo/newca/2 rejected, resources marked as unsafe:
Dec  9 21:09:09 trantor routinator[2807]:    10.0.0.0/8
Dec  9 21:09:09 trantor routinator[2807]:    2001:db8::/32
Dec  9 21:09:09 trantor routinator[2807]:    AS65000
Dec  9 21:09:09 trantor routinator[2807]: Filtering potentially unsafe VRP (10.0.8.0/24-24, AS65000)
Dec  9 21:09:09 trantor routinator[2807]: Validation completed.
Dec  9 21:09:09 trantor routinator[2807]: Summary at 2021-12-09 20:09:09.565357592 UTC
Dec  9 21:09:09 trantor routinator[2807]: ta: 4 verified ROAs, 4 verified VRPs, 1 unsafe VRPs, 4 final VRPs.
Dec  9 21:09:09 trantor routinator[2807]: total: 4 verified ROAs, 4 verified VRPs, 1 unsafe VRPs, 4 final VRPs.
Dec  9 21:09:09 trantor routinator[2807]: New serial is 0.
Dec  9 21:09:09 trantor routinator[2807]: Sending out notifications.

testbed.com is the Krill instance that acts as TA and has the default testbed CA; it also publishes 3 ROAs. pubserver.com is a publication server for newca, a child of testbed whose publication point I migrated to the host pubserver.com; it offers 1 ROA (the one flagged as unsafe). I deleted and reconfigured the pubserver from scratch and all was well for a while. But if something happens to the validation of the parent CA (aka testbed) during one cycle, for example because RRDP times out due to a low timeout value, the error cascades down to newca; and even when the parent network is fine again and the timeout has been increased so it no longer occurs, the error persists indefinitely in the child CA no matter how many cycles pass. Also, rsync is disabled in the config file (which looks exactly like in my previous post) and Routinator is still querying pubserver via rsync, despite going through RRDP successfully (no errors, not-modified XML).

$ rsync --list-only rsync://pubserver.com/repo/newca/2/
drwxr-xr-x          4,096 2021/12/09 21:30:02 .
-rw-r--r--            435 2021/12/09 21:30:02 8BB5B731D74677695B9E249B25D68C0110A071AF.crl
-rw-r--r--          2,388 2021/12/09 21:30:02 8BB5B731D74677695B9E249B25D68C0110A071AF.mft

The manifest is generated by the app, no manual editing whatsoever. As I am writing this it is only a few minutes old.

The platforms are Ubuntu 18.04, Krill 0.9.2 and Routinator 0.10.2. And I have made sure every app, rsync and krill are running correctly when routinator throws these errors. All 3 apps I run using systemd.

cli0 commented 2 years ago

Just as a quick update: this problem also arises when Routinator and the CAs are operating normally, without issues or lag. After a few hours rsync will throw an error that it can't validate the manifest, despite the manifest being untouched by me and continuously updated by Krill, available via rsync, and RRDP having previously succeeded. Oh, and rsync is disabled in the Routinator config. The problem does not appear immediately but a few hours after the system has been running fine, and it only affects the CAs whose publication servers are externally located (via migration). It flags VRPs as unsafe but not as invalid, and unsafe-vrps = "accept" does not suppress the warning, although I am not entirely sure what that setting is supposed to do in practice.

Dec 17 08:30:53 clio routinator[1462534]: https://server.com/rrdp/notification.xml: Serials: us 2, them 2.
Dec 17 08:30:53 clio routinator[1462534]: RRDP https://server.com/rrdp/notification.xml: Delta update completed.
Dec 17 08:30:53 clio routinator[1462534]: rsync://server.com/repo/ca/0/9492043940A2E3E9CFA7912107996984F20674CD.mft: failed to validate
Dec 17 08:30:53 clio routinator[1462534]: rsync://server.com/repo/ca/0/9492043940A2E3E9CFA7912107996984F20674CD.mft: No valid manifest found.
Dec 17 08:30:53 clio routinator[1462534]: CA for rsync://server.com/repo/ca/0 rejected, resources marked as unsafe:
Dec 17 08:30:53 clio routinator[1462534]:    10.0.0.0/8
Dec 17 08:30:53 clio routinator[1462534]:    2001:db8::/32
Dec 17 08:30:53 clio routinator[1462534]:    AS65000
Dec 17 08:30:53 clio routinator[1462534]: Filtering potentially unsafe VRP (10.0.1.0/24-24, AS65000)
Dec 17 08:30:53 clio routinator[1462534]: Filtering potentially unsafe VRP (10.0.5.0/24-24, AS65000)
Dec 17 08:30:53 clio routinator[1462534]: Filtering potentially unsafe VRP (10.0.2.0/24-24, AS65000)
Dec 17 08:30:53 clio routinator[1462534]: Filtering potentially unsafe VRP (10.0.3.0/24-24, AS65000)
Dec 17 08:30:53 clio routinator[1462534]: Filtering potentially unsafe VRP (10.0.4.0/24-24, AS65000)
Dec 17 08:30:53 clio routinator[1462534]: Validation completed.
Dec 17 08:30:53 clio routinator[1462534]: Summary at 2021-12-17 07:30:53.066952800 UTC
Dec 17 08:30:53 clio routinator[1462534]: ta: 5 verified ROAs, 5 verified VRPs, 5 unsafe VRPs, 5 final VRPs.
Dec 17 08:30:53 clio routinator[1462534]: total: 5 verified ROAs, 5 verified VRPs, 5 unsafe VRPs, 5 final VRPs.
Dec 17 08:30:53 clio routinator[1462534]: New serial is 0.
partim commented 2 years ago

Is this the first validation run rejecting this manifest? I am asking because the first line (“Serials: us 2, them 2”) suggests that the RRDP data from server.com has not changed at all. Then in the third line, Routinator rejects the manifest it received earlier from the RRDP server (don’t mind the rsync mentioned there; that is just the “name” of the manifest, it wasn’t actually received via rsync) and tries to fall back to an older version of the manifest that had passed validation. But it doesn’t have one. So it must have rejected the manifest before.

The “Filtering potentially unsafe VRP” message is a logging bug; it doesn’t actually filter anything. I created #671 for that.

cli0 commented 2 years ago

Hello Partim,

Thank you very much for the information. So the "rsync" part is just a URI naming convention; rsync is not actually being used, so it is not the protocol that is failing. That is very good to know!

As for the error: the logs above are not from the first validation, just the most recent one. I checked the logs for the very first time this error occurred and there was no difference. Still, below are two log snippets: the first is a successful validation, and the second is the follow-up validation 10 minutes later when the error occurs (disclaimer: I did nothing to that publication point in the meantime: no added ROAs, no restarts, no changes):

Dec 16 20:25:49 clio routinator[2132733]: https://server.com/rrdp/notification.xml: Serials: us 0, them 0.
Dec 16 20:25:49 clio routinator[2132733]: RRDP https://server.com/rrdp/notification.xml: Delta update completed.
Dec 16 20:35:50 clio routinator[2132733]: https://server.com/rrdp/notification.xml: Serials: us 0, them 0.
Dec 16 20:35:50 clio routinator[2132733]: RRDP https://server.com/rrdp/notification.xml: Delta update completed.
Dec 16 20:35:50 clio routinator[2132733]: rsync://server.com/repo/ca/0/9492043940A2E3E9CFA7912107996984F20674CD.mft: failed to validate
Dec 16 20:35:50 clio routinator[2132733]: CA for rsync://server.com/repo/ca/0 rejected, resources marked as unsafe:
Dec 16 20:35:50 clio routinator[2132733]:    10.0.0.0/8
Dec 16 20:35:50 clio routinator[2132733]:    2001:db8::/32
Dec 16 20:35:50 clio routinator[2132733]:    AS65000

Between the two runs, from 20:25 to 20:35, there were also no OS-related errors on either of the hosts running Routinator and the publication server. Why would the manifest of an external publication server suddenly fail validation without any warning or prompting?

Edit: Could it have something to do with the certificates we use for the Krill instances, i.e. the self-signed certificates generated with the method in https://github.com/NLnetLabs/routinator/issues/669#issuecomment-984131724 ? But that still doesn't explain why it works for a few hours and then breaks down unexpectedly...

UPDATE

I added a ROA under the CA for which server.com acts as publication server (migration setup). The ROA is recognized and validated after a validation run, but the manifest file on server.com does not seem to be updated. This is the validation log:

Dec 17 12:36:55 clio routinator[3920958]: https://server.com/rrdp/notification.xml: Serials: us 2, them 2.
Dec 17 12:36:55 clio routinator[3920958]: RRDP https://server.com/rrdp/notification.xml: Delta update completed.
Dec 17 12:36:55 clio routinator[3920958]: rsync://server.com/repo/ca/0/9492043940A2E3E9CFA7912107996984F20674CD.mft: failed to validate
Dec 17 12:36:55 clio routinator[3920958]: rsync://server.com/repo/ca/0/9492043940A2E3E9CFA7912107996984F20674CD.mft: No valid manifest found.
Dec 17 12:36:55 clio routinator[3920958]: CA for rsync://server.com/repo/ca/0 rejected, resources marked as unsafe:
Dec 17 12:36:55 clio routinator[3920958]:    10.0.0.0/8
Dec 17 12:36:55 clio routinator[3920958]:    2001:db8::/32
Dec 17 12:36:55 clio routinator[3920958]:    AS65000
Dec 17 12:36:55 clio routinator[3920958]: Validation completed.
Dec 17 12:36:55 clio routinator[3920958]: Summary at 2021-12-17 11:36:55.215058830 UTC
Dec 17 12:36:55 clio routinator[3920958]: ta: 7 verified ROAs, 7 verified VRPs, 7 unsafe VRPs, 7 final VRPs.
Dec 17 12:36:55 clio routinator[3920958]: total: 7 verified ROAs, 7 verified VRPs, 7 unsafe VRPs, 7 final VRPs.
Dec 17 12:36:55 clio routinator[3920958]: New serial is 0.

If the manifest file is not updated or validated, then how is the new ROA being verified? Shouldn't it be rejected if the checks are not all satisfied at least once? This doesn't look right.

Configuration file of the Krill instance acting as publication server for my CA:

data_dir=...
log_type=...
admin_token=...
service_uri = "https://server.com/"
repo_enabled = true
bgp_risdumps_enabled = false
[testbed]
rrdp_base_uri = "https://server.com/rrdp/"
rsync_jail = "rsync://server.com/repo/"
ta_aia = "rsync://server.com/ta/ta.cer"
ta_uri = "https://server.com/ta/ta.cer"

partim commented 2 years ago

Apologies for the delayed answer – holiday season and all that.

This looks a bit like the manifest’s certificate expired between the 20:25 and 20:35 validation runs. I can’t think of anything else that would suddenly make the manifest invalid. The only other option would be that something changed in the parent CA; am I correct in assuming that server.com lives under testbed.com?

That could also be an explanation for actually having the ROAs: Is it possible that you created two CAs for server.com and the one that errors out is actually an older experiment that just happens to still be around?

You could try and debug all this, but maybe it’s a better idea to just wipe everything and start from scratch?

cli0 commented 2 years ago

Hello, I dearly appreciate any response, even if late :)

I don't understand exactly what you mean by "living under testbed.com". The setup is as follows: server.com is a publication server running in a separate VM, acting as such for a certificate authority called ca, which is the child CA of the testbed CA (as officially created when we set up the Krill testbed environment). I hope this answers your question :)

Is it possible that you created two CAs for server.com and the one that errors out is actually an older experiment that just happens to still be around?

Nope. I have very specifically delegated the VM to act as publication server for only one CA.

So what this means is that after doing a CA migration, the new publication server does not renew the manifest and it expires. I think this might be a software issue in Krill, no? And I have tried everything too; bar deleting the CA entirely and reconfiguring from scratch, I know of no other fix. Wiping the Routinator cache or even reinstalling it doesn't help. And this error always appears a few hours post-migration; it is not a one-time fluke. Plus, adding new ROAs still works and Routinator recognizes them, despite the manifest being expired (?). @timbru is it possible for this to be a migration issue? After a migration, Krill/the publication server does not reissue the manifest after it expires. I have observed this happen without fail for every migrated CA, hours post-migration, and never for the first CA (testbed), whose publication server runs on the same Krill instance (i.e. wasn't migrated).

timbru commented 2 years ago

Hi @cli0,

So, I guess that by 'a CA Migration' you mean that you migrated the CA 'ca' to use a new repository?

Documentation was recently updated for this, and this is now here: https://krill.docs.nlnetlabs.nl/en/stable/ca-migrate-repo.html

Did you finish the migration? I.e. the first part of the migration will just result in an empty manifest (no ROAs etc.) and a CRL being published in the new location. They should keep being renewed, though. If that is not happening, then this would indeed be a bug. I will dive into the code now.

In the meantime, could you try `krillc keyroll activate` to finish the migration?

That should result in the ROAs being published in the new location, the old MFT/CRL and ROAs being removed from the old location, and the certificate for the old key being revoked by the parent (testbed).

cli0 commented 2 years ago

Hi,

Thank you for the reply. I see the page has changed a little and there is no mention of krillc keyroll init. Is it no longer necessary? The way I did my migration weeks ago was the following:

krillc keyroll init --ca ca
krillc keyroll activate --ca ca
<get publisher request>
<create new publisher in the new publication server at server.com and get repository response>
krillc repo configure --ca ca --response <repository_response>

After re-running krillc keyroll activate, the error no longer appears. I misunderstood the order of commands and the migration was apparently incomplete. Activate should have been the last command, right? That's why the manifest wasn't being reissued after expiry. I apologize for the confusion and hope you haven't dived into the source code yet :)
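
For anyone finding this later, the order that ended up working for me (reconstructed from my list above; same placeholders):

```
krillc repo configure --ca ca --response <repository_response>   # starts the migration's own key roll
krillc keyroll activate --ca ca                                  # run this last to finish the migration
```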

timbru commented 2 years ago

So, the first key roll is not required for the repository migration. It was probably confusing that both the key roll and the repo migration process were documented on the same page, and that the migration process uses its own key roll to achieve the migration: essentially it creates a new key for the new repo and then rolls over to it.

Perhaps I should add a CLI alias or explicit key state for this though - something like krillc repo finalize would probably be more intuitive.

All that said: I had not found this issue because in my tests I finish the key rolls quickly, but the manifests and CRLs should keep being updated while a roll is in progress, so I created a bug in the Krill project for this:

https://github.com/NLnetLabs/krill/issues/749

Will close this one for now then, thank you for reporting and persevering!