NICMx / FORT-validator

RPKI cache validator
MIT License

fort never leaves validation stage when started with IPv6 server address #35

Closed (tias77 closed this issue 4 years ago)

tias77 commented 4 years ago

v1.2.1 in Docker/Alpine on Debian 10

Normal startup:

2020-06-12T13:46:46.338052584Z DBG: /etc/fort/tal/ripe.tal: }
2020-06-12T13:46:46.372073001Z DBG: /etc/fort/tal/ripe.tal: Deleted 0 deferred certificates.
2020-06-12T13:46:46.372120704Z DBG: /etc/fort/tal/ripe.tal: Deleting 0 stacked x509s.
2020-06-12T13:46:46.372131224Z DBG: /etc/fort/tal/ripe.tal: Deleted 0 metadatas.
2020-06-12T13:46:46.372137942Z DBG: /etc/fort/tal/ripe.tal: }
2020-06-12T13:46:46.372144047Z INF: Validation finished:
2020-06-12T13:46:46.372150649Z INF: - Valid Prefixes: 153246
2020-06-12T13:46:46.372157211Z INF: - Valid Router Keys: 0
2020-06-12T13:46:46.372163743Z INF: - Current serial number is 0.
2020-06-12T13:46:46.372170613Z INF: - Real execution time: 284 secs.
2020-06-12T13:46:46.372177172Z DBG: Waiting for client connections...
2020-06-12T13:46:51.123307698Z INF: Client accepted [ID 4]: <RTR client IPv4 address>

If started with --server.address set to an IPv6 address, it never leaves the validation stage:

2020-06-15T12:26:22.520426006Z DBG: /etc/fort/tal/apnic.tal: }
2020-06-15T12:26:22.520433836Z DBG: /etc/fort/tal/apnic.tal: Deleted 0 deferred certificates.
2020-06-15T12:26:22.520440308Z DBG: /etc/fort/tal/apnic.tal: Deleting 0 stacked x509s.
2020-06-15T12:26:22.520450062Z DBG: /etc/fort/tal/apnic.tal: Deleted 0 metadatas.
2020-06-15T12:26:22.520462453Z DBG: /etc/fort/tal/apnic.tal: }

I want fort to accept RTR sessions on both IPv4 and IPv6 sockets; by default it only accepts IPv4 sessions.
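For context on what "both IPv4 and IPv6" can mean at the socket level: one common POSIX approach is a single IPv6 listener with `IPV6_V6ONLY` turned off, so IPv4 clients arrive as IPv4-mapped addresses (`::ffff:a.b.c.d`). This is a generic sketch, not FORT's code; `open_dual_stack_listener` is a hypothetical name, and whether mapped addresses are allowed at all is OS-dependent (Linux honors it, with the default set by `net.ipv6.bindv6only`).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not FORT's code): open one TCP listener on [::]:port
 * that, where the OS permits it, also accepts IPv4 clients as IPv4-mapped
 * IPv6 addresses. Returns the listening socket, or -1 on error.
 */
int open_dual_stack_listener(unsigned short port)
{
	int fd = socket(AF_INET6, SOCK_STREAM, 0);
	if (fd < 0)
		return -1; /* e.g. IPv6 disabled in this host/container */

	/* 0 = also accept IPv4; the system default varies. */
	int off = 0;
	int on = 1;
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
		goto fail;

	struct sockaddr_in6 addr;
	memset(&addr, 0, sizeof(addr));
	addr.sin6_family = AF_INET6;
	addr.sin6_addr = in6addr_any; /* the "::" wildcard address */
	addr.sin6_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, SOMAXCONN) < 0)
		goto fail;

	return fd;

fail:
	close(fd);
	return -1;
}
```

The alternative is two sockets, one per address family, each bound to its own address; that works even where IPv4-mapped addresses are disabled, at the cost of polling two descriptors.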

pcarana commented 4 years ago

Hi! We're currently reviewing this issue.

So, this only happens when --server.address is an IPv6 address? I've been testing on an Alpine VM (just to check whether the issue is also there), but there's no error (at least for now).

tias77 commented 4 years ago

Hi! Yes, my interpretation was that as soon as I added an IPv6 address to --server.address, the validation process never ended. I have now tried it with a clean cache and without any TAL, and it finishes as expected and accepts RTR connections on both IPv4 and IPv6. I have also tried it with each TAL by itself, and it works fine. When I add all five of them, it doesn't finish validating, even after waiting 30 minutes with the CPU idling.

The IPv6 angle was probably just a red herring, as it does not seem to come alive now even without --server.address.

Investigation continues.

pcarana commented 4 years ago

Yep, same here: after several tests, this has happened only when using the 5 TALs.

Looking into the container, I've noticed that besides the main process, sometimes there are a couple of child processes alive and doing nothing. This might be due to a fork that we do to execute rsync, but something is happening that keeps the child processes from exiting in some (still unknown) cases.

I'm still reviewing this; hopefully I'll find what's going on and fix it.
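The thread doesn't spell out the actual fix (that lives in the related commit), so the following is only a generic POSIX sketch of a fork/exec/wait pattern that avoids the usual ways a child spawned for a tool like rsync ends up lingering: calling non-async-signal-safe functions between `fork()` and `exec()` in a multithreaded program (which can deadlock the child), not calling `_exit()` when `exec()` fails, and never reaping the child. `run_child` is a hypothetical name, and the assumption that FORT's hang falls into one of these categories is ours, not the maintainers'.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not FORT's code): run argv[0], given as a full path,
 * in a child process. Returns the child's exit status, or -1 on error or
 * if the child was killed by a signal.
 */
int run_child(char *const argv[])
{
	pid_t pid = fork();
	if (pid < 0)
		return -1; /* fork failed; errno is set */

	if (pid == 0) {
		/*
		 * Child: only the forking thread exists here, so any lock
		 * another thread held at fork time (malloc, stdio, a logger)
		 * stays locked forever. Stick to async-signal-safe calls
		 * such as execve() and _exit().
		 */
		execve(argv[0], argv, environ);
		/* exec failed: exit at once without running atexit handlers
		 * or flushing the parent's stdio buffers; 127 mirrors the
		 * shell's "command not found" convention. */
		_exit(127);
	}

	/* Parent: wait for (reap) the child, retrying if interrupted. */
	int status;
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	return -1;
}
```

For example, `run_child((char *[]){"/bin/sh", "-c", "exit 3", NULL})` returns 3, and a nonexistent path returns 127 from the child's `_exit()`.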

pcarana commented 4 years ago

The related commit comment describes what's happening and how it's fixed.

The fix will be included in the upcoming version: 1.3.0, currently at QA. This version will be released soon, once our QA team approves it.

Gotta say, this issue is quite peculiar, since we've only seen it in Docker + Alpine images, so thanks for this issue report :smiley: