bgp / stayrtr

RPKI-To-Router server implementation in Go
BSD 3-Clause "New" or "Revised" License
85 stars 13 forks source link

Adjust handling of version-session_id-serial touple #110

Open cjeker opened 6 months ago

cjeker commented 6 months ago

RFC 8210 has this in Section 5.1

Note that sessions are specific to a particular protocol version.
That is, if a cache server supports multiple versions of this protocol,
happens to use the same Session ID value for multiple protocol
versions, and further happens to use the same Serial Number
values for two or more sessions using the same Session ID but
different Protocol Version values, the Serial Numbers are not
commensurate. The full test for whether Serial Numbers are
commensurate requires comparing Protocol Version, Session ID,
and Serial Number. To reduce the risk of confusion, cache servers
SHOULD NOT use the same Session ID across multiple protocol
versions, but even if they do, routers MUST treat sessions with
different Protocol Version fields as separate sessions even if they
do happen to have the same Session ID.

This changes the code to create session ids and spacing them 100 apart from each other. With this the Serial Query will fail for a system that does a session down/upgrade.

job commented 6 months ago

So we were not reading the RFC carefully enough all this time?

I guess this mitigates the problem I outlined to some degree?

cjeker commented 6 months ago

Yes but too be honest it did not really matter until now since nothing uses router keys which was the data addition in v1.

I'm not surprised; the RTR RFC are IMO sub-standard since stuff is hidden in unrelated sections that people just glance over. Also a lot is left out giving implementations too much wiggle room for unexpected behaviour.

ties commented 6 months ago

I'm not surprised; the RTR RFC are IMO sub-standard since stuff is hidden in unrelated sections that people just glance over. Also a lot is left out giving implementations too much wiggle room for unexpected behaviour.

And it has some choices where the reason may have been abundantly clear when written if you went through the process, but are vague now. This is one.

When I read this originally another idea came up though: if you can extract a (session, revision) tuple in the metadata of an RP, you can have multiple rtr servers be in sync when using the same data as input.

cjeker commented 6 months ago

Even if the session_id is stored in the metadata it would require one session id by RTR version supported. This metadata is normally generated by a tool like rpki-client and would require rpki-client to run with a well defined session_id so that a stable number can be put into the metadata. In short this just moves the problem but does not solve it.

Tu run with a fixed session_id it is easier to use a command line argument in stayrtr. Even then I think the system should create 3 session_ids based on this initial number. The reason for this is to prevent a router to keep stale data on a session downgrade. E.g. router starts with version 2 but then opens a new session downgraded to version 1. If the cache is not flushed the ASPA entries would stick around forever.

One could argue that the version-session_id tuple should be used instead but that requires the server to keep client information around after a RTR session is closed. This is a lot of complexity that can be avoided by using different session_ids per version.

ties commented 6 months ago

Even if the session_id is stored in the metadata it would require one session id by RTR version supported. This metadata is normally generated by a tool like rpki-client and would require rpki-client to run with a well defined session_id so that a stable number can be put into the metadata. In short this just moves the problem but does not solve it.

It does move the problem: But you can have n instances reading their (rpki-client-disk-state-uuid, sequence-number) from the same source. And then they would not have historic deltas unless they were running. But I think this architecture would work w/ tcp loadbalancers.

Even then I think the system should create 3 session_ids based on this initial number. The reason for this is to prevent a router to keep stale data on a session downgrade. E.g. router starts with version 2 but > then opens a new session downgraded to version 1. If the cache is not flushed the ASPA entries would stick around forever.

I think this is a legitimate improvement, LGTM.