anima-wg / constrained-join-proxy

Registrar aliveness problem and text #65

Open toerless opened 1 month ago

toerless commented 1 month ago

Ran across a generic issue (IMHO) of the stateless proxy while updating BRSKI discovery... I think:

Example:

A proxy, using whatever form of discovery, discovers e.g. 2 possible registrars (this does not need anything advanced from BRSKI discovery; whatever minimum discovery we have in the stateless proxy draft will suffice).

When the pledge sends a packet to the proxy, the proxy forwards the packet to the registrar it selected, e.g. registrar 1.

Unfortunately, that registrar is dead. For example, DNS-SD may be configured with a DNS hold time of several minutes for efficiency reasons, so if the server dies at any point during this period, the client has to discover this through its connection attempts, conclude it's dead, and then select the next-best server.

Here is some initial text thought to solve this for the draft:

When a proxy selects one out of two or more possible registrars through some discovery mechanism, including but not limited to the ones described in this specification, that registrar may not be alive/responsive because the discovered information is stale. For example, in DNS-SD, information that is still within its TTL may be minutes old.

Proxies SHOULD automatically and promptly switch to a next-best registrar when they observe a non-responsive registrar and have discovered alternative registrar(s).

In stateful mode, registrar unresponsiveness can be discovered by timeouts of TCP connection attempts, and the proxy can connect to the next-best discovered registrar, transparently to the pledge. If the connection was already established and the registrar becomes unresponsive, the proxy MUST close the connection from the pledge. When the pledge then re-attempts to connect, the proxy needs to connect to the next-best discovered registrar.
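To illustrate the stateful case, here is a rough Python sketch of trying discovered registrars in preference order until one accepts a connection. The function name `connect_to_first_responsive`, the 5 second timeout, and the injectable `connect` parameter are assumptions made for illustration only; none of them come from the draft.

```python
import socket


def connect_to_first_responsive(registrars, timeout=5.0,
                                connect=socket.create_connection):
    """Try each (host, port) in preference order; return (sock, address).

    Hypothetical helper: `timeout` and the ordering of `registrars` are
    assumptions; the proposed text only says connection timeouts reveal
    an unresponsive registrar and the next-best one is then used.
    Returns (None, None) if no discovered registrar responds.
    """
    for address in registrars:
        try:
            # A connect timeout or refusal surfaces as OSError.
            return connect(address, timeout=timeout), address
        except OSError:
            continue  # registrar looks dead, try the next-best one
    return None, None
```

Because the fallback happens before any data is relayed, the pledge only sees a slightly longer connection setup.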

In stateless mode, aliveness SHOULD be supported using a stateless method. For example, the proxy can maintain a count of packets forwarded to the discovered registrar within the last 10 seconds. If no packets are received back from the registrar for 3 consecutive periods in which the proxy did forward packets to the registrar, then the registrar should be considered to be unresponsive and the next best registrar should be used.

ICMP/ICMPv6 messages from the registrar indicating non-responsiveness of the registrar (such as port unreachable) SHOULD equally lead to using the next best registrar.
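As a rough sketch of the stateless aliveness check and the ICMP rule above, the following Python keeps only per-registrar counters (no per-pledge state). The 10 second period and the threshold of 3 come from the proposed text; the class and method names, the wrap-around to the first registrar, and the choice to leave the silent-period count unchanged during idle periods are my assumptions. The caller is expected to invoke `end_period()` from a timer every `PERIOD_SECONDS`.

```python
from dataclasses import dataclass

PERIOD_SECONDS = 10      # period length from the proposed text
MAX_SILENT_PERIODS = 3   # consecutive silent periods before failover


@dataclass
class RegistrarState:
    address: str
    sent: int = 0            # packets forwarded in the current period
    received: int = 0        # packets received back in the current period
    silent_periods: int = 0  # consecutive periods with traffic but no reply


class AlivenessTracker:
    """Hypothetical per-registrar aliveness tracker for a stateless proxy."""

    def __init__(self, registrars):
        # `registrars` is assumed to be ordered best-first.
        self.states = [RegistrarState(a) for a in registrars]
        self.current = 0

    @property
    def active(self):
        return self.states[self.current].address

    def on_forward(self):
        self.states[self.current].sent += 1

    def on_reply(self):
        self.states[self.current].received += 1

    def on_icmp_unreachable(self):
        # E.g. ICMP/ICMPv6 port unreachable: fail over immediately.
        self._fail_over()

    def end_period(self):
        s = self.states[self.current]
        if s.received > 0:
            s.silent_periods = 0
        elif s.sent > 0:
            s.silent_periods += 1
        # Idle periods (nothing forwarded) leave the count unchanged.
        s.sent = s.received = 0
        if s.silent_periods >= MAX_SILENT_PERIODS:
            self._fail_over()

    def _fail_over(self):
        s = self.states[self.current]
        s.sent = s.received = s.silent_periods = 0
        # Assumption: wrap around to the first registrar after the last.
        self.current = (self.current + 1) % len(self.states)
```

With two registrars, three periods in which packets are forwarded but nothing comes back move `active` from the first registrar to the second; a single reply in between resets the count.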

EskoDijk commented 1 month ago

Some notes here: