InterNetNews / inn

INN (InterNetNews) Usenet server
https://www.isc.org/othersoftware/#INN

Add randomness in Message-IDs #245

Open Julien-Elie opened 1 year ago

Julien-Elie commented 1 year ago

Improve how Message-IDs are generated so as to avoid duplicates (rare, if any) when the right-hand side is the same, especially on virtual hosts. Message-IDs are currently based on the date, nnrpd's PID and a global static counter incremented at each post. An improvement could be to also add random bytes from RAND_bytes() when OpenSSL support is available (which is most often the case).
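A minimal sketch of the random-byte source, in C like INN itself. The helper name get_random_bytes() and the HAVE_OPENSSL guard are assumptions, not INN's actual code. RAND_bytes() is OpenSSL's real function and returns 1 on success. The /dev/urandom fallback is also an illustrative assumption; when no source works, the caller could keep the legacy date, PID and counter scheme.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef HAVE_OPENSSL            /* hypothetical build-configuration macro */
#include <limits.h>
#include <openssl/rand.h>
#endif

/*
 * Hypothetical helper: fill buf with len random bytes.
 * Returns 1 on success, 0 if no random source is available, in which
 * case the caller keeps the legacy date + PID + counter Message-ID.
 */
static int
get_random_bytes(unsigned char *buf, size_t len)
{
#ifdef HAVE_OPENSSL
    /* RAND_bytes() takes an int length and returns 1 on success. */
    if (len <= (size_t) INT_MAX && RAND_bytes(buf, (int) len) == 1)
        return 1;
#endif
    /* Assumed fallback when OpenSSL is absent or fails: /dev/urandom. */
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return 0;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len;
}
```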

Also, use a 64-character alphabet (made only of characters valid in Message-IDs) instead of the current 32-character alphabet.
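One possible 64-character alphabet is the URL-safe Base64 alphabet of RFC 4648 (A-Z, a-z, 0-9, "-" and "_"), all of which are valid atext characters in a Message-ID. The encoder below is a sketch under that assumption; b64url_encode() is a hypothetical name, and the output is unpadded since "=" adds nothing to an identifier.

```c
#include <stddef.h>

/* URL-safe Base64 alphabet (RFC 4648, section 5). */
static const char b64url[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Encode len bytes of in as unpadded Base64url into out (NUL-terminated).
 * out must hold at least (len * 8 + 5) / 6 + 1 bytes.
 * Returns the number of characters written.
 */
static size_t
b64url_encode(const unsigned char *in, size_t len, char *out)
{
    size_t n = 0;
    unsigned long acc = 0;  /* bit accumulator */
    int bits = 0;           /* number of pending bits in acc */

    for (size_t i = 0; i < len; i++) {
        acc = ((acc << 8) | in[i]) & 0xFFFF;  /* keep only pending bits */
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out[n++] = b64url[(acc >> bits) & 0x3F];
        }
    }
    if (bits > 0)  /* last group: pad with zero bits on the right */
        out[n++] = b64url[(acc << (6 - bits)) & 0x3F];
    out[n] = '\0';
    return n;
}
```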

If possible, the new Message-ID should be no longer than the legacy Message-ID.

A timestamp may be added before the randomness to reduce the chances of a collision to effectively nothing, because every second moves into a new Message-ID space. For instance, the date can be taken as a number (20220922153000) and represented in 6 bytes, which is good enough for the next 10,000 years, followed by 8 bytes of randomness.
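The timestamp-plus-randomness layout can be sketched as follows, assuming the date is taken in YYYYMMDDhhmmss form from a struct tm and the 8 random bytes are supplied by the caller (for instance from RAND_bytes()). The function names are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Date as the decimal number YYYYMMDDhhmmss, e.g. 20220922153000. */
static uint64_t
date_number(const struct tm *tm)
{
    return (uint64_t) (tm->tm_year + 1900) * 10000000000ULL
         + (uint64_t) (tm->tm_mon + 1) * 100000000ULL
         + (uint64_t) tm->tm_mday * 1000000ULL
         + (uint64_t) tm->tm_hour * 10000ULL
         + (uint64_t) tm->tm_min * 100ULL
         + (uint64_t) tm->tm_sec;
}

/* Store the low 48 bits of value in 6 bytes, most significant first. */
static void
pack48(uint64_t value, unsigned char out[6])
{
    for (int i = 5; i >= 0; i--) {
        out[i] = (unsigned char) (value & 0xFF);
        value >>= 8;
    }
}

/* Inverse of pack48(). */
static uint64_t
unpack48(const unsigned char in[6])
{
    uint64_t value = 0;
    for (int i = 0; i < 6; i++)
        value = (value << 8) | in[i];
    return value;
}

/* 14-byte identifier: 6 bytes of date followed by 8 random bytes. */
static void
build_id14(const struct tm *tm, const unsigned char rnd[8],
           unsigned char out[14])
{
    pack48(date_number(tm), out);
    for (int i = 0; i < 8; i++)
        out[6 + i] = rnd[i];
}
```

Even 99991231235959 is far below 2^48 (about 2.8 × 10^14), so 6 bytes are indeed enough. These 14 raw bytes carry 112 bits, however, which take 19 characters in unpadded Base64, compared with the 14 bytes INN currently generates for the left-hand side.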

Julien-Elie commented 3 months ago

The current Radix32() function called by GenerateMessageID() assumes that time_t fits into 32 bits. Hopefully we have time before it no longer does (next century). Switching to a new Radix64() function or similar will fix that.
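A sketch of what a Radix64()-style function could look like, assuming it receives the time as an unsigned 64-bit value (for instance `(uint64_t) time(NULL)`) and uses the URL-safe Base64 alphabet as its digits. The name radix64() and its signature are hypothetical, not INN's API.

```c
#include <stddef.h>
#include <stdint.h>

static const char radix64_digits[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Write value in base 64, most significant digit first, into out
 * (NUL-terminated).  A 64-bit value needs at most 11 digits, so out
 * must hold 12 bytes.  Returns the number of digits written.
 */
static size_t
radix64(uint64_t value, char out[12])
{
    char tmp[11];
    size_t n = 0;

    do {
        tmp[n++] = radix64_digits[value & 0x3F];
        value >>= 6;
    } while (value != 0);

    /* Digits were produced least significant first; reverse them. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

For example, 4294967296 (2^32, the first value that no longer fits in 32 unsigned bits) encodes as the six digits "EAAAAA".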

pmetzger commented 3 months ago

I note that UUIDs https://en.wikipedia.org/wiki/Universally_unique_identifier are more or less intended for such purposes. At one time, I'd have said that they were overkill, but machines are fast enough now that it's no longer a problem.
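For reference, a random (version 4) UUID as defined in RFC 9562 is 16 random bytes with the version and variant bits forced. The formatter below is a minimal sketch; uuid_v4_format() is a hypothetical name, and the random bytes are assumed to come from a source such as RAND_bytes().

```c
#include <stdio.h>

/*
 * Turn 16 random bytes (modified in place) into a version 4 UUID
 * string: version nibble 4, variant bits 10.
 * out must hold 37 bytes (36 characters plus NUL).
 */
static void
uuid_v4_format(unsigned char b[16], char out[37])
{
    b[6] = (unsigned char) ((b[6] & 0x0F) | 0x40);  /* version 4 */
    b[8] = (unsigned char) ((b[8] & 0x3F) | 0x80);  /* variant 10 */
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```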

Julien-Elie commented 3 months ago

When we last spoke about Message-IDs on news.software.nntp, people tended to prefer shorter ones, which looked nicer. It is very subjective and a matter of preference. INN currently generates 14 bytes (in most cases) for the left-hand side of Message-IDs. UUIDs are 128 bits long, and would therefore need more than 14 bytes to be represented. I bet we'll need an inn.conf option to let people configure their preference 😅

pmetzger commented 3 months ago

Whether you use shorter or longer IDs, the general idea should be the same. 14 bytes of Base64 is 84 bits, and given that the hostname is already globally unique, an algorithm similar to the UUID algorithm should work quite well. An actual UUID would only require 22 characters of Base64 of course.
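The length arithmetic here (6 bits per Base64 character, rounded up, no padding) can be checked with two small helpers; the names are hypothetical.

```c
#include <stddef.h>

/* Base64 characters needed to carry the given number of bits. */
static size_t
b64_chars_for_bits(size_t bits)
{
    return (bits + 5) / 6;
}

/* Bits carried by the given number of Base64 characters. */
static size_t
bits_in_b64_chars(size_t chars)
{
    return chars * 6;
}
```

With these, bits_in_b64_chars(14) gives 84 and b64_chars_for_bits(128) gives 22, matching the figures above.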

Julien-Elie commented 3 months ago

Yes, I see. Note that Message-IDs MUST contain @ (in mails and netnews articles), so maybe one of the hyphens in the generated UUID string should be changed to @. For instance:

Message-ID: <6ba7b810-9dad-11d1@80b4-00c04fd430c8>
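Replacing the third hyphen (between the third and fourth groups, as in this example) can be sketched with a hypothetical uuid_to_msgid() helper; picking that particular hyphen is only one possible choice.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build "<uuid>" with the third '-' of a 36-character UUID string
 * replaced by '@'.  Returns 0 on success, -1 if the input is not a
 * 36-character UUID or out is smaller than 39 bytes.
 */
static int
uuid_to_msgid(const char *uuid, char *out, size_t outlen)
{
    if (strlen(uuid) != 36 || outlen < 39)
        return -1;
    out[0] = '<';
    memcpy(out + 1, uuid, 36);
    if (out[1 + 18] != '-')   /* third hyphen of the 8-4-4-4-12 layout */
        return -1;
    out[1 + 18] = '@';
    out[37] = '>';
    out[38] = '\0';
    return 0;
}
```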

In RFC 9562, which defines UUIDs, a few versions like UUIDv6 identify the host through a 48-bit node ID (usually a MAC address). That 48-bit representation could be put on the right-hand side of the @. We could also generate implementation-specific UUIDs with UUIDv8 😉 though 128 bits seems overkill for our use.
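An implementation-specific UUIDv8 could reuse the 6-byte timestamp idea from the first comment. The layout below (48-bit timestamp first, then random bytes) is purely an assumption: RFC 9562 only fixes the version and variant bits of UUIDv8 and leaves the remaining 122 bits to the implementation. uuid_v8_format() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical UUIDv8 layout: bytes 0-5 hold a 48-bit timestamp,
 * bytes 6-15 come from 10 caller-supplied random bytes.  The version
 * (8) and variant (binary 10) bits overwrite the top bits of bytes 6
 * and 8.  out must hold 37 bytes.
 */
static void
uuid_v8_format(uint64_t ts48, const unsigned char rnd[10], char out[37])
{
    unsigned char b[16];

    for (int i = 5; i >= 0; i--) {   /* big-endian 48-bit timestamp */
        b[i] = (unsigned char) (ts48 & 0xFF);
        ts48 >>= 8;
    }
    for (int i = 0; i < 10; i++)
        b[6 + i] = rnd[i];
    b[6] = (unsigned char) ((b[6] & 0x0F) | 0x80);  /* version 8 */
    b[8] = (unsigned char) ((b[8] & 0x3F) | 0x80);  /* variant 10 */

    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```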

My main concern about using UUIDs is the fact that they look more impersonal than Message-IDs with clear domain names like:

Message-ID: <uri8vt$3jl86$1@paganini.bofh.team>
Message-ID: <87bk58min5.fsf@hope.eyrie.org>
Message-ID: <l3pgohFgpbuU1@mid.uni-berlin.de>
Message-ID: <65324499$0$25970$426a74cc@news.free.fr>
Message-ID: <sDuJAC.1oJ2o@a3.nl.invalid>
Message-ID: <20231227we215908@o15.ybtra.de>

Amongst the above Message-IDs, only the first one was generated by INN. The others come from other implementations (clients or servers).

Well, anyway, thanks for the suggestion. It will be taken into account in some way when working on this ticket.

pmetzger commented 3 months ago

I think you have a point that including the actual "@human-readable.hostname" is a good thing. Probably the UUID-like algorithm should only be applied to the portion to the left of the @.
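Keeping a readable right-hand side and generating only the left-hand side could look like this hypothetical build_message_id() helper. It assumes the left-hand side has already been encoded (for instance with a UUID-like or timestamp-plus-random scheme) and rejects results longer than 250 octets, the maximum msg-id length in RFC 5536.

```c
#include <stdio.h>
#include <string.h>

/*
 * Assemble "<lhs@host>".  Returns the length written, or -1 if either
 * part is empty, the result exceeds 250 octets, or out is too small.
 */
static int
build_message_id(const char *lhs, const char *host, char *out, size_t outlen)
{
    size_t len = strlen(lhs) + strlen(host) + 3;   /* '<', '@', '>' */

    if (*lhs == '\0' || *host == '\0' || len > 250 || len >= outlen)
        return -1;
    snprintf(out, outlen, "<%s@%s>", lhs, host);
    return (int) len;
}
```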