RFC: Distinguish the different meanings of a "message"

The context for this suggested change is https://github.com/ably/ably-js/issues/1398. There, I pointed out that the specification’s current signatures for publish (specifically, the overloads that accept a Message or an array thereof) do not seem to match how we’re expecting these methods to be used in real life. This is because Message’s id and timestamp properties are specified as non-optional, meaning that a user calling publish would need to populate these properties. In reality, we do not expect a user to populate these properties — they are usually generated by the Ably service.

The easiest way to solve this would be to be to make these properties optional. However, doing so would unnecessarily remove some useful type information for recipients of messages — the type system would no longer communicate that these properties are guaranteed to be populated in a message emitted by the library.

In this PR, I’m proposing that we distinguish between three separate concepts of a "message", which I think are perhaps currently being incorrectly conflated:

The data type that a user of the library passes to the publish methods
The data type that the library emits from methods that expose messages published on a channel
The data type that describes a serialized message that is transmitted over the wire

I’ve named the first one OutgoingMessage, the second one IncomingMessage, and I believe that the third belongs in the documentation for the Ably protocol, not the library specification.

OutgoingMessage and IncomingMessage differ from the existing Message type in the following ways:

OutgoingMessage’s id and timestamp properties are now optional
OutgoingMessage does not have fromEncoded* methods
IncomingMessage’s connnectionId property is now non-optional (i.e. we are now able to provide stronger type information for this property) — I need to double-check whether this property is actually guaranteed to be populated by the library; my reading of TM2c and RTL6f suggested that it is, but I’m not sure if TM2c’s "the @connectionId@ of the encapsulating @ProtocolMessage@" is guaranteed to be populated.
IncomingMessage does not have constructors

I have not yet introduced spec points for these two new types — I will do so if there is a consensus to move forwards with this approach. For now, see the changes to the IDL.

Other thoughts:

I think that, similarly to the Message wire type, the ProtocolMessage type should also only be described by the protocol documentation, and not the feature spec.
If we do choose to start leaning more heavily on the protocol documentation, then we’ll need to bring it up to date — it looks like it hasn’t been touched in quite some time and still mentions connectionSerial, for example.
I’ve kept the exact same list of properties in IncomingMessage and OutgoingMessage, since my reading of RSL1j is that a user publishing a message should be able to populate any of the properties of the message that eventually gets sent over the wire. But if that’s not the case, then we may be able to remove some properties from OutgoingMessage.

@lmars @SimonWoolf — added you as reviewers because I thought you might have opinions, but feel free to ignore.

my thoughts:

Tightning up the user-visible typings for messages on publish vs subscribe seems like an improvement, sure. But I'm not convinced making them two entirely different classes (in the spec or user-visibly) is the way forward, that just seems more confusing for the user.
- Besides, many -- actually most -- sdks are in languages that don't represent nullability in their type systems, and having the spec require that those languages implement two different classes for incoming and outgoing that'd be ~identical would be unnecessary
- How about we just tweak the spec IDL to say id: String // TM2a, optional for Message objects provided by the user, which sdks can interpret however they want? and then eg in ably-js expose a type that's still a Message, just tweaked to give you non-optional fields, ie type IncomingMessage = Message & Required<Pick<Message, 'id' | 'timestamp'>>
connnectionId is not guaranteed to be populated (either in the ProtocolMessage or the Message),eg because the message could be from a REST publish which has no connectionId because it isn't a connection
I'm kindof tempted to say that the lesson we could lean from the fact that the protocol page has barely been touched or read for years was that.. the attempt at having that split between publically-visible behaviour and the wire protocol was maybe a bad idea? Like -- a clean split is near-impossible, describing publically-visible behaviour necessarily needs to specify lots of internal behaviour too -- so you end up with the features spec that over time specifiec more and more internal bits, so fewer people ever look at /protocol, so it doesn't get updated and more things just go in the features spec, self-reinforcing cycle. Maybe we should just give up and pull all relevant specifications into the features spec (and rename it just 'the sdk spec'), so at least it'll be visible, read and updated as part of spec changes?

I'm not convinced making them two entirely different classes (in the spec or user-visibly) is the way forward

Is your concern literally about the fact that there would be two classes, or the fact that there would be two classes whose definitions are almost identical? If the latter, how would you feel about, for example, having incoming and outgoing message classes which subclass some common class?

Besides, many -- actually most -- sdks are in languages that don't represent nullability in their type systems, and having the spec require that those languages implement two different classes for incoming and outgoing that'd be ~identical would be unnecessary

These languages could choose not to implement two different classes, and just have one?

How about we just tweak the spec IDL to say id: String // TM2a, optional for Message objects provided by the user, which sdks can interpret however they want? and then eg in ably-js expose a type that's still a Message, just tweaked to give you non-optional fields, ie type IncomingMessage = Message & Required<Pick<Message, 'id' | 'timestamp'>>

Given that the IDL does represent nullability, would this not need to be id: String? // TM2a, guaranteed to be populated for messages received from Ably?

connnectionId is not guaranteed to be populated (either in the ProtocolMessage or the Message),eg because the message could be from a REST publish which has no connectionId because it isn't a connection

Thanks, I've updated my changes.

a clean split is near-impossible, describing publically-visible behaviour necessarily needs to specify lots of internal behaviour too

I'm not sure what you mean here. Is it that in order to specify how the client library should be implemented (i.e. what's currently described by the feature spec) one must necessarily describe how the Realtime service behaves and some ways in which a client must behave in order to correctly interact with it? That's true, for sure. But what about the other way round — do you believe that it's possible to describe the protocol in a manner that is more concise and understandable than describing exactly how the client library must be implemented? And if so, do you believe that there is value in that — perhaps, for example, from a didactic point of view, to help people understand the motivation behind the implementation requirements described by the feature spec? The feature spec definitely has a "handed down from the gods" vibe at the moment, and I wonder whether empowering developers to understand why it is the way it is might be a good thing.

We're addressing this in ably-js in line with your suggestion, @SimonWoolf: https://github.com/ably/ably-js/pull/1515.

ably / specification

RFC: Distinguish the different meanings of a "message" #156