`get_messages` fails if any one of the messages does not fit the specification

aleph-im / aleph-client

Lightweight Python Client library for the Aleph.im network

MIT License

12 stars 14 forks source link

There are some invalid messages in the Aleph Nodes, in particular PROGRAM messages found on https://api2.aleph.im/api/v0/messages.json?msgType=PROGRAM .

They fail being parsed as AlephMessage with either KeyError or pydantic.ValidationError.

With the new aleph-client only manipulating objects, this means that one invalid message fails the entire get_messages request.

async def get_messages(
    ...
) -> MessagesResponse:

class MessagesResponse(BaseModel):
    """Response from an Aleph node API."""

    messages: List[AlephMessage]

There are multiple ways to handle this:

Ignore silently messages that do not validate
Ignore with logging a warning messages that do not validate
PyAleph APIs validate messages before returning them (or even drop them from DB)
Return Messages mixed with Error objects

What do we want to go for ?

Mmh conceptually I dislike the idea of the CCNs delivering data in the wrong format. But we have to consider two changes to the message spec:

The message is incorrect, was integrated because of a bug in the CCNs and should be marked as invalid and discarded after an update fixes said bug
The message is correct, an incorrect update to the message spec temporarily marks it as invalid. For example, this did happen when I introduced Pydantic models for validation on the CCN, my models were a bit too strict at the start.

Checking messages at runtime potentially has a huge cost, there are processes out there retrieving messages more than 10k at a time. Running Pydantic model validation on all these will definitely cause performance issues.

IMO the best solution would be to keep the DB clean: whenever we release a new version that touches message validation, a configurable mechanism should trigger a re-validation of all the messages stored on the node. This way, we could:

test the message validation on test nodes without affecting old messages
clean-up old messages once changes have been validated
have this without a runtime penalty on CCNs.

On the client side, we should definitely validate all the messages and print a warning if we detect invalid messages. So, basically, I'm all for option 2 (or 4, it's the same thing just a bit more dev-friendly) on the client-side, and a mechanism to re-validate all the messages on specific updates for CCNs.

aleph-im / aleph-client

`get_messages` fails if any one of the messages does not fit the specification #108