farcasterxyz / hub-monorepo

Implementation of the Farcaster Hub specification and supporting libraries for building applications on Farcaster
https://www.thehubble.xyz
MIT License
697 stars 390 forks source link

Bug: Hub/Shuttle is streaming delete events but when queried hubs for latest state it says that its in merge state #2025

Open chetankashetti opened 3 months ago

chetankashetti commented 3 months ago

What is the bug? Basically we are receiving delete events from Shuttle but when we check for hub using the getAllVerificationMessagesByFid it says its in added state. we are assuming whenever MESSAGE_TYPE_VERIFICATION_ADD_ETH_ADDRESS appears with merge state -> we add to our system delete state -> remove from our system prune state -> remove from our system revoke state -> remove from our system correct us if we're wrong. for ex. FID - 284564. If you observe we're adding a a timestamp when we receive called event_timestamp. delete event is coming without any remove event.

{
    "data": {
        "type": "MESSAGE_TYPE_VERIFICATION_ADD_ETH_ADDRESS",
        "fid": 284564,
        "timestamp": 1717338302000,
        "network": "FARCASTER_NETWORK_MAINNET",
        "verificationAddAddressBody": {
            "address": "Kjn9lOE5lSw5WOaRcrqRjY9TNXE=",
            "claimSignature": "jTZWmE8MJlONRBzItIqo2ahEHucShtDpSVEKKME++y5xb+a26BDFfhdInlrPBSrxqjevwlc3zFub7bpWE00+zxw=",
            "blockHash": "lGlqnrWQvyrLS6y4EUhjd3GwoqTPbRGYFrOWuc1E3Og=",
            "verificationType": 0,
            "chainId": 0,
            "protocol": "PROTOCOL_ETHEREUM"
        }
    },
    "hash": "DyGKwXxmmpWnJnJXq9MFnV64Ty8=",
    "hashScheme": "HASH_SCHEME_BLAKE3",
    "signature": "sDemPPBPGKUYAZly17eHknYCMLCyzUUVEPJN8zYCQI95v8JiUbPQs8ycw3oU8WxLjDT0uv//j/QQhgv4zjqbDw==",
    "signatureScheme": "SIGNATURE_SCHEME_ED25519",
    "signer": "9YfDCLVjOCC1TSedks0ovIhQ0QxSp9jadZq7OEtSexk=",
    "eventTimestamp": 1717338302000,
    "operation": "merge"
}
{
    "data": {
        "type": "MESSAGE_TYPE_VERIFICATION_ADD_ETH_ADDRESS",
        "fid": 284564,
        "timestamp": 1716794998000,
        "network": "FARCASTER_NETWORK_MAINNET",
        "verificationAddAddressBody": {
            "address": "Kjn9lOE5lSw5WOaRcrqRjY9TNXE=",
            "claimSignature": "tJnBlHSeoxYxsE29j4P/GvWXQpNOTlB5MbGdTpiHqEk3pfek71oIjhvrNa0WnAN2BRcTWZRyrvrzwE5hLY19Vhw=",
            "blockHash": "moftXqHGrPPyGWmBv523i7EA4ME7urpLZqyvJZ9INEU=",
            "verificationType": 0,
            "chainId": 0,
            "protocol": "PROTOCOL_ETHEREUM"
        }
    },
    "hash": "TsGP1MUa9rTjaNY42whm4ztP0KI=",
    "hashScheme": "HASH_SCHEME_BLAKE3",
    "signature": "AjgNy7sXqXN8lYKesgKzRM5soF0rNL1kXlIOoIA6Zhu3rEpkUSpyfmzxJfku0RRERvCHh4qKZ0T4Rrfjw6/zCQ==",
    "signatureScheme": "SIGNATURE_SCHEME_ED25519",
    "signer": "9YfDCLVjOCC1TSedks0ovIhQ0QxSp9jadZq7OEtSexk=",
    "eventTimestamp": 1717338306784,
    "operation": "delete"
}

There are thousands of such FIDs which are getting affected with this issue. please try to investigate this. as its our top priority. Feel free to reach me out if more details are needed in this issue. farcaster-dc: chetansk

We are live streaming all the events to kafka where we noticed this. you can stream using shuttle which can image

This is happening for lot of FIDs where delete events are coming but seems like there's no actual deletion happened.

sanjayprabhu commented 3 months ago

Hmm, so, what's seems to be happening here is that there are two verification add messages for the same address and fid.

one from May 27, with message hash "TsGP1MUa9rTjaNY42whm4ztP0KI" and one from June 2 with message hash "DyGKwXxmmpWnJnJXq9MFnV64Ty8="

When the June 2 message is merged, it correctly overwrites and deletes the May 2 one, because hubs have a unique constraint on the address. There is no remove message. A remove is message is not required to cause a delete, because a newer message will overwrite an older one (especially for verifications and UserDataAdd).

It looks like the events have made it into your kafka stream out of order. There's not enough information here to say what caused this. If you could provide the createdat and deletedat timestamps for these two messages from shuttle's messages table for these two messages it will help narrow down whether it's happening in the hub, shuttle or at some point after.

Note that the message timestamp is different. Even though the delete event timestamp is newer, the timestamp of the message is older than the timestamp of the message in the merge event. So, a workaround would be to check for this.

To summarize,

  1. delete events don't require a Remove message, and can happen when a newer message overwrites an older one
  2. the events are appearing out of order in the queue, but I'm not sure what's causing this, I'm pretty sure hub and shuttle emit in the right order, but we'd need data from the messages table to confirm
  3. checking for the message timestamp can be a potential workaround while the root cause is determined for 2) above.
chetankashetti commented 3 months ago

Thanks for the response @sanjayprabhu Having only delete message is totally fine. but there shouldn't multiple delete/merges? we have these records in our messages table

[
  {
    "id": "018f2e40-a33f-8cbc-27ca-81dab82afe3d",
    "created_at": "2024-04-30 09:05:39.124098+00",
    "updated_at": "2024-04-30 09:05:39.124098+00",
    "timestamp": "2024-04-16 23:07:19+00",
    "deleted_at": "2024-05-27 07:30:01.195+00",
    "pruned_at": null,
    "revoked_at": null,
    "fid": 284564,
    "type": 7,
    "hash_scheme": 1,
    "signature_scheme": 1,
    "hash": "20 bytes",
    "signer": "32 bytes",
    "body": "{\"address\":\"0x2a39fd94e139952c3958e69172ba918d8f533571\",\"claimSignature\":\"0x6d3460cbbc770e42913d6ae8942f47d96c9ee57e79ba62c9aaf6061c522329f906f37b1f6449730bca0567b90987d865efbc21423202304063fe3c77597248141c\",\"blockHash\":\"0x97c6ab49b01c3af5347959ae5456d8d95e7731a4f4330800b11dcd380cae1502\",\"protocol\":\"ethereum\"}",
    "raw": "267 bytes"
  },
  {
    "id": "018fb8f4-c995-709e-8540-0be48b4d8ca3",
    "created_at": "2024-05-27 07:30:01.178892+00",
    "updated_at": "2024-05-27 07:30:01.178892+00",
    "timestamp": "2024-05-27 07:29:58+00",
    "deleted_at": "2024-06-02 14:25:06.781+00",
    "pruned_at": null,
    "revoked_at": null,
    "fid": 284564,
    "type": 7,
    "hash_scheme": 1,
    "signature_scheme": 1,
    "hash": "20 bytes",
    "signer": "32 bytes",
    "body": "{\"address\":\"0x2a39fd94e139952c3958e69172ba918d8f533571\",\"claimSignature\":\"0xb499c194749ea31631b04dbd8f83ff1af59742934e4e507931b19d4e9887a84937a5f7a4ef5a088e1beb35ad169c0376051713599472aefaf3c04e612d8d7d561c\",\"blockHash\":\"0x9a87ed5ea1c6acf3f2196981bf9db78bb100e0c13bbaba4b66acaf259f483445\",\"protocol\":\"ethereum\"}",
    "raw": "267 bytes"
  },
  {
    "id": "018fd956-f8f5-4435-0ccc-53b232703349",
    "created_at": "2024-06-02 14:25:06.780531+00",
    "updated_at": "2024-06-02 14:25:06.780531+00",
    "timestamp": "2024-06-02 14:25:02+00",
    "deleted_at": null,
    "pruned_at": null,
    "revoked_at": null,
    "fid": 284564,
    "type": 7,
    "hash_scheme": 1,
    "signature_scheme": 1,
    "hash": "20 bytes",
    "signer": "32 bytes",
    "body": "{\"address\":\"0x2a39fd94e139952c3958e69172ba918d8f533571\",\"claimSignature\":\"0x8d3656984f0c26538d441cc8b48aa8d9a8441ee71286d0e949510a28c13efb2e716fe6b6e810c57e17489e5acf052af1aa37afc25737cc5b9bedba56134d3ecf1c\",\"blockHash\":\"0x94696a9eb590bf2acb4bacb81148637771b0a2a4cf6d119816b396b9cd44dce8\",\"protocol\":\"ethereum\"}",
    "raw": "267 bytes"
  }
]
sanjayprabhu commented 3 months ago

Having only delete message is totally fine. but there shouldn't multiple delete/merges?

It's valid at the protocol level to keep sending messages. Newer messages will always overwrite older messages. You can keep changing your UserData.DISPLAY_NAME 50 times, and it will generate 50 merges and 49 deletes. Only the message with the latest timestamp is valid, and in the messages table there will be only one that's not marked deleted.

The above example seems to be for a different FID, but we can walk through that

018f2e40-a33f-8cbc-27ca-81dab82afe3d was created on April 30 018fb8f4-c995-709e-8540-0be48b4d8ca3 created on May 27, came in and deleted the message above (above messages was deleted_at a few ms after this one was created_at) 018fd956-f8f5-4435-0ccc-53b232703349 came in on June 2 and in turn deleted the above message (in this case, it looks like the delete event is .5ms after the create event)

I'm not sure why the delete event is after (even if it's by less than 1 ms), I checked the code and the callbacks should be called in order (delete first, then merge):

https://github.com/farcasterxyz/hub-monorepo/blob/main/packages/shuttle/src/shuttle/hubEventProcessor.ts#L52-L62

One potential reason is the they are not part of the same event, that's unlikely, but I'll double check to make sure.

You can add some logs in your MessageHandler to confirm, but the delete callback should be happening before the merge.

sanjayprabhu commented 3 months ago

Note, these are all different message hashes. The user is submitting multiple verification messages for the same eth address. The hub is doing the right thing by deleting the old message and merging the latest message and generating a event for each.

The events are for unique messages, they are not duplicates

chetankashetti commented 3 months ago

3. checking for the message timestamp can be a potential workaround while the root cause is determined for 2) above.

The problem with relying on the timestamp is, merge and delete events of same message hash will have the same timestamp. This can cause problem in our system because if they arrive in different order we won't know which is latest. so I think we can't rely on the timestamp. for different message hash of same address will have diff timestamp which i agree.

In the above example as you rightly said, For the FID: 284564 first message: 018f2e40-a33f-8cbc-27ca-81dab82afe3d
created_at -> 2024-04-30 09:05:39.124098+00 deleted_at -> 2024-05-27 07:30:01.195+00

second message: 018fb8f4-c995-709e-8540-0be48b4d8ca3 created_at -> 2024-05-27 07:29:58+00 deleted_at -> 2024-06-02 14:25:06.781+00

third message -> 018fd956-f8f5-4435-0ccc-53b232703349 created_at -> 2024-06-02 14:25:02+00

we are not modifying anything in our handler, it processes in the order we receive. As you pointed out delete_at of 018fb8f4-c995-709e-8540-0be48b4d8ca3 has higher timestamp than creation of the message 018fd956-f8f5-4435-0ccc-53b232703349. So the order events streamed from hub seems to be wrong. if we consume in this order we delete in our system instead of add. that is resulting in inconsistent data.

sds commented 2 months ago

Hey @chetankashetti, sorry for the delay.

EDIT See below comment instead.

sds commented 2 months ago

One other point worth mentioning: if you look at how Shuttle is implemented (make sure you're running the latest version, as some code may have changed), you can see that Shuttle always processes the deleted messages before the merged message:

https://github.com/farcasterxyz/hub-monorepo/blob/fcb76cd9adcc2d9eff9f1a1832fc89c8bacff117/packages/shuttle/src/shuttle/hubEventProcessor.ts#L52-L63

Given this, you can safely always treat handleMessageMerge with a state parameter of deleted as safe to delete, and one with state of created as safe to create. This is much simpler than what I was previously discussing above, so I've updated my prior comment to avoid further confusion.

Just know that if you create your own handlers (I know you're not doing this—I'm just mentioning for completeness) and you don't use the state / operation parameters, you will have to reinvent the CRDT logic locally.