MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

Running multiple remote archive nodes may cause duplicate epoch_data rows and prevent a block from loading #15178

Open jrwashburn opened 6 months ago

jrwashburn commented 6 months ago


Description

When running multiple archive nodes against a single remote database, I have hit two instances of this error. The first I ignored because I thought I might not have set the transaction isolation level, so I verified that isolation was set to serializable, and the issue occurred again. The block was not added in (near) real time, and the script that loads precomputed blocks then failed with:

{"timestamp":"2024-02-10 06:50:58.231794Z","level":"Warn","source":{"module":"Archive_lib__Processor","location":"File \"src/app/archive/archive_lib/processor.ml\", line 2052, characters 12-23"},"message":"Error when adding block data to the database, rolling it back: $error","metadata":{"error":"Unexpected result from : Received 2 tuples, expected at most one. Query: \"SELECT id FROM epoch_data WHERE seed = $1 AND ledger_hash_id = $2\".","pid":17325}}

In both cases, multiple rows were returned for the same seed and ledger_hash_id by this query:

select min(id), max(id), seed, ledger_hash_id, count(*) from epoch_data group by seed, ledger_hash_id having count(*) > 1;

In both cases, I manually deleted the epoch_data rows and the related blocks. (Note that the database relations did not require cleaning up any other data beyond deleting the blocks and epoch_data rows, so a simple statement like the one below is enough to let blocks load again. I didn't bother to clean up user transactions, etc.)

delete from blocks where id in(338498,338499); delete from epoch_data where id in (219968,219969);
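For future occurrences, the manual cleanup above could be generalized. This is only a sketch: it assumes blocks references epoch_data through staking_epoch_data_id and next_epoch_data_id, so verify the column names against your schema version before running anything like it.

```sql
BEGIN;

-- Duplicate epoch_data rows: everything except the lowest id per
-- (seed, ledger_hash_id) pair.
CREATE TEMP TABLE epoch_data_dups AS
SELECT id FROM epoch_data e
WHERE id <> (SELECT min(id) FROM epoch_data
             WHERE seed = e.seed AND ledger_hash_id = e.ledger_hash_id);

-- Remove blocks that reference a duplicate row, then the duplicates
-- themselves, so mina-archive-blocks can reload the affected blocks.
DELETE FROM blocks
WHERE staking_epoch_data_id IN (SELECT id FROM epoch_data_dups)
   OR next_epoch_data_id IN (SELECT id FROM epoch_data_dups);

DELETE FROM epoch_data WHERE id IN (SELECT id FROM epoch_data_dups);

COMMIT;
```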

I had been running multiple archive nodes pointing to a single database to avoid missing blocks (some are still occasionally missed), but have stopped doing so since this error. The issue has not recurred in the two weeks since reducing to a 1:1 archive-to-database setup.

Unfortunately, I deleted the problem block and epoch_data records, so a dump may not be useful, but I have one preserved (taken after the manual deletes) if it is wanted.

Reproduce:

node:~/archive-utils$ mina-archive-blocks --precomputed --archive-uri postgresql://CONNECTIONSTRING mainnet-328803-3NLGbNGHLEut37HzTwMtTfTwr7wFGBRTysY21YbzcibwkZ4KzmKk.json

{"timestamp":"2024-02-10 06:50:58.102124Z","level":"Info","source":{"module":"Dune__exe__Archive_blocks","location":"File \"src/app/archive_blocks/archive_blocks.ml\", line 30, characters 6-17"},"message":"Successfully created Caqti connection to Postgresql","metadata":{"pid":17325}}

{"timestamp":"2024-02-10 06:50:58.231794Z","level":"Warn","source":{"module":"Archive_lib__Processor","location":"File \"src/app/archive/archive_lib/processor.ml\", line 2052, characters 12-23"},"message":"Error when adding block data to the database, rolling it back: $error","metadata":{"error":"Unexpected result from : Received 2 tuples, expected at most one. Query: \"SELECT id FROM epoch_data WHERE seed = $1 AND ledger_hash_id = $2\".","pid":17325}}

{"timestamp":"2024-02-10 06:50:58.232884Z","level":"Warn","source":{"module":"Archive_lib__Processor","location":"File \"src/app/archive/archive_lib/processor.ml\", line 2011, characters 10-21"},"message":"Error in add_block_aux : $error. Retrying...","metadata":{"error":"Unexpected result from : Received 2 tuples, expected at most one. Query: \"SELECT id FROM epoch_data WHERE seed = $1 AND ledger_hash_id = $2\".","pid":17325}}

{"timestamp":"2024-02-10 06:51:00.173218Z","level":"Warn","source":{"module":"Archive_lib__Processor","location":"File \"src/app/archive/archive_lib/processor.ml\", line 2052, characters 12-23"},"message":"Error when adding block data to the database, rolling it back: $error","metadata":{"error":"Unexpected result from : Received 2 tuples, expected at most one. Query: \"SELECT id FROM epoch_data WHERE seed = $1 AND ledger_hash_id = $2\".","pid":17325}}

Steps to Reproduce

  1. Unsure; presumed to be a race condition between partially inserted blocks across archive nodes.

Expected Result

A block will either load completely and successfully, or will roll back completely so that it can be reloaded by the mina-archive-blocks utility.
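For what it's worth, one pattern that would give this behavior at the database level (only a sketch, not what processor.ml currently does; it assumes a UNIQUE constraint on (seed, ledger_hash_id), and the other epoch_data columns are omitted) is an idempotent insert-or-fetch:

```sql
-- Insert the row if absent; a concurrent writer that got there first simply
-- triggers DO NOTHING, and the follow-up SELECT finds exactly one id.
INSERT INTO epoch_data (seed, ledger_hash_id)
VALUES ($1, $2)
ON CONFLICT (seed, ledger_hash_id) DO NOTHING;

SELECT id FROM epoch_data WHERE seed = $1 AND ledger_hash_id = $2;
```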

Actual Result

Sometimes, duplicate epoch_data can be created.

Daemon version

Commit c980ba87c3417f40a7081225dfe7478c5ee70fd7 on branch master

How frequently do you see this issue?

Sometimes

What is the impact of this issue on your ability to run a node?

High

Status

Archive node, status is not relevant.

Additional information

Log messages from the mina-archive daemon with the "epoch_data" in them are available here: https://drive.google.com/drive/folders/1D_c2QLn4jIG5OVQhaIsmpctRovQ48rGI?usp=sharing

Most errors are duplicate key violations caused by the multiple nodes (note that these stop after 2/10, when I scaled down to a single archive daemon), but you can see multiple instances of the "Received 2 tuples, expected at most one." error.

psteckler commented 6 months ago

In the berkeley and develop branches, the epoch_data table has a UNIQUE constraint.

In compatible, there's no such constraint, so duplicates are possible.

jrwashburn commented 6 months ago

That is a simple enough patch. Would adding a unique constraint, as below, be safe in the interim? Or is that just likely to move the issue to another table? My motivation is that it would make it much more likely to keep the archive database in sync with more than one archive node feeding it, and it would mean shorter periods with holes in the chain.

ALTER TABLE epoch_data ADD CONSTRAINT epoch_data_seed_ledger_hash_id_unique UNIQUE (seed, ledger_hash_id);

// guessing at the _unique naming convention, based on the _key suffix observed here: https://github.com/MinaProtocol/mina/blob/develop/src/app/archive/add_unique_constraints.sql

Is there any reason I should not do that to my archive?

psteckler commented 6 months ago

If there are existing duplicates, the constraint won't be satisfied, and PostgreSQL will complain.
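A quick pre-check (a sketch) would be to look for pairs that already occur more than once:

```sql
-- Any rows returned here will make ALTER TABLE ... ADD CONSTRAINT fail.
SELECT seed, ledger_hash_id, count(*)
FROM epoch_data
GROUP BY seed, ledger_hash_id
HAVING count(*) > 1;
```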

jrwashburn commented 6 months ago

I've already deleted those in order to get the blocks to load and get the database to continue. Any other reason you would advise against?

psteckler commented 6 months ago

Do the references to those deleted entries now refer to the existing entries?

psteckler commented 6 months ago

I'm not sure the uniqueness constraint fully solves the problem; there may be an issue with concurrent transactions. But I'm not the one to ask (I'm no longer at O(1)).

jrwashburn commented 6 months ago

Do the references to those deleted entries now refer to the existing entries?

My solution was to just delete all blocks that referenced the duplicated epoch_data rows and then let mina-archive-blocks populate them again. It seems to be okay, although I'm going to try running the replayer soon...

Thanks for responding regardless of connection to o1!

EmrePiconbello commented 6 months ago

Is this a single PostgreSQL instance without any shards or slaves? If so, are you querying data often from this archive node?

jrwashburn commented 6 months ago

Is this a single PostgreSQL instance without any shards or slaves? If so, are you querying data often from this archive node?

It's a single instance; you cannot run read-only Postgres standbys with default_transaction_isolation set to SERIALIZABLE.

There are not many queries. I am hitting each endpoint every couple of minutes for uptime monitoring, each of which runs a query, but it is very low load.

EmrePiconbello commented 6 months ago

I just want to check in case there have been some changes. To me, it doesn't make sense that the serializable option blocks read-only replicas, but I was never into SQL, so maybe I am missing something. We ran a single-server Postgres, a multi-server Postgres, 3 archives with 3 nodes, 3-6 nodes with 1 archive, and all possible combinations, but none of them worked well enough.

jrwashburn commented 6 months ago

I just want to check in case there have been some changes. To me, it doesn't make sense that the serializable option blocks read-only replicas, but I was never into SQL, so maybe I am missing something. We ran a single-server Postgres, a multi-server Postgres, 3 archives with 3 nodes, 3-6 nodes with 1 archive, and all possible combinations, but none of them worked well enough.

https://www.postgresql.org/docs/current/hot-standby.html See caveats: "The Serializable transaction isolation level is not yet available in hot standby. (See Section 13.2.3 and Section 13.4.1 for details.) An attempt to set a transaction to the serializable isolation level in hot standby mode will generate an error."

Using isolation level SERIALIZABLE lets the code avoid handling as much simultaneous access across clients, but it is expensive and, for Postgres, means you cannot have a hot standby. There are lots of ways to handle this better in the code, but that work would need to be done.
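As one example of such a code-level approach (only a sketch; the lock key shown is illustrative, not anything in the codebase), a transaction-scoped advisory lock could serialize the epoch_data insert path across archive nodes without requiring SERIALIZABLE isolation, which would also make hot standbys possible again:

```sql
BEGIN;

-- Postgres advisory lock, released automatically at COMMIT or ROLLBACK.
-- Keyed on the table name plus the ledger hash id so that only writers
-- touching the same epoch_data neighborhood contend with each other.
SELECT pg_advisory_xact_lock(hashtext('epoch_data'), $1);

-- ... lookup-then-insert into epoch_data, then the block insert, as today ...

COMMIT;
```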