alephium / explorer-backend

The explorer backend for the Alephium protocol
https://explorer.alephium.org
GNU Lesser General Public License v3.0

IO count report #348

Open simerplaha opened 2 years ago

simerplaha commented 2 years ago

Working on this issue, I needed a proper IO count check, because from reading the code the count seemed high.

Looks like we are executing around 1000 queries a second, which is high.

The graph below shows the number of queries executed by DBRunner per second, for the first 10 minutes (while also running a node locally).

If you'd like to inspect this graph, I've uploaded it here.

[Graph: DBRunner queries per second, first 10 minutes]

Let me know if you'd like to run this IO count yourself; I will push some code so it's executable.
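For reference, a minimal sketch of the kind of counter this uses, hooked into the one place that executes DB actions (the QueryCounter object and method names here are illustrative, not the actual code I'll push):

```scala
import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.{Executors, TimeUnit}

object QueryCounter {
  private val count = new AtomicLong(0)

  // Call this from the single choke point that executes DB actions
  // (in this code base, wherever DBRunner runs queries).
  def recordQuery(): Unit = {
    count.incrementAndGet()
    ()
  }

  // Print the per-second rate, i.e. what the graph above plots.
  def startReporter(): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      new Runnable {
        def run(): Unit = println(s"queries/sec: ${count.getAndSet(0)}")
      },
      1L,
      1L,
      TimeUnit.SECONDS
    )
    ()
  }
}
```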

Is this a valid issue?

Concurrent overwrites

One thing of interest is here: data fetched from multiple nodes is concurrently overwriting data created by one another. I'm sure this will eventually converge to the correct database state, but is this concurrent mutation by design/required?

simerplaha commented 2 years ago

Maybe someone else should run this IO count locally? Just in case I'm making some mistake: 1000 queries per second is a bit too high to be true.

polarker commented 2 years ago

1000 queries per second, that's indeed way too high. Is it the number after syncing?

tdroxler commented 2 years ago

One thing of interest is here: data fetched from multiple nodes is concurrently overwriting data created by one another

Actually, we are not fetching the same data. In our sharded blockchain, we can run a clique of multiple nodes and each node is responsible for a subset of chain indexes. In order to get all the data, we need to query every node. The nodeUris are given at start when fetching the clique info.

Maybe someone else should run this IO count locally?

yes I'll do it

simerplaha commented 2 years ago

Is it the number after syncing?

It's during sync.

The node is here:

2022-09-14 17:27:38,757 [dispatcher-13] INFO o.a.f.h.BlockChainHandler - hash: 00000000001ad72c6e0119d8072e609edbea958816fb480b1296a832a44df68b; chain: 2->3; height: 121577/121577; total: 1939669; target hashrate: 22970052 MH/s; blockTime: 23385ms; #tx: 1

Explorer was a fresh start with a clean DB. The graph data is for the first 10 minutes after the start.

simerplaha commented 2 years ago

In our sharded blockchain, we can run a clique of multiple nodes and each node is responsible for a subset of chain indexes.

This is always super impressive to hear. Very cool!

polarker commented 2 years ago

1000 QPS during syncing makes perfect sense to me. We sync hundreds of blocks per second.

tdroxler commented 2 years ago

I don't have a nice graph, but while syncing from scratch I get the same result as Simer. Once it's fully synced, we get down to around 5-10 queries per second.

polarker commented 2 years ago

I don't have a nice graph, but while syncing from scratch I get the same result as Simer. Once it's fully synced, we get down to around 5-10 queries per second.

This looks pretty decent and reasonable to me at the current stage!

simerplaha commented 2 years ago

I’m not sure how this looks reasonable.

Around 10 would still be high but acceptable for any similar application that is not real-time for writes and primarily does its writes as a background batch process, which has the luxury of being efficient.

We are doing 100x what’s reasonable, which is way on the extreme end.

But ok. How do we progress? Close it?

polarker commented 2 years ago

Since it's clear only to me, we for sure cannot close it. I might be wrong.

Let's say we sync 200 blocks per second and each block needs 5 IO queries. Then the total number of queries per second is 1000. Am I missing anything?

Or by query, do you mean the number of HTTP requests?

simerplaha commented 2 years ago

200 blocks per second, where each block needs 5 IO queries, should not need 1000 IO queries; it only needs 5 queries in total, each one batching all 200 blocks.

So only 5 queries per second, not 1000.

Normally it shouldn't even need 5 queries. In most cases, anything one wants to achieve can be done in a single query, except for some special case that might need more than one, maybe two.
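To make the contrast concrete, a minimal Slick sketch (the blocks table and row shape here are hypothetical, just to show the two shapes of IO):

```scala
import slick.jdbc.PostgresProfile.api._

// Hypothetical schema: one row per block header.
case class BlockRow(hash: String, parent: String, height: Int)

class Blocks(tag: Tag) extends Table[BlockRow](tag, "blocks") {
  def hash   = column[String]("hash", O.PrimaryKey)
  def parent = column[String]("parent")
  def height = column[Int]("height")
  def * = (hash, parent, height).mapTo[BlockRow]
}

val blocks = TableQuery[Blocks]

// Per-block: 200 blocks => 200 INSERT statements.
def insertOneByOne(batch: Seq[BlockRow]) =
  DBIO.sequence(batch.map(blocks += _))

// Batched: 200 blocks => a single multi-row INSERT.
def insertBatched(batch: Seq[BlockRow]) =
  blocks ++= batch
```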

Or by query, do you mean the number of HTTP requests?

I mean database queries.

tdroxler commented 2 years ago

The main bottleneck is the update of the main chain. To be consistent, for each block we need to check whether the parent is in the DB and whether it's part of the main chain; if not, we need to update the parent and do the same for its parent. For sure there is room for improvement, but I just wanted to point out that it's not just inserting the data.
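Roughly, the per-block walk looks like this (a simplified sketch; the Dao trait here is a stand-in, not the real BlockDAO interface):

```scala
import scala.concurrent.{ExecutionContext, Future}

case class BlockEntry(hash: String, parent: String, mainChain: Boolean)

// Simplified stand-in for the real DAO.
trait Dao {
  def getBlock(hash: String): Future[Option[BlockEntry]]
  def setMainChain(hash: String): Future[Unit]
}

// Walk up the parents until we hit a block already flagged as main
// chain; every step is a separate DB round trip, which is where the
// extra IO comes from.
def updateMainChain(dao: Dao, hash: String)(implicit ec: ExecutionContext): Future[Unit] =
  dao.getBlock(hash).flatMap {
    case Some(block) if !block.mainChain =>
      dao.setMainChain(block.hash).flatMap(_ => updateMainChain(dao, block.parent))
    case _ =>
      Future.unit
  }
```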

polarker commented 2 years ago

As @tdroxler mentioned, those 200 blocks are handled sequentially for now and therefore inserted one by one.

We could optimize it and batch the writes of the 200 blocks, but

  1. I believe the current architecture is not ready for such optimizations.
  2. I am not sure how much benefit batching 200 blocks will bring. I think batching will not help once the batch size is too big: you reduce the number of queries, but each query becomes much heavier. We will need to benchmark to decide how many blocks to batch. Also, keep in mind that block size will vary once we attract more users.

So, I'd treat this as premature optimization for now.

1000 queries per second should not be a bottleneck for PostgreSQL, and the IO cost is not an issue either, since we are using self-hosted PostgreSQL now.

simerplaha commented 2 years ago

Sure fellas. Your call.

polarker commented 2 years ago

One thing we could do right now is to get the number of queries per block, and see if we could reduce it.

tdroxler commented 2 years ago

The costly part is probably this one, where we do a lot of back and forth with the DB to get the parent and make sure the main chain is correct.

As we discussed @simerplaha, one of the first steps we could do is to bring BlockFlowSyncService closer to the DB, rather than going through the BlockDAO, but it won't solve the number-of-IOs issue. Not sure yet how to avoid all those calls when checking the main chain; maybe a recursive query? I've never tried to use one though.
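For the record, a recursive query could collapse the whole parent walk into a single round trip. A sketch of the idea as plain SQL through Slick (table and column names are assumptions):

```scala
import slick.jdbc.PostgresProfile.api._

// Sketch: walk the parent links inside Postgres with a recursive CTE,
// collecting every ancestor not yet flagged as main chain, so the
// whole walk costs one round trip instead of one query per parent.
def ancestorsOffMainChain(hash: String) =
  sql"""
    WITH RECURSIVE ancestors AS (
      SELECT hash, parent, main_chain FROM blocks WHERE hash = $hash
      UNION ALL
      SELECT b.hash, b.parent, b.main_chain
      FROM blocks b
      JOIN ancestors a ON b.hash = a.parent
      WHERE NOT b.main_chain
    )
    SELECT hash FROM ancestors WHERE NOT main_chain
  """.as[String]
```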

polarker commented 2 years ago

@tdroxler This updateMainChain issue can be improved greatly with this approach:

  1. Upon receiving new blocks (200 blocks, for example), we calculate the 16 longest chains (i.e. sequences of blocks from the 200 blocks, one per chain index) for these new blocks.
  2. For each longest chain [block0, block1, ..., blockN], we set main_chain = true for all of its blocks and only call updateMainChain once, for block0.

In this way, we reduce the number of updateMainChain calls from 200 to 16.
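A sketch of how the chain extraction could look (assuming each block carries its chain index; forks inside a batch are ignored for simplicity):

```scala
case class Block(hash: String, parent: String, height: Int, chainIndex: Int)

// For each chain index, stitch the batch's blocks into parent -> child
// runs and keep the longest one. main_chain = true can then be set in
// bulk, with updateMainChain called only for each run's first block
// (block0). Forks inside the batch would need extra handling.
def longestChains(blocks: Seq[Block]): Seq[Seq[Block]] =
  blocks.groupBy(_.chainIndex).values.toSeq.map { group =>
    val hashes   = group.map(_.hash).toSet
    val byParent = group.map(b => b.parent -> b).toMap
    // Heads are blocks whose parent is not part of this batch.
    val heads = group.filter(b => !hashes.contains(b.parent))
    def chainFrom(b: Block): List[Block] =
      b :: byParent.get(b.hash).map(chainFrom).getOrElse(Nil)
    heads.map(chainFrom).maxBy(_.length)
  }
```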

tdroxler commented 2 years ago

@polarker We used to do this, but there were some small edge cases where it could be problematic, and we decided to go the heavy way to be sure everything was right. I can't fully remember them :/

Edit: Maybe one issue was that if we had another main chain and at some point we switch to another fork, we need to make sure to update all blocks from the previous main chain to main_chain = false. Currently, for each block we insert, we check if there are other blocks at that height and set them to false.
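That per-height reset is essentially this (column names assumed):

```scala
import slick.jdbc.PostgresProfile.api._

// When inserting a block at `height` on chain `chainIndex`, demote any
// sibling at the same height so only the new block stays flagged.
def demoteSiblings(hash: String, height: Int, chainIndex: Int) =
  sqlu"""
    UPDATE blocks SET main_chain = false
    WHERE height = $height AND chain_index = $chainIndex AND hash <> $hash
  """
```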

polarker commented 2 years ago

As far as I remember, in the past we did not calculate the longest chain for new blocks, and there were edge reorg cases. We chose to update the main chain for each new block as it was the simplest approach back then.

simerplaha commented 2 years ago

Yep, that recursive query is going to be fun. I can look at that after working on your previous suggestion here, and after any other changes that are smaller and can be done more quickly.

one of the first steps we could do is to bring BlockFlowSyncService closer to the DB

Yep 👍🏼

Will use this graph as a baseline and add more metrics over the next few PRs, with the goal of reducing the QPS. Adding the total number of blocks processed after n seconds might also help show by how much sync's overall performance is affected.
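For the blocks-processed metric, something as small as this would do as a starting point (names illustrative):

```scala
import java.util.concurrent.atomic.AtomicLong

object SyncMetrics {
  val blocksProcessed        = new AtomicLong(0)
  private val startedAtNanos = System.nanoTime()

  // Call once per block the sync pipeline finishes processing.
  def recordBlock(): Unit = {
    blocksProcessed.incrementAndGet()
    ()
  }

  // Overall throughput since start, to compare across PRs.
  def blocksPerSecond: Double = {
    val elapsedSeconds = (System.nanoTime() - startedAtNanos) / 1e9
    blocksProcessed.get() / elapsedSeconds
  }
}
```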

polarker commented 2 years ago

Good point, we need to improve our metrics system too.