Summary: scraping is not much slower than it used to be even though we query more data.
New scraper is a little bit slower than the old scraper, but not as much as I feared, especially when we consider that we're doing more querying than we were before (for withdrawals, for example).
Reth may be faster than Erigon, but the machine I ran it on was quite a bit faster, so that comparison may not be a fair one.
Reth does not support trace_filter. On Erigon, using trace_filter against a 50-block range appears to be about five times faster than 50 individual trace_block queries (which makes sense). This effect is more pronounced when going over the wire.
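For anyone who wants to reproduce the comparison, here is a rough sketch of the two request shapes being compared. It only builds the JSON-RPC payloads and does not send them; the helper names and the starting block are made up for illustration, and timing them against a node is left to the reader.

```python
import json

def trace_block_requests(first_block, count):
    """One trace_block request per block (the slower pattern)."""
    return [
        {
            "jsonrpc": "2.0",
            "id": i,
            "method": "trace_block",
            "params": [hex(first_block + i)],
        }
        for i in range(count)
    ]

def trace_filter_request(first_block, count):
    """A single trace_filter request covering the same blocks (inclusive range)."""
    return {
        "jsonrpc": "2.0",
        "id": 0,
        "method": "trace_filter",
        "params": [
            {
                "fromBlock": hex(first_block),
                "toBlock": hex(first_block + count - 1),
            }
        ],
    }

if __name__ == "__main__":
    # Hypothetical starting block, chosen only for the example.
    singles = trace_block_requests(17_000_000, 50)
    ranged = trace_filter_request(17_000_000, 50)
    print(len(singles), "trace_block requests vs 1 trace_filter request")
    print(json.dumps(ranged))
```

The single ranged request saves 49 round trips, which would be consistent with the gap getting wider over the wire.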
Using trace_filter with a block range (the data gets very big, so experiment to find the optimal range size) seems to be a more productive use of our time than using eth_getLogs with a range. If we had to choose one, I'd choose trace_filter. But trace_filter and eth_getLogs may take the same range, in which case it may be a two-for-one deal.
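A minimal sketch of the "two-for-one" idea, assuming we split the scrape into fixed-size chunks and reuse each chunk for both calls. The function names and the chunk size of 50 are placeholders, not what the scraper does today; the right size would have to come from experiment.

```python
def block_ranges(start, end, size):
    """Split [start, end] (inclusive) into consecutive chunks of at most `size` blocks."""
    if size < 1:
        raise ValueError("size must be at least 1")
    ranges = []
    first = start
    while first <= end:
        last = min(first + size - 1, end)
        ranges.append((first, last))
        first = last + 1
    return ranges

def paired_requests(first, last):
    """Build a trace_filter and an eth_getLogs request over the same block range."""
    block_range = {"fromBlock": hex(first), "toBlock": hex(last)}
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "trace_filter", "params": [dict(block_range)]},
        {"jsonrpc": "2.0", "id": 2, "method": "eth_getLogs", "params": [dict(block_range)]},
    ]

if __name__ == "__main__":
    # Chunk size is a tunable guess; responses get large quickly on busy blocks.
    for first, last in block_ranges(17_000_000, 17_000_149, 50):
        batch = paired_requests(first, last)
        print(first, last, [r["method"] for r in batch])
```

Both requests could go out as one JSON-RPC batch per chunk, so tuning the chunk size would tune both queries at once.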
About 37% of the time during scraping is not related to RPC queries. 54% is spent querying traces, 6% querying withdrawals, and about 1% querying logs, which argues, again, for optimizing traces.
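To put rough numbers on that argument, here is a back-of-the-envelope estimate. It assumes the 5x figure from the trace_filter test would carry over to the whole trace phase of the scraper, which is unverified; the percentages are the ones quoted above (they sum to 98, and the rest is left unattributed).

```python
# Measured shares of scrape time, in percent (from the profile above).
SHARES = {"non_rpc": 37.0, "traces": 54.0, "withdrawals": 6.0, "logs": 1.0}

def total_after_speedup(shares, phase, factor):
    """Percent of the original run time left if `phase` alone gets `factor` times faster."""
    return 100.0 - shares[phase] + shares[phase] / factor

if __name__ == "__main__":
    # Assumption: the trace phase speeds up 5x, as in the 50-block test.
    remaining = total_after_speedup(SHARES, "traces", 5)
    print(f"traces 5x faster: {remaining:.1f}% of original time")
    print(f"overall speedup:  {100.0 / remaining:.2f}x")
    # For contrast: making log queries free saves only their 1% share.
    print(f"logs removed:     {100.0 - SHARES['logs']:.1f}% of original time")
```

Under that assumption the run would take about 57% of its current time, while even a perfect log query would leave it at 99%.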
I removed the block query for withdrawals from pre-Shanghai blocks, so that will help. The scraper is "not so bad" even given that it's querying more (withdrawals and receipts).
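The pre-Shanghai check amounts to something like the sketch below. This is not the scraper's actual code; the function names are made up, and the constant is the Ethereum mainnet Shanghai activation block (other chains activate withdrawals at different points, so treat it as a per-chain setting).

```python
# Assumption: Ethereum mainnet. Shanghai (EIP-4895 withdrawals) activated at this block.
MAINNET_SHANGHAI_BLOCK = 17_034_870

def needs_withdrawal_query(block_number, shanghai_block=MAINNET_SHANGHAI_BLOCK):
    """Withdrawals cannot exist before Shanghai, so earlier blocks need no query."""
    return block_number >= shanghai_block

def blocks_to_query(first, last, shanghai_block=MAINNET_SHANGHAI_BLOCK):
    """Return the inclusive sub-range of [first, last] that can contain withdrawals, or None."""
    start = max(first, shanghai_block)
    if start > last:
        return None
    return (start, last)

if __name__ == "__main__":
    print(needs_withdrawal_query(17_034_869))
    print(needs_withdrawal_query(17_034_870))
    print(blocks_to_query(17_034_800, 17_034_900))
```

For a scrape from block zero this skips the withdrawal lookup for the first 17 million or so blocks.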
This issue is delayed. We will get to it one day.