TrueBlocks / trueblocks-core

The main repository for the TrueBlocks system
https://trueblocks.io
GNU General Public License v3.0

Adventures in scraping other chains #3655

tjayrush closed this issue 1 month ago

tjayrush commented 2 months ago

dreadedhamish: OK, more issues from unsupported chains!

Will an extended range of blocks with zero transactions cause chifra to interpret it as an error?

Longer story: after trying to roll back the chifra index for Ethereum to match our fork (Pulsechain), I got lots of errors (tonnes of "ripe file not found for block...") and it got stuck repeating the same run of blocks. After a few attempts I decided to sync from scratch, which, using a remote node on the other side of the world, took 14 days... until I hit the same errors in the same spot. It turns out that spot is around the time of the fork. On investigation I found extended ranges with zero transactions; two in particular are large, 2,390 and 9,968 blocks. So I'm thinking that because chifra processes runs of 2,000 blocks, and all of them are empty, it interprets this as an error.

Could that be right? What would be the best way forward? Run with "allowMissing = true" until I'm over the trouble spots?

I would appreciate your help with a migration problem we are facing. We are working on creating a BNB mainnet index using TB. After ~2 weeks of syncing, we found that the first ~5M blocks were created using an older version of TB. As a result, we are getting this error:

Outdated file: /home/trueblocks/.local/share/trueblocks/unchained/bnbmainnet/blooms/000000000-000000000.bloom.
File version: trueblocks-core@v0.40.0
Manifest version: trueblocks-core@v2.0.0-release
Error: incorrect header hash

See https://github.com/TrueBlocks/trueblocks-core/blob/develop/src/other/migrations/README-v2.0.0.md.

As the instructions suggest, we tried to truncate/retag the old blocks, but the error message immediately popped up again.

Any advice on how to rebuild the index for the old blocks? Thank you!

[11:14 AM]tjayrush | TrueBlocks.io: As dreadedhamish says, it should pick up exactly where it left off (well, almost). Here's what it does: it processes 2,000 blocks at a time (you can change this with the --block_cnt option). If it finishes those 2,000 blocks, the next run picks up where it left off. If it doesn't finish, it returns to the start of that 2,000-block range. In other words, unless it finishes all 2,000 blocks, it starts over at the previous break point.

Is that how you're finding it to work or are you seeing something different?

[11:14 AM]tjayrush | TrueBlocks.io: Thanks for the kind words. It is pretty powerful (if a bit finicky).

[11:15 AM]tjayrush | TrueBlocks.io: "Will an extended range of blocks with zero transactions cause chifra to interpret it as an error?"

Maybe. What error are you seeing?

[11:17 AM]tjayrush | TrueBlocks.io: "ripe file not found for block..."

Yes. This is the error I thought you were probably seeing...

[11:17 AM]tjayrush | TrueBlocks.io: "So I'm thinking that because chifra processes runs of 2,000 blocks, and all of them are empty, it interprets this as an error."

It does (because, in a sense, it is an error), but there's a solution.

[11:18 AM]tjayrush | TrueBlocks.io: "What would be the best way forward? Run with "allowMissing = true" until I'm over the trouble spots?"

Yes, exactly this. But you can leave it on permanently in case it happens again. On Ethereum mainnet, a run of blocks like this would be quite a notable event, so by default chifra treats it as an error.

[11:20 AM]tjayrush | TrueBlocks.io: Here's some more information: https://trueblocks.io/faq/#---im-getting-an-error-message-current-file-does-not-sequentially-follow-previous-file-what-do.

I think this works, but I'm not sure; I haven't looked at it in years. I'll double-check now. Be right back.

[11:23 AM]tjayrush | TrueBlocks.io: I just checked the code, and here's what I can say. The FAQ above describes adding a configuration file entry. That may still work, although it's no longer being tested. I think it does still work, since we would use it in Docker installs.

What definitely works, though, is to start the scraper like this: chifra scrape --allow_missing

[11:25 AM]tjayrush | TrueBlocks.io: I'm going to have to apologize, but only slightly... version 0.40.0 is very old (years old, if I'm not mistaken). We announced somewhere that when we moved to version 2.0.0, we stopped supporting anything prior to version 1.0.0 (and if we didn't announce that, we should have).

The only solution is to start from scratch, I'm afraid. The migration code from those older caches is long gone.
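To check which side of that cutoff a file falls on, the version strings in the BNB error message (for example trueblocks-core@v0.40.0) can be parsed and compared against v1.0.0. A minimal Go sketch: parseVersion and supported are hypothetical helpers, the only rule encoded is the one stated here (nothing before v1.0.0 is supported), and nothing here reads TrueBlocks file headers directly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion pulls major, minor, and patch out of strings shaped like
// "trueblocks-core@v0.40.0" or "trueblocks-core@v2.0.0-release".
// Hypothetical helper; the input format is taken from the error message.
func parseVersion(s string) (major, minor, patch int, err error) {
	_, v, ok := strings.Cut(s, "@v")
	if !ok {
		return 0, 0, 0, fmt.Errorf("no version found in %q", s)
	}
	v, _, _ = strings.Cut(v, "-") // drop suffixes such as "-release"
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("malformed version %q", v)
	}
	var nums [3]int
	for i, p := range parts {
		if nums[i], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, fmt.Errorf("bad component %q: %w", p, err)
		}
	}
	return nums[0], nums[1], nums[2], nil
}

// supported applies the single rule stated in this thread: files written by
// anything before v1.0.0 are unsupported and must be rebuilt from scratch.
func supported(fileVersion string) (bool, error) {
	major, _, _, err := parseVersion(fileVersion)
	if err != nil {
		return false, err
	}
	return major >= 1, nil
}

func main() {
	for _, v := range []string{"trueblocks-core@v0.40.0", "trueblocks-core@v2.0.0-release"} {
		ok, err := supported(v)
		fmt.Println(v, ok, err)
	}
}
```

Under this rule the BNB file (v0.40.0) reports false, which is consistent with the advice to rebuild rather than migrate.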

[6:10 PM]dreadedhamish: Unfortunately this isn't working: with --allow_missing it still retries the run.

[6:56 PM]dreadedhamish: What appears to have worked is setting block_cnt to 10000 (my largest gap is 9,968 blocks, so every run is guaranteed to include some non-empty blocks).
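The arithmetic behind this workaround: if the longest run of consecutive empty blocks is G, then any window of G + 1 consecutive blocks must contain at least one non-empty block, because an all-empty window would itself be a longer empty run. With G = 9,968, block_cnt must be at least 9,969, and 10000 clears it. A small Go sketch, assuming (as observed here) that a chunk only fails when every block in it is empty; the per-block counts slice is hypothetical input, not something chifra exposes under that name.

```go
package main

import "fmt"

// longestEmptyRun returns the length of the longest run of consecutive
// blocks whose appearance count is zero. counts[i] is a hypothetical
// per-block count of address appearances.
func longestEmptyRun(counts []int) int {
	longest, current := 0, 0
	for _, c := range counts {
		if c == 0 {
			current++
			if current > longest {
				longest = current
			}
		} else {
			current = 0
		}
	}
	return longest
}

// minSafeBlockCnt returns the smallest chunk size for which every window of
// that many consecutive blocks contains at least one non-empty block.
func minSafeBlockCnt(longestGap int) int {
	return longestGap + 1
}

func main() {
	// Largest gap reported in this thread: 9,968 empty blocks.
	fmt.Println(minSafeBlockCnt(9968)) // 9969, so block_cnt = 10000 clears it

	counts := []int{3, 0, 0, 0, 5, 0, 0, 1}
	fmt.Println(longestEmptyRun(counts)) // 3
}
```

A chunk at the chain tip can be shorter than block_cnt, so this bound only covers full-size chunks.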

[7:14 AM]tjayrush | TrueBlocks.io: Wow. That's interesting. It's such a hard thing to test. So you think every single block inside the range had zero address appearances? allow_missing was written with the expectation that only one or two blocks out of 2,000 would have zero appearances. I'll document this. There may be a simple fix, but it won't happen right away; it will take time to research. Thanks so much for reporting.

tjayrush commented 1 month ago

We added a section to the FAQ related to the --allow_missing answer. Thanks for reporting.