allada / bsc-archive-snapshot

Free public Binance Smart Chain (BSC) Archive Snapshot
Apache License 2.0

Failed to derive block receipts fields #14

Closed: fomoli535 closed this issue 2 years ago

fomoli535 commented 2 years ago

Hi there, many thanks for the incredibly valuable work here!

When running getTransactionReceipt on transactions in blocks around 14794507, I am getting the following error in the erigon output:

[EROR] [02-24|15:36:10.630] Failed to derive block receipts fields hash=0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df number=14794507 err="transaction and receipt count mismatch, tx count = 276, receipts count = 275"

I am not getting this error for the most recent blocks, nor for blocks prior to ~14500000.
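In case it helps reproduce, here is a minimal sketch of how I'm checking a block from the client side (checkBlock is just an illustrative helper, and I'm assuming the node is listening on the default 8545 port):

const ethers = require('ethers'); // ethers v5

const provider = new ethers.providers.JsonRpcProvider('http://127.0.0.1:8545');

// Compare a block's transaction count against how many receipts the node
// can actually return for it. A healthy block should match 1:1.
async function checkBlock(n) {
  const block = await provider.getBlock(n);
  let receipts = 0;
  for (const hash of block.transactions) {
    const r = await provider.getTransactionReceipt(hash).catch(() => null);
    if (r !== null) receipts++;
  }
  console.log(`block ${n}: ${block.transactions.length} txs, ${receipts} receipts returned`);
}

checkBlock(14794507);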

Any thoughts?

allada commented 2 years ago

Hmmm, can you try to query a contract at that block?

Erigon is still in alpha/beta stage, so I would not be surprised if there are some speed bumps.

allada commented 2 years ago

It also might have to do with this: https://github.com/ledgerwatch/erigon/issues/3438

I have not updated my version of erigon in a while, so I'll try to find some time to get this updated.

fomoli535 commented 2 years ago

> It also might have to do with this: ledgerwatch/erigon#3438
>
> I have not updated my version of erigon in a while, so I'll try to find some time to get this updated.

Yep, this definitely looks like it. Is there a way to resync all blocks since 14500000? A full resync would be a pain in the a** :)

fomoli535 commented 2 years ago

> Hmmm, can you try to query a contract at that block?
>
> Erigon is still in alpha/beta stage, so I would not be surprised if there are some speed bumps.

I'll try that later today; I'm away from my terminal right now. Would you be able to call getTransactionReceipt for transaction 0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725 on your side, to check whether the issue is isolated to my setup?

Also, I am running the latest devel version of erigon on my side; I created the node from your snapshot yesterday. My latest blocks are not showing the error, so my hypothesis is that you could be running a version of erigon that is not patched for this issue and has been corrupting blocks since about a month ago.

allada commented 2 years ago

It does appear to work for me. I am using the following script:

const ethers = require('ethers'); // ethers v5
const provider = new ethers.providers.JsonRpcProvider('http://127.0.0.1:8545');
// Fetch the receipt straight from the local node.
provider.getTransactionReceipt('0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725')
  .then(data => console.log(data))
  .catch(err => console.error(err));

And returns:

{
  to: '0x10ED43C718714eb63d5aA57B78B54704E256024E',
  from: '0x58acEc799F8bb38A47f6719B4bd2DBd1bA9B3644',
  contractAddress: null,
  transactionIndex: 273,
  gasUsed: BigNumber { _hex: '0x01f5ce', _isBigNumber: true },
  logsBloom: '0x00200a00000000000000000080000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000000000002004000008000000200000000000400000000400010000000000000000000000900000000000000000000000000000240000000010000000000000000000000000000000000000000000240000000000080000004000200000020000000000000000000000020000000000000000000000000000000000000000000002000000000000000000004000100000000000001000000002008080000010000000000000000000000000000000000000000000000000000000000000',
  blockHash: '0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df',
  transactionHash: '0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725',
  logs: [
    {
      transactionIndex: 273,
      blockNumber: 14794507,
      transactionHash: '0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725',
      address: '0x00e1656e45f18ec6747F5a8496Fd39B50b38396D',
      topics: [Array],
      data: '0x00000000000000000000000000000000000000000000000f2dc7d47f15600000',
      logIndex: 867,
      blockHash: '0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df'
    },
    {
      transactionIndex: 273,
      blockNumber: 14794507,
      transactionHash: '0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725',
      address: '0x00e1656e45f18ec6747F5a8496Fd39B50b38396D',
      topics: [Array],
      data: '0xffffffffffffffffffffffffffffffffffffffffffffffee6941766ef4aaab3e',
      logIndex: 868,
      blockHash: '0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df'
    },
    {
      transactionIndex: 273,
      blockNumber: 14794507,
      transactionHash: '0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725',
      address: '0xbb4CdB9CBd36B01bD1cBaEBF2De08d9173bc095c',
      topics: [Array],
      data: '0x00000000000000000000000000000000000000000000000011fe039f65046575',
      logIndex: 869,
      blockHash: '0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df'
    },
    {
      transactionIndex: 273,
      blockNumber: 14794507,
      transactionHash: '0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725',
      address: '0x2Eebe0C34da9ba65521E98CBaA7D97496d05f489',
      topics: [Array],
      data: '0x000000000000000000000000000000000000000000007a6b9f2ef2efcb85283a000000000000000000000000000000000000000000000091678546675f43c698',
      logIndex: 870,
      blockHash: '0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df'
    },
    {
      transactionIndex: 273,
      blockNumber: 14794507,
      transactionHash: '0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725',
      address: '0x2Eebe0C34da9ba65521E98CBaA7D97496d05f489',
      topics: [Array],
      data: '0x00000000000000000000000000000000000000000000000f2dc7d47f156000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011fe039f65046575',
      logIndex: 871,
      blockHash: '0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df'
    },
    {
      transactionIndex: 273,
      blockNumber: 14794507,
      transactionHash: '0xc46fdb4d59a2c77fe347a021fdfda51e230fd4f770da3ca093d9dc758c32f725',
      address: '0xbb4CdB9CBd36B01bD1cBaEBF2De08d9173bc095c',
      topics: [Array],
      data: '0x00000000000000000000000000000000000000000000000011fe039f65046575',
      logIndex: 872,
      blockHash: '0x7cca3e99c68f0cda13ec1a1a48769cbf420c690f457d23677b0a1d8cb2bd83df'
    }
  ],
  blockNumber: 14794507,
  confirmations: 803886,
  cumulativeGasUsed: BigNumber { _hex: '0x028fcc88', _isBigNumber: true },
  effectiveGasPrice: BigNumber { _hex: '0x012a05f200', _isBigNumber: true },
  status: 1,
  type: 0,
  byzantium: true
}

I am currently using this version of erigon: ff4fa5efd35f4bf609d34293b904c1af8d8b5ad8. You can try building that specific version to see if it fixes it.
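In case it's useful, building a pinned commit follows erigon's normal build flow; something like this should do (the directory layout is up to you):

# clone and build erigon at the exact commit
git clone https://github.com/ledgerwatch/erigon.git
cd erigon
git checkout ff4fa5efd35f4bf609d34293b904c1af8d8b5ad8
make erigon
# the binary lands in ./build/bin/erigon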

It is unlikely, but possible, that your download was somehow corrupted; a fresh download might help.

fomoli535 commented 2 years ago

Hey, thanks so much for giving this a try and replying again here :) When running this query, did you notice anything in your rpcdaemon stdout/stderr? The error message I mentioned in my first post popped up there, not in the API call response.

allada commented 2 years ago

Yes, I do see this issue. I will attempt to resync the chain, but this will take some time. This does appear to be fixed on the latest checkout, though.

For what it's worth, I doubt most people will need the transactions it skips. They appear to be system transactions, which 99% of people probably don't need unless you want to keep super-precise logs on everything (i.e., this should not affect most wallet queries).

fomoli535 commented 2 years ago

Ok, thanks for confirming that the issue is not just on my side. I'm attempting an unwind to a block prior to when this issue started, using --bad.block=0xc4cd50a70375e98b264eb32c464bde4e31af7d053f7e7cb98dfc1bbbc6af7377 to try to avoid a full resync. It's been going for about a day now. Will keep you posted on that.
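For anyone else trying this, that just means passing the flag directly to erigon; roughly like this (the datadir and chain values here are illustrative, adjust for your setup):

erigon --datadir=/erigon/data --chain=bsc \
  --bad.block=0xc4cd50a70375e98b264eb32c464bde4e31af7d053f7e7cb98dfc1bbbc6af7377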

And yes, to your point, the impact is probably nonexistent for most people, but I want to use the data for analytics purposes, so I need the cleanest data possible :-)

allada commented 2 years ago

Yes, I am already in the process of doing this. It does look like --bad.block will do what we want (from my reading of the code), but it might take some hacking of the code to ensure it doesn't persist the bad block.

I bisected the bad block to 14515943. I'm going to resync from block 14510000 just to be safe.
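(The bisection itself is just a binary search over block numbers; a sketch, where isBad is a hypothetical probe that returns true when receipt derivation fails for a given block:)

// Find the first block for which the probe reports failure, assuming
// blocks are good up to some boundary and bad from there on.
async function firstBadBlock(lo, hi, isBad) {
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (await isBad(mid)) {
      hi = mid;     // mid is bad: the boundary is at or before mid
    } else {
      lo = mid + 1; // mid is good: the boundary is after mid
    }
  }
  return lo;
}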

allada commented 2 years ago

It looks like it is working, but I won't know for a week or so. What I did was:

  1. Update to the latest code in the devel branch (0cac29d1d2021562722dd6dec5d36f170d5a0a84)
  2. [not needed by most users] Stop the cron job that uploads once per day
  3. Set --bad.block=0x081e77856fe4b03f1856bb9bf41c7341f9148395d91d414d76ed4ac3c09abf54 in the docker-compose.yml file (see the sketch below)
  4. Start docker-compose and wait until it starts processing, then...
  5. Stop docker-compose
  6. Remove the flag from step 3
  7. Start docker-compose again
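For step 3, the edit just adds the flag to erigon's command in the compose file; roughly like this (the service name, image, and other flags here are illustrative, not the repo's exact file):

services:
  erigon:
    image: thorax/erigon:devel
    command: >
      erigon
      --chain=bsc
      --datadir=/erigon/data
      --bad.block=0x081e77856fe4b03f1856bb9bf41c7341f9148395d91d414d76ed4ac3c09abf54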

I'll post an update in about 2 days.

fomoli535 commented 2 years ago

Ha, great that you took the time to narrow down the bad block #. I didn't do that and am unwinding to 14500000. Oh well.

fomoli535 commented 2 years ago

Unwinding has failed 3 times already on my side. Erigon crashes in stage 9 with no error message.

[INFO] [02-27|14:13:57.714] [9/16 IntermediateHashes] ETL [1/2] Extracting from=StorageChangeSet current key=0000000000ed4d75bb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c0000000000000001 alloc="3.2 GiB" sys="6.3 GiB"
[INFO] [02-27|14:14:11.065] Flushed buffer file                      name=/erigon/data/erigon/etl-temp/erigon-sortable-buf-1347755927 alloc="3.8 GiB" sys="6.3 GiB"
[INFO] [02-27|14:14:15.879] [p2p] GoodPeers                          eth66=27
[INFO] [02-27|14:16:15.879] [p2p] GoodPeers                          eth66=28
[INFO] [02-27|14:18:15.880] [p2p] GoodPeers                          eth66=27
[INFO] [02-27|14:20:15.879] [p2p] GoodPeers                          eth66=28

Seems like it's not going to work on my end.

Question for you: did you keep any snapshot from block ~14510000?

allada commented 2 years ago

Mine is still going; I'm on a very under-powered AWS instance (I may upgrade it, though).

I'm on stage 12, but it is still going.

Are you sure it crashed? I read through the code, and most stages don't log anything while doing an unwind.

I also added 100 gigs of swap last night, because when reading the code I realized it loads most of what it wants to delete into memory. When I looked this morning, it had gone well beyond 50 gigs of memory, and I only have 30 gigs on this instance.
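For anyone hitting the same wall, adding swap is the usual fallocate/mkswap dance:

sudo fallocate -l 100G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile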

fomoli535 commented 2 years ago

Yep, it definitely crashed on my side: ps aux | grep erigon does not show any signs of life. That happened roughly 24h after the unwinding started, and there was no error in the logs. I'm running on an i3en.2xlarge instance (64GB of memory).

I'm curious to hear if it works out on your side. As a side question, do you by any chance keep older snapshots in storage?

allada commented 2 years ago

No, sadly I only kept the last 3 days. In hindsight I should have kept a version in archive storage once per month.

Yeah, you probably ran out of memory. I'm spinning up a new instance right now (is4gen.8xlarge); it should be able to unwind much faster, as it has 4 NVMe drives and the work looks mostly disk-IO bound. By default I use im4gn.2xlarge, as it is just enough to fit everything on disk when compressed and barely able to keep up with the chain.
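If it was the OOM killer, the kernel log will usually say so explicitly; worth a quick check:

# look for OOM-killer entries in the kernel log
dmesg | grep -i 'out of memory'
# or, on systemd machines:
journalctl -k | grep -i oom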

fomoli535 commented 2 years ago

Gotcha, hope it works out. In parallel I am going to spin up a new machine and start syncing from scratch, in case that's our only way out.

allada commented 2 years ago

Update: I was able to get erigon to roll back all the way to block 14509999, and I am now syncing upwards. I did discover a bug in erigon, though: it would not download new blocks after rolling back that far. I think I fixed it by adding some custom hacks to force it to forget some residual state.

I am hoping that there won't be any more of these unwind state bugs.

fomoli535 commented 2 years ago

Wow. Brilliant. Hope it works out!

allada commented 2 years ago

It is now at the Execution stage (Stage 6). The bad news is that it could take up to a month to catch back up: it processes about 4-6 blocks per second, and there are quite a few more steps after this stage is done.

Stay tuned. I'll be uploading partial updates once per day in the following folder: s3://public-blockchain-snapshots/bsc/
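The bucket is public, so the partial uploads should be listable anonymously with the AWS CLI, something like:

aws s3 ls --no-sign-request s3://public-blockchain-snapshots/bsc/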

fomoli535 commented 2 years ago

What a pain. I started syncing from scratch a few days ago and am also in the Execution stage right now. Seems like about 19 days to go. I can share a clean snapshot once that's done.

allada commented 2 years ago

> What a pain. I started syncing from scratch a few days ago and am also in the Execution stage right now. Seems like about 19 days to go. I can share a clean snapshot once that's done.

I suspect it will take a bit longer than that. IIRC, from around October of last year onward the Execution stage starts taking an extremely long time per block.

allada commented 2 years ago

I have good news: I have a chain that is back up to date, and it appears the issue is resolved.

I am in the process of uploading now; I expect it to be done in about 3-4 hours.

(It turned out the estimator code does not work properly and gives wildly wrong estimates.)

mnguyen-io commented 2 years ago

> I have good news: I have a chain that is back up to date, and it appears the issue is resolved.
>
> I am in the process of uploading now; I expect it to be done in about 3-4 hours.
>
> (It turned out the estimator code does not work properly and gives wildly wrong estimates.)

Great news. Can I ask which commit of erigon you're using?

allada commented 2 years ago

> Great news. Can I ask which commit of erigon you're using?

I am on: 46071798492d88ece8b78e2a26fb16a4b273e7b0

fomoli535 commented 2 years ago

Thanks for all the help :-)