ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

"Head state missing, repairing chain" I lost many blocks , how can I get them back? #19124

Closed relaxbao closed 5 years ago

relaxbao commented 5 years ago

Hi,

I run geth under supervisor on Ubuntu as a private chain. Geth crashed yesterday because generating the DAG file needed more CPU than our server has.

When it restarted, we lost many blocks: the block number went from 1110002 back to 422522. I need the lost blocks. Can I get them back?

Looking forward to your response, thank you.

System information

Geth version: 1.8.11
OS & Version: Linux Ubuntu

Expected behaviour

the height is 1110002

Actual behaviour

The block number is now 422522; it should be 1110002. How can I get back to 1110002? Here is the log:

Head state missing, repairing chain      number=1110002 hash=3610a2…5c104d
INFO [02-18|16:07:13] Rewound blockchain to past state         number=422522  hash=e6875a…a56025
INFO [02-18|16:07:13] Loaded most recent local header          number=1110002 hash=7620a2…5c104d td=875201842768
INFO [02-18|16:07:13] Loaded most recent local full block      number=422522  hash=e6875a…a56025 td=460514461017
INFO [02-18|16:07:13] Loaded most recent local fast block      number=1110002 hash=7620a2…5c104d td=875201842768

Steps to reproduce the behaviour

I restarted geth, but the result is the same.

karalabe commented 5 years ago

Geth keeps the state in memory (and garbage collects in memory) and only flushes every hour or so. If Geth crashes, whatever was in memory is lost.

In your case the block data is still there, just the historical states got lost so the chain rolled back. Normally this is not that big of an issue, as when you reconnect to the network, Geth reprocesses from a past block. If you run a single node however, there might be no remote peer with the data.

Long term I think we should fix Geth so that it reprocesses the blocks locally instead of reaching out to the network. Short term that won't help you, but you could try to do a geth export chain.rlp 0 1110002 and then import into a different datadir (to make sure you don't lose any data).
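
As a rough illustration of the gap described above, the following sketch reads the three "Loaded most recent local" lines from the startup log in the report and computes how many blocks sit between the last block with state (the "full block" head) and the header head. The helper name `parseHeads` is hypothetical, and the reading of the three heads follows the explanation in this comment rather than anything confirmed against the geth source.

```javascript
// Lines copied from the startup log in the original report.
var logLines = [
  "INFO [02-18|16:07:13] Loaded most recent local header          number=1110002 hash=7620a2…5c104d td=875201842768",
  "INFO [02-18|16:07:13] Loaded most recent local full block      number=422522  hash=e6875a…a56025 td=460514461017",
  "INFO [02-18|16:07:13] Loaded most recent local fast block      number=1110002 hash=7620a2…5c104d td=875201842768"
];

// Hypothetical helper: collect the block number reported for each head type.
function parseHeads(lines) {
  var heads = {};
  var pattern = /Loaded most recent local (header|full block|fast block)\s+number=(\d+)/;
  for (var i = 0; i < lines.length; i++) {
    var m = pattern.exec(lines[i]);
    if (m) {
      heads[m[1]] = parseInt(m[2], 10);
    }
  }
  return heads;
}

var heads = parseHeads(logLines);

// Blocks above the "full block" head are still stored but have no state.
var missing = heads["header"] - heads["full block"];
console.log("header head:     " + heads["header"]);
console.log("full block head: " + heads["full block"]);
console.log("blocks without state: " + missing); // prints 687480
```

If the header and fast block heads are still at the old height, as they are here, the block data for the rolled-back range should still be on disk, which is what makes the export suggestion worth trying.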

tsujp commented 5 years ago

This only affects blocks right? Not the keystore etc?

relaxbao commented 5 years ago

> This only affects blocks right? Not the keystore etc?

Yes, it only affects blocks.

Actually, I can still get the transactions in the higher blocks, but when mining starts, the block number increases from the lower number, 422522.

On the other hand, the contract storage went back to its state at block 422522, while I need the state at block 1110002.

Is there some way to solve this problem?

karalabe commented 5 years ago

I wrote in my previous comment that you could have exported your chain and fixed it that way. If you started mining on top, it's probably way too messy now to try and extract the correct blocks.

@hito This only affects the state, yes.

relaxbao commented 5 years ago

> Geth keeps the state in memory (and garbage collects in memory) and only flushes every hour or so. If Geth crashes, whatever was in memory is lost.
>
> In your case the block data is still there, just the historical states got lost so the chain rolled back. Normally this is not that big of an issue, as when you reconnect to the network, Geth reprocesses from a past block. If you run a single node however, there might be no remote peer with the data.
>
> Long term I think we should fix Geth so that it reprocesses the blocks locally instead of reaching out to the network. Short term that won't help you, but you could try to do a geth export chain.rlp 0 1110002 and then import into a different datadir (to make sure you don't lose any data).

Thank you so much, I think it's a great way to save all the data. But after exporting my data and importing it into a new datadir, I got an error.

INFO [03-14|11:23:04] Imported new chain segment               blocks=2500 txs=43   mgas=2.321  elapsed=2.286s mgasps=1.015  number=420000 hash=ce1ed8…6dc233 cache=1.12mB
INFO [03-14|11:23:06] Imported new chain segment               blocks=2500 txs=5    mgas=1.446  elapsed=2.098s mgasps=0.689  number=422500 hash=0b6e6d…117d9e cache=1.12mB
ERROR[03-14|11:23:06] Non contiguous block insert              number=423619 hash=100756…a3b36b parent=ff2b53…f13d44 prevnumber=423618 prevhash=8d7930…bcebb6
ERROR[03-14|11:23:06] Import error                             err="invalid block 423619: non contiguous insert: item 1117 is #423618 [8d793034…], item 1118 is #423619 [10075641…] (parent [ff2b53e8…])"
INFO [03-14|11:23:06] Writing cached state to disk             block=422500 hash=0b6e6d…117d9e root=e6191e…df4f85

Here are my blocks 423618 and 423619, and the parent block of 423619. Is there something wrong with them?

> eth.getBlock(423618)
{
  difficulty: 941215,
  extraData: "0xd88301080b846765746888676f312e31302e32856c696e7578",
  gasLimit: 4294967295,
  gasUsed: 0,
  hash: "0x8d79303491e8384dedb57812e5c8eefd83d8125e5c287a7009000ed292bcebb6",
  logsBloom: "0x
  miner: "0xa969f32fcdc83a6039286f267f2e7a246b4b030a",
  mixHash: "0xad860354cf2e2b2a9e2c13bb72d1abef7606ea6e1d60f4d995e0e0e09f159f22",
  nonce: "0x230828cb4887e0b0",
  number: 423618,
  parentHash: "0xe2025cb6ddf79f3dc2414301b715b54a9aad10b0f25e494882133c2551377493",
  receiptsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 540,
  stateRoot: "0xce0241488e1af373ffb9ae91eaf74cbeacb7984c0a3e293c64f676eed1c36fc1",
  timestamp: 1550546650,
  totalDifficulty: 460584013838,
  transactions: [],
  transactionsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  uncles: []
}
> eth.getBlock(423619)
{
  difficulty: 1000444,
  extraData: "0xd88301080b846765746888676f312e31302e32856c696e7578",
  gasLimit: 4294967295,
  gasUsed: 0,
  hash: "0x10075641add742f1447a67c9fc1136a5492a9b622e883042f38864ba33a3b36b",
  logsBloom: "0x
  miner: "0xd271baa1ed277c3730ad5b88ef97a5921d7a8c77",
  mixHash: "0x59f649b3123e348d84c889ae2e50f9240f1b826f65d2818c803ef94c15e6c84e",
  nonce: "0x0d7db510f26a7025",
  number: 423619,
  parentHash: "0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44",
  receiptsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 540,
  stateRoot: "0xba6860c90a95997f28b4da3c7f42cbf8d2e48e9f288dfeefd7e374b164eff745",
  timestamp: 1536726254,
  totalDifficulty: 460585640739,
  transactions: [],
  transactionsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  uncles: []
}
> eth.getBlock("0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44")
{
  difficulty: 999952,
  extraData: "0xd88301080b846765746888676f312e31302e32856c696e7578",
  gasLimit: 4294967295,
  gasUsed: 0,
  hash: "0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44",
  logsBloom: "0x
  miner: "0x10592c5155ad6655189bac1a61af49083e37152c",
  mixHash: "0xc52cd8c7110438d401e37dd359e31dfd719ec8ff20bfc7827697b40d9afc3ad4",
  nonce: "0x5e46c9d131d5a90d",
  number: 423618,
  parentHash: "0xc8a26455ee1781826047fe913824669fd3f30646b59fb226f917869f3cbcafb1",
  receiptsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 540,
  stateRoot: "0xddbf4b764d59a7c11cf453a5bfaa90ad5b45ced46c159acacd82e550da59e55d",
  timestamp: 1536726248,
  totalDifficulty: 460584640295,
  transactions: [],
  transactionsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  uncles: []
}
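
The "non contiguous insert" error can be reproduced from the three results above. This is a minimal sketch in plain JavaScript over objects shaped like the `eth.getBlock` output, keeping only the fields needed; the variable names and the `isContiguous` helper are illustrative and not part of geth. It checks whether each block's `parentHash` matches the hash of the block stored one height below it.

```javascript
// Fields copied from the eth.getBlock results above.
var canonical423618 = {
  number: 423618,
  hash: "0x8d79303491e8384dedb57812e5c8eefd83d8125e5c287a7009000ed292bcebb6",
  parentHash: "0xe2025cb6ddf79f3dc2414301b715b54a9aad10b0f25e494882133c2551377493",
  timestamp: 1550546650
};
var canonical423619 = {
  number: 423619,
  hash: "0x10075641add742f1447a67c9fc1136a5492a9b622e883042f38864ba33a3b36b",
  parentHash: "0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44",
  timestamp: 1536726254
};
// The block fetched by hash, i.e. the parent that 423619 actually points to.
var parentByHash = {
  number: 423618,
  hash: "0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44",
  parentHash: "0xc8a26455ee1781826047fe913824669fd3f30646b59fb226f917869f3cbcafb1",
  timestamp: 1536726248
};

// Illustrative helper: true when child directly extends parent.
function isContiguous(parent, child) {
  return child.number === parent.number + 1 && child.parentHash === parent.hash;
}

// The block indexed at height 423618 is not the parent of 423619.
console.log(isContiguous(canonical423618, canonical423619)); // false
// The block found by hash is the real parent.
console.log(isContiguous(parentByHash, canonical423619)); // true
// The block indexed at 423618 is newer than its supposed child.
console.log(canonical423618.timestamp > canonical423619.timestamp); // true
```

So two different blocks exist at height 423618, and the one returned by number does not link to 423619. Its timestamp (February 2019) is later than that of 423619 and its real parent (both September 2018). One possible reading, not confirmed in the thread, is that a block mined after the rollback replaced the number index at that height while the older blocks above it stayed in place.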

I used the following steps to recover my data. Is there something wrong with them?

geth --datadir /home/workspace/recoverdatas/rdata export /home/workspace/recoverdatas/rdata/chain423619.rlp 0 423619

geth --datadir "/home/workspace/recoverdatas/datanew" init "/home/workspace/data/conf/genesis.json"

geth --datadir /home/workspace/recoverdatas/datanew import /home/workspace/recoverdatas/rdata/chain423619.rlp

BeOleg commented 5 years ago

I get this as well every time I restart the node via docker compose:

root@nexwallet-eth3:/mnt/STORAGE/WALLETS# cat /var/lib/docker/containers/23406ad9bc38a5b9ab3c8e342295c91f2550d3b5c16f00d681f181aed9721c0d/23406ad9bc38a5b9ab3c8e342295c91f2550d3b5c16f00d681f181aed9721c0d-json.log | grep 'Head state missing'
{"log":"WARN [04-24|12:13:48.884] Head state missing, repairing chain      number=7623347 hash=b6c254…3a0dad\n","stream":"stderr","time":"2019-04-24T12:13:48.885446409Z"}
{"log":"WARN [04-25|19:19:10.008] Head state missing, repairing chain      number=7638271 hash=c73676…151eb8\n","stream":"stderr","time":"2019-04-25T19:19:10.008881355Z"}
{"log":"WARN [04-26|08:52:25.125] Head state missing, repairing chain      number=7641791 hash=c263d9…dfbe79\n","stream":"stderr","time":"2019-04-26T08:52:25.125447127Z"}
{"log":"WARN [04-28|09:21:28.698] Head state missing, repairing chain      number=7655058 hash=f11663…507ae5\n","stream":"stderr","time":"2019-04-28T09:21:28.698886159Z"}

Or if it restarts due to some fault, I lose a day or two of blocks. How can I solve this? How do I restart properly?

holiman commented 5 years ago

@BeOleg I see that you've opened https://github.com/ethereum/go-ethereum/issues/19504 , so let's continue that one there.

@relaxbao yes, something is wrong with it! It seems to have lost track of the canon chain, and there's a discrepancy in the chain. This is very interesting; however, since you're on a very old version (1.8.11), I doubt we'll be able to get to the bottom of that.

holiman commented 5 years ago

@relaxbao your scenario was fixed in #19514

relaxbao commented 5 years ago

> @relaxbao your scenario was fixed in #19514

@holiman Thank you very much, but I still have two questions:

  1. Can I get the blocks back if I stay on version 1.8.11?
  2. Is there some way to prevent this from happening again?