panic: leveldb: batch corrupted: invalid records length

keo commented 9 years ago

I upgraded geth yesterday from PPA (to 0.9.39+682SNAPSHOT20150719122250trusty-0ubuntu1)

Then a few hours later I got a panic: leveldb: batch corrupted: invalid records length error.

Here's the log: https://gist.github.com/keo/7a329bfd2ab455a0843a

keo commented 9 years ago

Here's the system info:

OS: Ubuntu 14.04.2
Kernel: Linux ethnode 3.13.0-52-generic #85-Ubuntu SMP Wed Apr 29 16:44:17 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
golang: 2:1.4.2+3trusty-0ubuntu1

tcoulter commented 9 years ago

Came here to post this. I'm consistently getting this as well, on a daily basis. My full stack trace is here:

https://gist.github.com/tcoulter/811e78455eb5d23172a4

My system:

Debian Wheezy
On branch develop, commit 02c5022742e2bf6d2aadca06a6a1655214ba9d55
go version go1.4.2 linux/amd64

tcoulter commented 9 years ago

@LefterisJP this is consistently crashing geth for me. Any chance this can be looked at before release?

tcoulter commented 9 years ago

Happened again, now with the latest commit on develop (d1d45aa8390731ad9d0422e6bbf2d451d11dab4d)

https://gist.github.com/tcoulter/66dfb2dfe786b4bf776f

LefterisJP commented 9 years ago

@tcoulter I am not one of the Go developers; I am with the C++ team, but I am sure the Go guys are taking a look at this as we speak. I am running geth too but haven't hit this yet. Will try to reproduce.

Were you mining or just syncing when this occurred?

karalabe commented 9 years ago

Could you build geth with the race detector enabled (godep go build --race ./cmd/geth), run it like that for a while (be sure to run the freshly built ./geth), and report any big DATA RACE logs you see?

tcoulter commented 9 years ago

@LefterisJP Whoops, sorry about that.

Mining and syncing, actually. My miner is consistently pinging the geth box, and geth was trying to catch up with the network since it had crashed previously. This happened when geth was caught up as well.

tcoulter commented 9 years ago

@karalabe Will do this now and run it. Sometimes it takes hours to crash, so will report back once I have anything.

tcoulter commented 9 years ago

@karalabe This just happened, right after starting it. Is this useful? https://gist.github.com/tcoulter/257e4d00f14b7b193798

tcoulter commented 9 years ago

For reference, I'm starting geth like this:

#!/bin/bash
geth --unlock=0 --etherbase="0xa94792e09954f15e8867eb544c17af1855726296" --rpc --maxpeers 100 --datadir /eth console

tcoulter commented 9 years ago

And eth is running via:

./eth.exe -G -F "http://192.168.1.12:8545" -t 3 --farm-recheck 100

tcoulter commented 9 years ago

Alright, I'm getting a ton of data races (more than 6, maybe; I didn't count). Here are the full logs so far: https://gist.github.com/tcoulter/04b5afe44585dfcf245d
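
Rather than counting the reports by eye, a minimal sketch along these lines could tally them. It assumes the geth output was saved to a file (the name geth-race.log used below is hypothetical) and that each report starts with the Go race detector's usual "WARNING: DATA RACE" header line.

```bash
#!/bin/bash
# Count race reports in a saved geth log.
# Assumption: every report begins with the Go race detector's
# "WARNING: DATA RACE" header line.
count_races() {
  local logfile="$1"
  # grep -c prints the number of matching lines. It exits non-zero when
  # the count is 0, so "|| true" keeps the function usable under "set -e".
  grep -c 'WARNING: DATA RACE' "$logfile" || true
}
```

For example, `count_races geth-race.log` prints a single number. Several reports may point at the same underlying race, so the count is an upper bound on distinct bugs rather than an exact figure.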

karalabe commented 9 years ago

The miner data race is new to me; it shouldn't be anything too serious, but I will fix it tomorrow. The other one, though, seems serious enough. Are you on the latest develop? I fixed two races yesterday or the day before, I believe.

tcoulter commented 9 years ago

Ya, git log says I'm on d1d45aa8390731ad9d0422e6bbf2d451d11dab4d, which is the latest commit on develop.

tcoulter commented 9 years ago

I haven't run into the crash yet, but will post the full log when I do.

karalabe commented 9 years ago

So, the fix for the last three races is https://github.com/ethereum/go-ethereum/pull/1511, though it shouldn't affect you or geth in any way: the data that could have been corrupted is never used on that code path. I'll prep the other fix tomorrow. IMHO that shouldn't be an issue either, but who knows.

tcoulter commented 9 years ago

So my geth process was eventually killed. I'm not sure by what; I for sure didn't do it. But here's the full output. Gist wasn't very happy with me pasting in 6.5 MB, so I added it to mega: http://www.megafileupload.com/hqaW/geth_output.txt

Lots of races in there. Not sure if they're different from any of the others.

tcoulter commented 9 years ago

A few more races. Can't seem to get geth to synchronize anymore (see logs). Will turn off the miners and try again: https://gist.github.com/tcoulter/c81458cfe25fb38586b5

http://stats.ethdev.net confirms no peers.

tcoulter commented 9 years ago

When compiled with --race, does geth use a lot of memory? It's currently using 53% of 24 GB, and it's continuing to grow.

fjl commented 9 years ago

@tcoulter don't production-mine with -race. The race detector, as nice as it is, has very high resource overhead.

tcoulter commented 9 years ago

Ahh. Thanks.

tcoulter commented 9 years ago

Looks like the full crash happened again. Log here:

https://gist.github.com/tcoulter/cd80e5ca6d928c0e86a7

karalabe commented 9 years ago

Yeah, the race detector is really expensive. I was hoping that the crash might be related to a data race, which is why I suggested running with it. We pushed a small race fix 1-2 days ago, and I'll try to address the other one now. I don't think it's the reason for the crashes you are seeing, but it might be worth a shot.

The downside is that if the database was corrupted some time ago due to a bug/crash, it may already be too late to fix it. It could probably help to see the offending database, but at its current size that's probably not possible.

tcoulter commented 9 years ago

I'm happy to zip it up and make it available via my personal server, if you'd like to download it and take a look. If you can advise me what directory you'd need (and how to not send you my private keys) I'd appreciate it.

karalabe commented 9 years ago

That would be really helpful. I'm not sure exactly what is faulting, so pack up the blockchain, state and extra folders from your datadir. These are all public datasets, whilst your keys are in the keystore folder. Make sure not to include that :)
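
The packing step could be sketched roughly as follows. This is only an illustration: it assumes the datadir contains folders named exactly blockchain, state and extra as described above, and the function name and archive name are hypothetical. Because only those three folders are listed, the keystore folder never enters the archive.

```bash
#!/bin/bash
# Pack only the public chain data from a geth datadir.
# Assumption: the datadir holds folders named blockchain, state and extra.
# The keystore folder is never listed, so it is not archived.
pack_chaindata() {
  local datadir="$1" out="$2"
  # -C switches into the datadir first, so the archive stores relative paths.
  tar -czf "$out" -C "$datadir" blockchain state extra
}
```

With the datadir from the launch script earlier in the thread, this would be called as `pack_chaindata /eth geth-chaindata.tar.gz`. Listing the result with `tar -tzf geth-chaindata.tar.gz | grep keystore` should print nothing, which is a quick check before sharing the file.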

tcoulter commented 9 years ago

@karalabe The backup is 84 GB. I don't actually have that much free space on my personal server. Any thoughts on where I could put it?

tcoulter commented 9 years ago

Scratch that. Uploading to s3 now.

fjl commented 9 years ago

Any news on that upload? ;)

tcoulter commented 9 years ago

I still have the backup. The problem is that I couldn't find a place to store it. Got anywhere that'll accept 80 GB?


keo commented 9 years ago

@tcoulter have you tried BitTorrent Sync? (https://www.getsync.com/) It worked great for me when I needed to share huge files device to device without having to upload them anywhere.

fjl commented 9 years ago

@tcoulter I think we can solve it without this dump. Thank you for going through such hoops to help us debug. Did this issue ever happen again?

tcoulter commented 9 years ago

It definitely happened again after I first reported it (it happened a handful of times actually). However, after switching to Frontier I've had no issues.

Thanks!


ethereum / go-ethereum

panic: leveldb: batch corrupted: invalid records length #1505