keo closed this issue 9 years ago.
Here's the system info:
OS: Ubuntu 14.04.2
Kernel: Linux ethnode 3.13.0-52-generic #85-Ubuntu SMP Wed Apr 29 16:44:17 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
golang: 2:1.4.2+3trusty-0ubuntu1
Came here to post this. I'm consistently getting this as well, on a daily basis. My full stack trace is here:
https://gist.github.com/tcoulter/811e78455eb5d23172a4
My system:
Debian Wheezy
On branch develop, commit 02c5022742e2bf6d2aadca06a6a1655214ba9d55
go version go1.4.2 linux/amd64
@LefterisJP this is consistently crashing geth for me. Any chance this can be looked at before release?
Happened again, now with the latest commit on develop (d1d45aa8390731ad9d0422e6bbf2d451d11dab4d).
@tcoulter I'm not one of the Go developers; I'm with the C++ team, but I'm sure the Go guys are taking a look at this as we speak. I'm running geth too but haven't hit this yet. Will try to reproduce.
Were you mining or just syncing when this occurs?
Could you build geth with the race detector enabled (godep go build --race ./cmd/geth), run it like that for a while (be sure to run ./geth), and report any big DATA RACE logs you see?
@LefterisJP Whoops, sorry about that.
Mining and syncing, actually. My miner is consistently pinging the geth box, and geth was trying to catch up with the network since it had crashed previously. This happened when geth was caught up as well.
@karalabe Will do this now and run it. Sometimes it takes hours to crash, so will report back once I have anything.
@karalabe This just happened, right after starting it. Is this useful? https://gist.github.com/tcoulter/257e4d00f14b7b193798
For reference, I'm starting geth like this:
#!/bin/bash
geth --unlock=0 --etherbase="0xa94792e09954f15e8867eb544c17af1855726296" --rpc --maxpeers 100 --datadir /eth console
And eth is running via:
./eth.exe -G -F "http://192.168.1.12:8545" -t 3 --farm-recheck 100
Alright, I'm getting a ton of data races (more than 6, maybe; I didn't count). Here are the full logs so far: https://gist.github.com/tcoulter/04b5afe44585dfcf245d
The miner data race is new to me; it shouldn't be anything too serious, but I'll fix it tomorrow. The other one, though, seems serious enough. Are you on the latest develop? I believe I fixed two races yesterday or the day before.
Ya, git log says I'm on d1d45aa8390731ad9d0422e6bbf2d451d11dab4d, which is the latest commit on develop.
I haven't run into the crash yet, but will post the full log when I do.
So, the fix for the last three races is https://github.com/ethereum/go-ethereum/pull/1511, though it shouldn't affect you or geth in any way: the data that could have been corrupted is never used on that code path. I'll prep the fix for the other one tomorrow, though IMHO that shouldn't be an issue either, but who knows.
So my geth process was eventually killed. I'm not sure by what; I certainly didn't do it. But here's the full output. Gist wasn't very happy with me pasting in 6.5 MB, so I added it to megafileupload: http://www.megafileupload.com/hqaW/geth_output.txt
Lots of races in there. Not sure if they're different from any of the others.
A few more races. Can't seem to get geth to synchronize anymore (see logs). Will turn off the miners and try again: https://gist.github.com/tcoulter/c81458cfe25fb38586b5
http://stats.ethdev.net confirms no peers.
When compiled with --race, does geth use a lot of memory? It's currently using 53% of 24 GB, and it's continuing to grow.
@tcoulter don't production-mine with -race. The race detector, as nice as it is, has very high resource overhead.
Ahh. Thanks.
Looks like the full crash happened again. Log here:
Yeah, the race detector is really expensive. I was hoping the crash might be related to a data race; that's why I suggested running with it. We pushed a small race fix 1-2 days ago, and I'll try to address the other one now. I don't think it's the reason for the crashes you're seeing, but it might be worth a shot.
The downside is that if the database was corrupted some time ago due to a bug or crash, it may already be too late to fix it. It would probably help to see the offending database, but at its current size that's probably not possible.
I'm happy to zip it up and make it available via my personal server, if you'd like to download it and take a look. If you can advise me what directory you'd need (and how to not send you my private keys) I'd appreciate it.
That would be really helpful. I'm not sure exactly what is faulting, so pack up the blockchain, state and extra folders from your datadir. These are all public datasets, whilst your keys are in the keystore folder. Make sure not to include that :)
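The packing step described above can be sketched in Go. This is a minimal sketch, assuming the datadir layout named in the thread (blockchain, state and extra are public; keystore holds the keys); the program name packdatadir and the helpers include and archive are hypothetical, not part of geth. It walks the datadir with filepath.Walk, skips every top-level entry other than the three public folders (so keystore is never descended into), and writes a gzip-compressed tarball.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// publicDirs lists the datadir subfolders named as public data.
// keystore (private keys) is deliberately absent.
var publicDirs = map[string]bool{"blockchain": true, "state": true, "extra": true}

// include reports whether a path relative to the datadir lies inside
// one of the public subfolders.
func include(rel string) bool {
	top := strings.SplitN(filepath.ToSlash(rel), "/", 2)[0]
	return publicDirs[top]
}

// archive writes a .tar.gz of the public subfolders of datadir to out.
func archive(datadir string, out io.Writer) error {
	gz := gzip.NewWriter(out)
	tw := tar.NewWriter(gz)
	err := filepath.Walk(datadir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(datadir, path)
		if err != nil || rel == "." {
			return err // rel == ".": the datadir itself, keep walking
		}
		if !include(rel) {
			if info.IsDir() {
				return filepath.SkipDir // never descend into keystore etc.
			}
			return nil
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		hdr.Name = filepath.ToSlash(rel)
		if info.IsDir() {
			hdr.Name += "/"
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if !info.Mode().IsRegular() {
			return nil // directories and special files carry no body
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(tw, f)
		return err
	})
	if err != nil {
		return err
	}
	if err := tw.Close(); err != nil {
		return err
	}
	return gz.Close()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: packdatadir <datadir> <out.tar.gz>")
		return
	}
	out, err := os.Create(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	err = archive(os.Args[1], out)
	if cerr := out.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With the datadir used earlier in the thread, this would be run as go run packdatadir.go /eth geth-public.tar.gz. Filtering by top-level folder name, rather than trying to exclude keystore alone, means any other unexpected file in the datadir is left out by default.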
@karalabe The backup is 84 GB. I don't actually have that much free space on my personal server. Any thoughts on where I could put it?
Scratch that. Uploading to s3 now.
Any news on that upload? ;)
I still have the backup. The problem is I couldn't find a place to store it. Got anywhere that'll accept 80 GB?
@tcoulter have you tried BitTorrent Sync? (https://www.getsync.com/) It worked great for me when I needed to share huge files device to device without having to upload them anywhere.
@tcoulter I think we can solve it without this dump. Thank you for going through such hoops to help us debug. Did this issue ever happen again?
It definitely happened again after I first reported it (it happened a handful of times actually). However, after switching to Frontier I've had no issues.
Thanks!
I upgraded geth yesterday from the PPA (to 0.9.39+682SNAPSHOT20150719122250trusty-0ubuntu1).
Then, a few hours later, I got a
panic: leveldb: batch corrupted: invalid records length
error. Here's the log: https://gist.github.com/keo/7a329bfd2ab455a0843a