btc1 / bitcoin

btc1 project bitcoin implementation
MIT License

Segmentation violation #81

Closed upminers closed 7 years ago

upminers commented 7 years ago

In two days both of our btc1 nodes have crashed twice with "segmentation violation" errors. Our production core nodes have not crashed at all in months of operation.

Is there an ETA for when this will be fixed?

h0jeZvgoxFepBQ2C commented 7 years ago

Maybe you should think about just running BIP148. It runs fine and is much better reviewed than this unfinished proposal. See here for source code and binaries: https://github.com/UASF/bitcoin/releases/

JaredR26 commented 7 years ago

Yeah forking off to an empty chain on August 1st and then having to do a POW change is going to work wonders, go UASF!

More seriously, given that this is the first report of crashes, the team would need a lot more information to address them. Logs? Uptime? OS / hardware? 64/32bit? Memory dump? Testnet or mainnet? What exact version/build? What are/were you doing with the node?

Memory dump and logs might contain identifying information so please don't post them unless Jeff or someone else asks, but they will be needed. Do you have them?

upminers commented 7 years ago

Mainnet, 64 bit, linux. These nodes are used only for mining. How do I do a memory dump?

JaredR26 commented 7 years ago

Hmm, what version was this? You said several days? It shouldn't have crashed, but I generally wouldn't suggest running an alpha release in mainnet. What kind of mining are you doing? Was this hooked up to a stratum server or something else between the miners and the node? What's your command line and config?

(I'll do what I can to try to reproduce and help until Jeff or someone else has had a chance to respond)

bitofalefty commented 7 years ago

This is a PSA rather than an accusation, but OP's github account was newly created to post this issue. I expect OP will produce logs etc but we should be wary of FUD considering the situation.

jgarzik commented 7 years ago

Right - can you please make available debug.log, hardware or VM configuration, which build or git commit you're running - all the info needed to dig deeper - thanks.

ETA: Running inside gdb will also capture additional diagnostic information automatically, e.g. https://bitcoin.stackexchange.com/questions/51472/debugging-bitcoind
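A minimal sketch of the gdb suggestion, assuming gdb and bitcoind are both on PATH. The helper name `gdb_command` and the example option are illustrative and not taken from this thread. It only builds the command line: gdb starts the node, and if the process stops on a fault it prints a backtrace for the faulting thread and then for every thread.

```python
import shlex


def gdb_command(node_argv):
    """Build a gdb invocation that runs the node and dumps backtraces when it stops.

    node_argv is the usual bitcoind command line as a list of strings.
    """
    return [
        "gdb", "-batch",               # run the commands below, then exit
        "-ex", "run",                  # start the program under gdb
        "-ex", "bt",                   # backtrace of the thread that stopped
        "-ex", "thread apply all bt",  # backtraces of every thread
        "--args", *node_argv,          # everything after --args is the program
    ]


# Illustrative option only; substitute the node's real command line.
print(shlex.join(gdb_command(["bitcoind", "-printtoconsole"])))
```

The node should probably be started without -daemon here, since a daemonizing process forks away from the debugger.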

jmprcx commented 7 years ago

Please provide the relevant SEGFAULT information stored in /var/log/messages as well. If this report is legitimate we can start working to identify the root cause from this information.

cat /var/log/messages | grep segfault
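For anyone who wants more than the raw grep output, here is a rough sketch that pulls the fields out of kernel segfault lines. It assumes the usual x86 Linux message layout (`name[pid]: segfault at ... ip ... sp ... error ... in module[base+size]`). The sample line is made up to show the format and is not from the reporter's machines, and the names `SEGFAULT_RE` and `parse_segfaults` are my own.

```python
import re

# Matches the kernel's x86 segfault message; the " in module[base+size]" tail is optional.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<module>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

# Made-up sample in the kernel's format, for illustration only.
SAMPLE = (
    "Jul 16 23:01:02 host kernel: bitcoind[12345]: segfault at 8 "
    "ip 00007f1c2a3b4c5d sp 00007ffd12345678 error 4 "
    "in libc-2.23.so[7f1c2a300000+1c0000]"
)


def parse_segfaults(lines):
    """Yield one dict per kernel segfault line found in an iterable of log lines."""
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if not m:
            continue
        rec = m.groupdict()
        if rec["module"] is not None:
            # Offset of the faulting instruction inside the mapped module.
            rec["offset"] = hex(int(rec["ip"], 16) - int(rec["base"], 16))
        yield rec


for rec in parse_segfaults([SAMPLE]):
    print(rec["proc"], rec["pid"], rec["module"], rec["offset"])
```

The module name and offset are what a tool such as addr2line would need to map the fault to a source line, provided a build with debug symbols is available.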

ghost commented 7 years ago

I have the feeling we won't hear from the recently created @upminers account anymore.

upminers commented 7 years ago

@jgarzik We are using the beta from https://github.com/btc1/bitcoin/releases/tag/v1.14.3 on Supermicro Xeon D-1541 nodes with 64 GB RAM; there is no VM. In all cases the last line in the log was CreateNewBlock followed by some sizes. It looks like the logs were erased when we tried restarting.

Outages are very expensive for us, we can switch a node back in but I want to make sure that we collect the right information.

mpatc commented 7 years ago

the logs were erased when we tried restarting.

Have you tried saving the logs before restarting? This has already happened twice?

On Jul 16, 2017 11:14 PM, "upminers" notifications@github.com wrote:

We are using the beta from https://github.com/btc1/bitcoin/releases/tag/v1.14.3 our nodes are supermicro xeon d-1541 with 64 GB ram, there is no VM. In all case the last line in the log was CreateNewBlock and then some sizes. It looks like the logs were erased when we tried restarting.

upminers commented 7 years ago

We can make sure to save them but will that be enough? The logs looked normal to me except the last line was one with total size rather than 'validity', it looks like they come in pairs. What else should I do so that if it crashes again we learn something?

kanoi commented 7 years ago

Make sure you have in bitcoin.conf: shrinkdebugfile=0

... and of course don't delete them :p
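Along the same lines, a small sketch of saving debug.log to a timestamped copy before every restart, so that a restart can never cost you the evidence. The function name `archive_debug_log` and the archive directory are hypothetical. The data directory has to be passed in; on Linux the default is normally ~/.bitcoin.

```python
import shutil
import time
from pathlib import Path


def archive_debug_log(datadir, archive_dir):
    """Copy <datadir>/debug.log to a timestamped file and return the new path.

    Returns None if there is no debug.log to save. The original is left in place.
    """
    src = Path(datadir) / "debug.log"
    if not src.is_file():
        return None
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = dest_dir / f"debug-{stamp}.log"
    shutil.copy2(src, dest)  # copy2 also keeps the file's modification time
    return dest
```

Calling this from whatever script restarts the node, before bitcoind is launched again, would keep one copy per crash.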

upminers commented 7 years ago

Is this fixed in 1.14.4? Should we try running that? Any options other than the debug shrinking we should set? The logs were not very informative and looked just like our core nodes except for stopping between CreateNewBlock messages.

donaloconnor commented 7 years ago

It's difficult to tell if this is fixed or not without knowing what's causing the SEG fault.

Can you do what @jgarzik suggested and run it using gdb so we can see a backtrace? Segmentation faults are not something we want happening on mainnet so this should be resolved ASAP IMO.

JaredR26 commented 7 years ago

The latest version is a beta so I would definitely suggest running that. Though as already stated, no way to know if that will fix it. Add the debug lines to config and we can help next time it occurs.

upminers commented 7 years ago

I didn't see the gdb message before. We'll do that, with the latest version on a test system. Will it make it much slower?

I have to say that this is very inconvenient timing; we've been given no time to do a safe deployment.

JaredR26 commented 7 years ago

I have to say that this is very inconvenient timing; we've been given no time to do a safe deployment.

Now you're starting to sound like you're trolling... Not sure if that was your goal or not. There was no need or reason to run an alpha release in a production main net. The beta release was just compiled last night.

jgarzik commented 7 years ago

Also, rule out flaky hardware, run memtest86 + verify that the temperature in the room is below 80F / 26C

whitematrix62 commented 7 years ago

Given the newness of the OP's account and the lack of any log to work on, this issue should be closed. It can be reopened later if he can produce the log and we can debug the issues.

upminers commented 7 years ago

@jgarzik happened twice on two different systems, I do not think it is hardware but will look into that test. We have it running with gdb now.

@JaredR26 the signaling has started; in two days all our blocks will be rejected if we do not run btc1. If we had not run the earlier release we would not even have known of potential issues. Do you not see how this creates an impossible situation for us? Deploying software with one day's notice from a new vendor is not reasonable.

JaredR26 commented 7 years ago

the signaling has started in two days all our blocks will be rejected if we do not run btc1

This isn't correct, the only blocks that get rejected might be from UASF on August 1st. I'm not sure if this repo ever ended up merging any rejection rules pre-hardfork, but there are definitely no rejections that will take place until August 1st or thereabouts. I think you've been misinformed by the /r/bitcoin or blockstream PR brigades.

I don't think this issue should be closed yet, newness of accounts doesn't make the issues not legitimate (by itself).

ghost commented 7 years ago

I would recommend not feeding the troll until he produces the logs. Otherwise you are just wasting time.

aceat64 commented 7 years ago

I'm not sure if this repo ever ended up merging any rejection rules pre-hardfork, but there are definitely no rejections that will take place until August 1st or thereabouts.

This software will reject non-SegWit signalling blocks upon BIP91/bit4 activation: https://github.com/btc1/bitcoin/blob/segwit2x/src/validation.cpp#L1863

jrallison commented 7 years ago

This isn't correct, the only blocks that get rejected might be from UASF on August 1st.

BTC1 works by activating BIP91, which rejects/orphans non-segwit signaling blocks when it activates. It's essentially the same thing as the UASF approach, but its deadline is triggered on 80% hashrate rather than a flag day (Aug 1st).

the signaling has started in two days all our blocks will be rejected if we do not run btc1.

@upminers good news! Your blocks will only be rejected if they're not signaling bit 1 for segwit activation. You can run any software that does that. This includes Bitcoin Core, BTC1, etc.

That's the case at least for the time being. When the hard fork deadline approaches in a few months, you may have to run BTC1 if you'd like to follow the fork.
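To make "signaling bit 1" concrete, here is a minimal sketch of the BIP9-style check on a block header's version field. It assumes the standard version-bits layout (top three bits set to 001), with SegWit on bit 1 and BIP91 on bit 4 as discussed above. The function name is my own and this is not a copy of btc1's validation code.

```python
VERSIONBITS_TOP_MASK = 0xE0000000  # the three highest bits of the version
VERSIONBITS_TOP_BITS = 0x20000000  # must read 001 for version-bits signaling
SEGWIT_BIT = 1  # SegWit deployment bit
BIP91_BIT = 4   # BIP91 deployment bit


def signals_bit(version, bit):
    """Return True if a block header version signals `bit` under BIP9 rules."""
    if not 0 <= bit <= 28:
        raise ValueError("BIP9 deployment bits are 0..28")
    uses_versionbits = (version & VERSIONBITS_TOP_MASK) == VERSIONBITS_TOP_BITS
    return uses_versionbits and bool(version & (1 << bit))


print(signals_bit(0x20000002, SEGWIT_BIT))  # version with only bit 1 set
print(signals_bit(0x20000012, BIP91_BIT))   # version with bits 1 and 4 set
```

A miner can check the version of its own block templates this way regardless of which node software produced them.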

donaloconnor commented 7 years ago

@upminers just let us know if you get any more info, core dumps etc. Btw did the segfault not cause a core dump? If not you might have to use: ulimit -c unlimited in bash before launching the bitcoind.
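If the node is started from a script rather than an interactive shell, the same effect can be had from Python. This is a sketch with helper names of my own choosing. It raises the soft core-file limit to the hard limit before launching the child, so unlike `ulimit -c unlimited` it never asks for more than the hard limit allows.

```python
import resource
import subprocess


def enable_core_dumps():
    """Raise this process's soft core-file limit to its hard limit.

    Child processes inherit the limit. Returns the (soft, hard) pair now in effect.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def launch_with_core_dumps(argv):
    """Start argv (for example the bitcoind command line) with core dumps allowed."""
    enable_core_dumps()
    return subprocess.Popen(argv)
```

If the hard limit itself is 0, no core file will be written either way, and where the file ends up depends on the system's kernel.core_pattern setting.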

Also, it would be good not to judge based on account age. Not everyone is a dev on github already.

h0jeZvgoxFepBQ2C commented 7 years ago

This includes Bitcoin Core

This is wrong, bitcoin core will also accept chains based on non bit-1 blocks after august 1st, which is not the case with BIP91 or BIP148.

jrallison commented 7 years ago

This is wrong, bitcoin core will also accept chains based on non bit-1 blocks after august 1st, which is not the case with BIP91 or BIP148.

@lichtamberg It'll build compatible blocks, but yeah, I suppose it may build on top of invalid blocks? I suppose that's only the case until the BIP91 or BIP148 chain is longer. Which if BIP91 activates with 80% hashrate, will be the case fairly quickly. Or am I missing something else?

h0jeZvgoxFepBQ2C commented 7 years ago

Probably quickly, but it is still risky (and maybe costly) compared to taking one more minute to choose the proper software to run. Just wanted to point out that BIP148 would be a safer option in this case than running core.

JaredR26 commented 7 years ago

Unless bit 4 doesn't lock in by August 1st. If that happens, this user will be forked off onto a chain that is not viable without a POW change.

Also there's nothing to indicate that UASF software is safer. No one is using UASF software with real money on the line, and there's no point in trolling something that isn't a real threat. It would be a mistake for the OP to be the first to put real money on the line with UASF.

CosmicHemorroid commented 7 years ago

@JaredR26 " I think you've been misinformed by the /r/bitcoin or blockstream PR brigades."

Grow up.

kek-coin commented 7 years ago

@JaredR26 @CosmicHemorroid please take the political stuff outside of github.

jgarzik commented 7 years ago

Closing - no diag data - will reopen if more appears.