Closed jo2jo closed 10 years ago
any work around you know of?
@ummjackson, I know you guys are working on a lot of cool stuff - could you tell us where this falls in the list of your team's priorities? This is really affecting my pool to the point where I may have to shut it down. Does this have something to do with the testnet you guys are working on?
What is the specific bug that is affecting your pool? What are your dependencies? What is the data for the difference between orphans before 1.5 and after 1.5? The issue title is misleading: it is about a problem while running the Qt client, not about the dogecoind daemon running on Linux machines.
The graph is also misleading, as it shows 0 orphans before and many after, which is simply not true. The change to 1.5 probably just allowed this data to be gathered.
When I upgraded a 1.2GB wallet.dat from 1.4.0 to 1.5.0, I instantly observed HUGE delays in JSON RPC (over 30 seconds per query). During those 30 seconds, the wallet was constantly flushing and writing 250MB/sec of IO to disk (SSD based). This essentially made it completely useless to run as a pool wallet, as I was only getting 2-3 payouts per minute (thanks RPC), and since MPOS (frontend) uses RPC for dashboard data, it was causing 30 second page load times (and in most cases, page load failures).
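For anyone who wants to put numbers on the RPC delay described above, here is a minimal sketch that times a single JSON-RPC call. It is only an illustration: the URL, port 22555, and the credentials in the usage note are assumptions to be replaced with the values from your own dogecoin.conf, and `make_rpc_caller` / `time_call` are hypothetical helper names, not part of any Dogecoin tooling.

```python
import base64
import json
import time
import urllib.request


def make_rpc_caller(url, user, password):
    """Return a function that sends one JSON-RPC request over HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def call(method, *params):
        body = json.dumps(
            {"jsonrpc": "1.0", "id": "probe", "method": method, "params": list(params)}
        ).encode()
        request = urllib.request.Request(
            url,
            data=body,
            headers={
                "Authorization": f"Basic {token}",
                "Content-Type": "application/json",
            },
        )
        # Generous timeout, since the report above mentions 30+ second replies.
        with urllib.request.urlopen(request, timeout=120) as response:
            return json.loads(response.read())["result"]

    return call


def time_call(call, method, *params):
    """Run one RPC call and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call(method, *params)
    return result, time.perf_counter() - start
```

Usage would look something like `call = make_rpc_caller("http://127.0.0.1:22555/", "rpcuser", "rpcpassword")` followed by `height, seconds = time_call(call, "getblockcount")`. Running it against both a 1.4 and a 1.5 daemon with the same wallet would give comparable figures.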
Dave, why don't you create a process to schedule a wallet.dat "roll" at certain intervals during a maintenance cycle? The wallet seems way too big, and the client is not designed to handle such a large wallet.dat file. Let me know what you think of the approach.
Unless you have a specific procedure in mind, replacing a wallet (new address) is a pretty large undertaking on pools like mine. If I am thinking what you're thinking, it means I have to bounce all stratum servers and insert a new wallet, then move A LOT of coins around and risk paying out orphans: if I have confirming blocks (50 confirms) when I replace the wallet, I lose the ability to track those immature blocks, which COULD orphan after I replace the wallet, before they are confirmed. (I did this once already, actually 3 weeks ago.)
Do you have a wallet.dat growth curve projection? It has grown this large in just 3 weeks? There is a related bug where this may find a solution, have you seen it? The ticket is #187.
@billym2k - since the first day Dogecoin started till before the 1.5 upgrade, our pool had 4 orphan blocks in total. Now we have 11 - our total more than doubled after the 1.5 upgrade and it's only been 4 days. 7 of those orphan blocks occurred after the 1.5 upgrade.
@billym2k - right after the 1.5 upgrade, our pool experienced 3 orphan blocks in a row.
This is a very real issue, we experienced about 5 orphans in 2 days, and have now reverted back to 1.4.1, and no orphans since.
Last time there was a major release, over half the mining pools were mining on the wrong fork. There definitely seems to be a lack of communication between dogecoin devs and the mining pool community.
Also, when we upgraded to 1.5, we had to resync the entire blockchain. Painful upgrade. Now a painful downgrade back to 1.4.1.
@educatedwarrior always back up your working dir, easy to swap back then :) but yeah it should not be an issue in the first place imo!
@add1ct3dd, thanks for the tip... that's a good idea. I got a db error when I tried to switch back though.
@casalej Our pool is using the static change=dogecoinaddy addresses and this issue still happens.
@add1ct3dd Did you get a database error when you switched back? I got a db error and had to delete the blockchain db and resync. If I hadn't gotten the error, I would have been able to switch back without resyncing the db.
Nope, we backed up the 1.4.1 working directory, then updated dogecoind, and then started the new version.
When reverting back we just moved the 1.4.1 db back, and only had to update the blocks from the time we stopped updating that directory.
I'm not seeing a valid reason behind the orphaned blocks, just a cause-and-effect hypothesis.
@billym2k , you serious? Well I don't see any of the major pools wanting to upgrade at the moment, until you can prove it is not true. Based on popular consensus right now, 1.5 is buggy.
Could it be sync issues @billym2k ? :D
I'm saying that clearly there is an issue, but I'm just getting the "there is an issue" part, without much idea of the cause. Possibly sync issues could be the case. It's difficult for me to test, I'm not running a pool.
OP (@jo2jo) mentioned that this was fixed in the latest Litecoin release - can someone please reference said commit to the Litecoin code base?
We have run 1.5 since it came out without any issue on our pool. Can you post your dogecoin conf (with rpc user/password info removed of course)?
Are you 100% sure it's 1.5? I'm yet to see someone not have the issue, been checking around on IRC.
Yes, I am 100% certain we are on 1.5. We have been on it for a while now without any uptick in orphans or other issues. Maybe it's something different in our conf files?
The number of transactions is also key here. A <1Gh pool might have no issues for a while but will eventually. My pool is 7Gh, and I INSTANTLY had an issue and had to revert.
This is happening even with a 300MH pool. Are you sure you are on 1.5, netcodepool? This issue is happening to 4 other dogepool owners I know, and even normal people seem to be desyncing, just not as often as the pools. ZC, is this same issue happening as well by chance? Or do you have no clue as you have already reverted? https://github.com/dogecoin/dogecoin/issues/217
We also had 5 or so orphans on 250Mh/s, and our overall block luck is 80%..
Laggy transactions are probably down to the daemon de-syncing, so it sounds like the same issue imo.
@netcodepool post your config?
Went back to 1.4.1 and not one orphan since. I know that is not helpful.
We had this issue on RapidHash as well, got a total of 12 new orphans. This is happening because of wallet flushes with large wallets. The wallet locks up and stops syncing the chain, and blocks found while the chain is lagging end up orphaned.
We separated the payouts and mining wallets. 1.4.1 for payouts and 1.5 for mining.
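The split described above can be sketched as a small routing rule in whatever code talks to the daemons: block-related calls go to a lean daemon, everything else goes to the transaction-heavy payout wallet. This is only an illustration, not how any particular pool software does it. The two URLs are hypothetical, and the choice of which RPC methods count as "mining" is an assumption.

```python
# RPC methods sent to the lean mining daemon; everything else goes to payouts.
# Which methods belong here is an assumption for this sketch.
MINING_METHODS = {"getblocktemplate", "submitblock", "getwork"}

# Hypothetical endpoints: one daemon per wallet, each on its own RPC port.
MINING_DAEMON_URL = "http://127.0.0.1:22555/"
PAYOUT_DAEMON_URL = "http://127.0.0.1:22565/"


def pick_daemon(method, mining_url=MINING_DAEMON_URL, payout_url=PAYOUT_DAEMON_URL):
    """Return the URL of the daemon that should handle this RPC method."""
    return mining_url if method in MINING_METHODS else payout_url
```

With this in place, `pick_daemon("submitblock")` returns the mining daemon while `pick_daemon("sendmany")` returns the payout daemon, so a wallet flush on the payout side does not delay block submission.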
@rog1121 Thanks for the explanation, this makes sense to me. Any ideas on how we could address this? (Assuming it's due to Dogecoin's # of transactions, because we're 99.99% based on the Litecoin codebase if you do a diff)
@ummjackson see this issue https://github.com/dogecoin/dogecoin/issues/217
It's due to the tons of transactions that mining pools like mine send. On 1.5, TX IDs wouldn't even get broadcast for 4-5 hours due to the wallet flush issue. Seems the best solution is to get rid of BDB and move to LevelDB as fast as we can.
EDIT: I might be mistaken, not sure if 1.5 is still using BDB. Either way, it's large amounts of unbroadcast transactions that are causing this issue.
@rog1121 Bitcoin/Litecoin never moved to LevelDB for wallet.dat, they moved for the blockchain database. Both still use BDB for wallet.dat, and we're running on the latest Litecoin code base - so I guess it's down to the sheer number of transactions running through Dogecoin? (yay we're popular, but not yay?)
This is going to require code changes separate to Litecoin to address our ridiculous number of transactions... any ideas what we need to change?
I'd rather not go too far off from the Litecoin base; if Litecoin doesn't do that then we shouldn't either. Maybe look into why transactions are getting backlogged so much? That's the root of the issue here.
@rog1121 So we've looked into this, and apparently some Bitcoin pools had a similar issue due to having reindexed their blockchain downloads from a 1.4 DB (i.e. Bitcoin 0.6.*) rather than doing a full resync on 1.5 (Bitcoin 0.8.*). Are you able to test this somehow and do a complete blockchain resync from 1.5? Or are you running on a fresh blockchain download from 1.5?
I ran the payouts on a fresh wallet with a new chain. I did however use the bootstrap file from dogechain.info, so I'm not sure if that might be an issue?
@add1ct3dd @toxicwind @zccopwrx @netcodepool @educatedwarrior
Did you just start with a fresh blockchain on 1.5, or did you reindex/use a bootstrap file?
We reindexed.
I really hate to do this to pool owners, but are you able to remove all pre-1.5 .dat files and try a fresh sync on 1.5? I have confirmed with @netcodepool that they did a clean resync, and that's the only difference I can see.
We were on a completely fresh db and wallet when 1.5 happened actually (I completely moved servers and decided to just start the wallet over)... http://bitinfocharts.com/dogecoin/address/DTzjPeJEirk3i5eQt3jtTBUk4dSjosMvTq http://bitinfocharts.com/dogecoin/address/D9X6eX1L5KFWA2PJ4jAm68JoLNdmMSJaXw http://bitinfocharts.com/dogecoin/address/DEFVgt9FPxRPhzgHGzzrGZPZfnY9bDfybQ http://bitinfocharts.com/dogecoin/address/DDqgC76vsrG3yC14Vez94UY2hBxSLsQpyj http://bitinfocharts.com/dogecoin/address/DLJNapEx6hhKtKRmNnCWF1zRU7sgBcvTPs All transactions have been done only on a 1.5 wallet and the chain was freshly downloaded. That isn't the issue.
@ummjackson I'll try moving payouts to a fresh synced 1.5 wallet today. With and without a single change address that you guys recently added the functionality for. I'll post back with results later tonight.
From my pool operator (ypool.net)
"I am still using the 1.4 Windows-Qt client. We actually have two wallets running, one only for submitting blocks and one for everything else (payouts, block confirmation etc.) This way the slow down of the transaction-filled wallet will not affect the submission wallet and blocks are always submitted as fast as possible." - jh00
@ummjackson - I did a fresh resync when we upgraded to 1.5 and still had issues.
Guys, I'm at an absolute loss here - we're doing nothing differently to Litecoin. Also, looking at http://bitinfocharts.com/comparison/orphaned-btc-ltc-ppc-doge-nmc.html (zoom in to 3 months) there has not been an increase in orphan blocks since 1.5 was released. Version 1.5 was released on January 27th - if anything it's been lower/more stable since 1.5 came out.
Any ideas or thoughts are appreciated, but we can't identify the root cause or reproduce the issue ourselves (Netcodepool are still not having issues).
OP mentioned they suspect this has to do with building using Boost 1.55, are you all building Dogecoin 1.5 with Boost 1.55? Can you try building with Boost 1.54 or earlier?
@ummjackson Just upgraded payouts to 1.5.1 built with Boost 1.55
It still seems to lag on payouts, here is a debug log https://doge.rapidhash.net/debug.log
@rog1121 - has the upgrade to a specified change address helped at all? Not exactly sure what I'm fishing for in this log file... this initial issue report was about something different I figure?
@ummjackson I reuploaded the debug log, the first one was incomplete. The last 1,000 lines or so are a round of transactions to users while the wallet is sort of locked up.
@toxicwind had a different theory. A pool has very large inputs, like 900K doge. When it sends one small transaction it uses that 900K input, so almost all of the pool balance goes into unconfirmed change, and all of the other transactions lag because the first transaction needs 3 confirms.
@rog1121 I can see the orphan reporting here, is this where it stalls until you restart the daemon? As far as orphan reports go, the level of logging to debug.log changed between 1.4 and 1.5, so I actually believe that's why people are saying they're seeing more orphans. The # of orphans is due entirely to our block time, and there's not much we can do about that short of bumping it to 5+ minutes and hard forking the network (which is kind of against the whole point of Dogecoin, so won't be happening).
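To make the theory above concrete, here is a toy model of it. It is a deliberate simplification and not the real wallet's coin selection: it assumes, as the theory does, that change cannot be spent until it has 3 confirmations, and `spendable` / `send` are hypothetical helpers written for this sketch.

```python
def spendable(utxos, min_conf):
    """Total value of outputs with at least `min_conf` confirmations."""
    return sum(value for value, conf in utxos if conf >= min_conf)


def send(utxos, amount, min_conf):
    """Spend the first confirmed output that covers `amount`.

    Returns the new output list; any change comes back with 0 confirmations.
    """
    for index, (value, conf) in enumerate(utxos):
        if conf >= min_conf and value >= amount:
            remaining = utxos[:index] + utxos[index + 1:]
            change = value - amount
            if change > 0:
                remaining.append((change, 0))
            return remaining
    raise ValueError("no confirmed output large enough")


# One big 900K input: a 100 doge payout ties up the whole balance.
single = send([(900_000, 50)], 100, min_conf=3)

# Same balance split into three inputs: only one of them is tied up.
split = send([(300_000, 50)] * 3, 100, min_conf=3)
```

In this model `spendable(single, 3)` is 0 after the first payout, while `spendable(split, 3)` is still 600,000. If the theory holds, keeping the pool balance in several smaller outputs would let later payouts proceed without waiting on the first one's change.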
@toxicwind 's theory is interesting, and as I said prior I'm completely stumped by this issue - so until we can identify a root cause I'm not sure what to do. Any ideas?
Guys, there's a bunch of different issues being listed here. I'm going to close this - please start a new issue with an exact explanation of the behavior you're seeing. For example, are your large wallets flushing, then locking up causing transactions to lag? Or is your issue the number of orphans being found? Please clarify so we can pinpoint this issue.
Here is the bug. The result is transactions taking very long to verify (even with a very well connected client) and a major increase in orphaned blocks since version 1.5 of the dogecoin client. Below is mostly a copy and paste, but it seemed critical enough that I wanted to be sure it was brought to the attention of the dogecoin developers asap:
(paste) There's a bug with the previous version of Litecoin where the client will randomly stop syncing. This results in your node basically being down until you restart the client. The way to know this is occurring is to check the resource monitor (go to task manager->performance tab). If Dogecoin isn't sending or receiving any data, it's likely down. When you restart the client, you will be several hours behind (from whenever it stopped syncing). This was fixed in the newest version of Litecoin, but showed up in the 2nd most recent version (Dogecoin is based on Litecoin, and updating to the more recent codebase of Litecoin likely picked up the issue). This bug weakens the Dogecoin network a bit and can be very annoying. edit: If you aren't seeing transactions show up after you know they've been sent to you (e.g. a mining pool withdrawal), then you are likely affected and need to simply restart the dogecoin-QT client to double check.
This is due to the "boost" library used when building. We were previously on Boost 1.54. I've recompiled the binaries and you can download the latest (fixed) (/end paste)
the issue is described in detail along with some user created patches / fixes:
http://www.reddit.com/r/dogecoin/comments/1wfj2t/dogecoin_15_suffers_from_an_old_litecoin_bug/
as you can see in these graphs the increase in orphaned blocks since version 1.5 release: http://bitinfocharts.com/comparison/orphaned-btc-ltc-ppc-doge-nmc.html
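The stalled-sync symptom in the pasted report (the client silently stops syncing until it is restarted) could also be detected automatically instead of watching the resource monitor. A minimal sketch, assuming you record the daemon's block height at intervals, for example by polling `getblockcount`. The 10 minute threshold is an arbitrary assumption, and `is_stalled` is a hypothetical helper, not part of the client.

```python
def is_stalled(samples, max_quiet_seconds=600):
    """Return True if the block height has not changed for too long.

    `samples` is a list of (unix_time, block_height) pairs in time order.
    """
    if len(samples) < 2:
        return False  # not enough history to judge
    latest_time, latest_height = samples[-1]
    # Walk backwards to find when the current height was first reported.
    first_seen = latest_time
    for sample_time, height in reversed(samples):
        if height != latest_height:
            break
        first_seen = sample_time
    return latest_time - first_seen >= max_quiet_seconds
```

Since Dogecoin targets roughly one block per minute, a height that stays flat for ten minutes or more is at least suspicious. A watchdog could log a warning or restart the daemon when `is_stalled` returns True.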