Closed by rustyrussell 7 years ago
Has anyone tried the Qt version to see if it has the same problem?
And BTW, are the release versions (of the daemon & Qt) compiled in debug mode?
Last question: can a unit test be cobbled up to test for this memory issue?
Ron
@older-c-coder I have not tested with Qt. A unit test would likely be difficult to write, since the issue only becomes apparent in RSS usage after calling GBT frequently for around 24 hours. It's possible that this process could be accelerated (e.g. create a fake mempool as of block X, run GBT/CNB, increment X, and repeat). Even then, such a test is likely to take a while, and we don't yet know whether that approach will succeed at triggering the leak.
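A loop along those lines could be sketched in shell. This is purely hypothetical: the bitcoin-cli invocation assumes a running node and is left commented out, so the sketch only exercises the loop structure.

```shell
# Hypothetical accelerated-test loop: hammer getblocktemplate repeatedly
# and watch the process's RSS between batches. The bitcoin-cli call is an
# assumption and is commented out so this sketch runs standalone.
stress_gbt() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        # bitcoin-cli getblocktemplate > /dev/null
        i=$((i + 1))
    done
    echo "ran $n iterations"
}
stress_gbt 5
```

With a live node one would uncomment the RPC call and compare RSS (e.g. from /proc/PID/status) before and after each batch.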
There has been IRC discussion of this in bitcoin-core-dev on the 20th and 21st.
I do not think there is an actual "leak". The increased memory usage is mostly a result of the size of CCoinsViewCache not actually being limited (discussed in #6872), combined with poor memory allocator behavior.
The problem is triggered by calling getblocktemplate, which calls CreateNewBlock. In CreateNewBlock, every single tx in your memory pool is pulled into the in-memory cache of both pcoinsTip and a cache view on top of that. The memory allocator also causes this memory to be allocated in a different location for every thread that calls getblocktemplate. Although this memory is freed at the end of CreateNewBlock, there seems to be enough fragmentation that some memory is still in use towards the end of the heap, so the bulk of the memory cannot be returned to the OS. This can cause up to (number of RPC threads, default 4) times the memory needed for one full memory pool's worth of chainstate to remain resident. In addition, depending on the size of your dbcache, if one of the calls to getblocktemplate causes a rehash of the unordered map in your pcoinsTip cache, that rehash can also take up space in another memory arena, because it has now been allocated from a different thread.
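As a back-of-envelope sketch of that worst case (the figures below are assumptions for illustration, not measurements from this report):

```shell
# Worst case described above: each RPC thread can strand one mempool's
# worth of chainstate cache in its own malloc arena.
RPC_THREADS=4            # bitcoind default for -rpcthreads
CACHE_PER_CALL_MB=300    # assumed cache built by one CreateNewBlock call
WORST_CASE_MB=$((RPC_THREADS * CACHE_PER_CALL_MB))
echo "up to ${WORST_CASE_MB} MB can remain resident"
```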
I'm working on a short-term, smaller rewrite of CreateNewBlock which will not require pulling the entire memory pool's worth of txs into the cache; that should make the problem much better.
http://toom.im/files/massif.out.27357.log

Or the raw output, if you want to run ms_print on it with your own options: http://toom.im/files/massif.out.27357

(For best viewing, disable word wrap, e.g. with less -S massif.out.27357.log.)
The highest memory usage appears to be at timestamp 47. Search for "47 68" to jump there. It looks like at that time, a total of 46.64% of the heap (411 MB) was in use by CCoinsViewCache via HTTPReq_JSONRPC and CreateNewBlock. The two offending lines in miner.cpp appear to be line 175, if (!view.HaveCoins(txin.prevout.hash)), and line 348, if (!TestBlockValidity(state, *pblock, pindexPrev, false, false)), the latter of which results in a HaveCoins call later.
TL;DR from IRC: this seems specific to the glibc multi-threaded arena memory allocator. The application deallocates the memory properly, but pages are not given back to the system after deallocation (probably due to fragmentation from allocating lots of small objects). The workaround we determined on IRC is export MALLOC_ARENA_MAX=1; see https://gist.github.com/laanwj/efe29c7661ce9b6620a7#linux-specific
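For reference, a minimal sketch of applying that workaround when starting the daemon. The bitcoind launch line is an assumption and left commented out; only the environment variable itself comes from the gist.

```shell
# Limit glibc to a single malloc arena so all RPC threads share one heap
# region instead of fragmenting several per-thread arenas.
export MALLOC_ARENA_MAX=1
# exec bitcoind -daemon    # then start the daemon in this environment
echo "MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX"
```

The variable must be set in the daemon's environment before it starts, since glibc reads it once at process startup.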
Can confirm the workaround works (export MALLOC_ARENA_MAX=1).
Seems to be working for me too. 724M RSS after 10 hours. Ten hours isn't long enough to be certain. I'll chirp in tomorrow if it's still a problem.
Edit: it's a day later, and RSS is at 1020 MB. Normally, I would expect memory usage near 1500 MB by now. Seems that MALLOC_ARENA_MAX is working.
2.9G RSS after restarting with export MALLOC_ARENA_MAX=1 this morning (about 14 hours ago). Still undesirably high imo.
@flound1129 what platform are you on, and what is your -maxmempool setting and -dbcache setting?
Linux p1 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty
haven't changed maxmempool or dbcache setting from defaults.
Wait, are you running master with the mempool limiting stuff in, or a released version?
This is on 0.11.1
0.11.1 doesn't have any memory limits at all for the memory pool.
So you're saying this issue is fixed in master? Or you would like me to test with master? My mempool is only around 162 megs.
No, sipa is saying there are two possible reasons why your memory consumption could be high. One is that you are running into the multi-arena allocation/fragmentation bug described here in this issue, #6876. The other is that your mempool is getting bloated from the flood of 14780-byte transactions we're seeing right now.
Try sticking minrelaytxfee=0.0002 into your ~/.bitcoin/bitcoin.conf file, or running with the -minrelaytxfee=0.0001 command line option. That should reduce your mempool size by about 161 megs.
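For reference, the config-file form of that suggestion (the value is taken verbatim from the comment above):

```
# ~/.bitcoin/bitcoin.conf
minrelaytxfee=0.0002
```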
I'm not worried about having a large mempool, or really even worried about 3GB of total usage. I'm only concerned if there is an actual leak.
I don't think we've seen any evidence of an actual leak, just bloated mempools and poor memory allocation algorithms.
I was seeing this as well on my pool's node; it would appear setting limitfreerelay=0 resolves it. This bug appears to cause severe GBT latency that gets worse the longer your node is up. Using my pool block update benchmarking tool, my pool went from near the bottom of the list to the top simply by setting limitfreerelay=0. My guess is there is a bug or leak related to mempool handling of free transactions.
Large mempools result in slow GBT calls. This is not a memory leak as far as we know, just not sufficiently optimized code, and no actual limit on the mempool's size.
There is a lot of active work around improving that. One is the mempool limiting code (which has been merged into master already, but not yet in a release). There are also pull requests in flight to rewrite or otherwise optimize its performance using precomputed indexes.
I noticed there was a massive discrepancy between "currentblocksize" : 473630 from getmininginfo and "bytes" : 6934459 from getmempoolinfo, even though the transaction count was roughly the same between those two RPC calls. Setting limitfreerelay=0 resolved that, and they now match. BTW, I also run minrelaytxfee=0.0001.
I believe jameshillard's issue to be different from the memory fragmentation issue reported in this thread. As far as I understand it from IRC, he was getting a lot of the 14790 txns into his mempool via limitfreerelay=default, like these two (from his getrawmempool RPC):
https://www.blocktrail.com/BTC/tx/951761082b38167ca92a94cce9f0783157434ffc66ff39715b1ac30a72cd6611
https://www.blocktrail.com/BTC/tx/5935e25d931734e42e0fff881df79ea9e0d3f4c9d552e9ac20f8825b29049f0a
These two transactions were never getting evicted from his mempool, and were not getting incorporated into blocks by other miners. These were 2 of the 4 transactions that I checked, indicating that very roughly 50% of his mempool was filled with spam. Many more of them were in his mempool than could fit in his 5% priority block space, so his GBT block sizes were kept small even though his mempool was bloated. He was complaining of mempool size, not of RSS usage. I do not think he was encountering any bugs, just features that he did not understand and that may not have been designed optimally in terms of UX.
"Using my pool block update bench-marking tool my pool went from near the bottom of the list to the top by simply setting limitfreerelay=0." -- Actually, it's probably just that you restarted your server, which clears out the mempool. Setting limitfreerelay=0 would change the long-term slope of your GBT latency graph and mempool size to be closer or equal to zero, but the y intercept will be low in any case.
Actually, it's probably just that you restarted your server, which clears out the mempool.
I've let it run for long enough that I don't think that's the issue; before setting that flag, GBT performance would degrade in a matter of hours. After changing limitfreerelay to 0, "currentblocksize" from getmininginfo and "bytes" from getmempoolinfo are now the same. The transaction counts were always roughly the same between getmininginfo and getmempoolinfo before and after this change, so I guess it's possible that there were some extremely large free transactions taking up all the space.
Setting limitfreerelay=0 would change the long-term slope of your GBT latency graph and mempool size to be closer or equal to zero, but the y intercept will be low in any case.
The issue wasn't with the y intercept; it was that the graph was sloping up over time.
This is likely fixed since 0.12
See http://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-October/011511.html and thread.