Closed · thoragio closed this issue 7 years ago
Further to @berrycoiled's observation here, I can confirm that the web server was not running when the crashes above occurred.
Still crashing even after a reboot. Seems to be a memory leak given how RAM is getting used up.
Yup, the server crashes after some time even in a Docker environment.
Have just observed this error on a Linux node with the latest code as of today 1f74a01b037c18f24ba4fc42e9e9d323ef1ccdc7. Happened after about an hour of runtime.
I see there was a `Number exceeds 2^53-1` error immediately before. Unclear whether it was coincidental:
[warning] WARNING: Reorganizing chain.
[error] Number exceeds 2^53-1
{ AssertionError [ERR_ASSERTION]: Number exceeds 2^53-1
at Object._readU64 (/home/berry/sft/land-bronze-new/lib/utils/encoding.js:71:3)
at Object.readU64 (/home/berry/sft/land-bronze-new/lib/utils/encoding.js:85:19)
at Object.readVarint (/home/berry/sft/land-bronze-new/lib/utils/encoding.js:470:24)
at BufferReader.readVarint (/home/berry/sft/land-bronze-new/lib/utils/reader.js:489:25)
at UndoCoin.fromReader (/home/berry/sft/land-bronze-new/lib/coins/undocoins.js:284:17)
at Function.fromReader (/home/berry/sft/land-bronze-new/lib/coins/undocoins.js:321:25)
at UndoCoins.fromRaw (/home/berry/sft/land-bronze-new/lib/coins/undocoins.js:100:30)
at Function.fromRaw (/home/berry/sft/land-bronze-new/lib/coins/undocoins.js:112:26)
at ChainDB.getUndoCoins (/home/berry/sft/land-bronze-new/lib/blockchain/chaindb.js:973:20)
at getUndoCoins.next (<anonymous>)
generatedMessage: false,
name: 'AssertionError [ERR_ASSERTION]',
code: 'ERR_ASSERTION',
actual: false,
expected: true,
operator: '==' }
[info] Received 109 addrs (hosts=110, peers=7) (128.199.165.151:2301).
[info] Peer initialized compact blocks (128.199.165.151:2301).
<--- Last few GCs --->
[1811:0x55fa570c26c0] 4572679 ms: Mark-sweep 1414.3 (1561.6) -> 1414.2 (1531.1) MB, 1681.5 / 0.3 ms (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1682 ms) last resort
[1811:0x55fa570c26c0] 4575162 ms: Mark-sweep 1414.2 (1531.1) -> 1414.0 (1531.1) MB, 2481.9 / 0.0 ms last resort
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x10774939ba79 <JS Object>
0: builtin exit frame: stringify(this=0x1077493acee9 <a JSON with map 0x81967f8bec1>,0x107749382241 <undefined>,0x262e27e76a19 <JS Function (SharedFunctionInfo 0x21a777492561)>,0x262e27e76f59 <an Object with map 0x52cfa8dda61>)
1: _transform [/home/berry/sft/land-bronze-new/node_modules/ndjson/index.js:~28] [pc=0x34796a79e84d](this=0x3288155bcde9 <a DestroyableTransform with map 0x10d2...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [decentraland-node]
2: 0x55fa560156de [decentraland-node]
3: v8::Utils::ReportOOMFailure(char const*, bool) [decentraland-node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [decentraland-node]
5: v8::internal::Factory::NewTransitionArray(int) [decentraland-node]
6: v8::internal::TransitionArray::Insert(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Map>, v8::internal::SimpleTransitionFlag) [decentraland-node]
7: v8::internal::Map::CopyReplaceDescriptors(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::DescriptorArray>, v8::internal::Handle<v8::internal::LayoutDescriptor>, v8::internal::TransitionFlag, v8::internal::MaybeHandle<v8::internal::Name>, char const*, v8::internal::SimpleTransitionFlag) [decentraland-node]
8: v8::internal::Map::CopyAddDescriptor(v8::internal::Handle<v8::internal::Map>, v8::internal::Descriptor*, v8::internal::TransitionFlag) [decentraland-node]
9: v8::internal::Map::CopyWithField(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::FieldType>, v8::internal::PropertyAttributes, v8::internal::PropertyConstness, v8::internal::Representation, v8::internal::TransitionFlag) [decentraland-node]
10: v8::internal::Map::TransitionToDataProperty(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::PropertyConstness, v8::internal::Object::StoreFromKeyed) [decentraland-node]
11: v8::internal::LookupIterator::PrepareTransitionToDataProperty(v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::Object::StoreFromKeyed) [decentraland-node]
12: v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::Object::ShouldThrow, v8::internal::Object::StoreFromKeyed) [decentraland-node]
13: v8::internal::JSObject::AddProperty(v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes) [decentraland-node]
14: v8::internal::JsonStringifier::ApplyReplacerFunction(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [decentraland-node]
15: v8::internal::JsonStringifier::Result v8::internal::JsonStringifier::Serialize_<false>(v8::internal::Handle<v8::internal::Object>, bool, v8::internal::Handle<v8::internal::Object>) [decentraland-node]
16: v8::internal::JsonStringifier::Stringify(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [decentraland-node]
17: v8::internal::Builtin_JsonStringify(int, v8::internal::Object**, v8::internal::Isolate*) [decentraland-node]
18: 0x347968f2b17d
./bin/start: line 31: 1811 Aborted (core dumped) ./bin/decentraland-node --fast --loglevel=info --port=2301 --httpport=8301 --contentport=9301 --prefix="data" --n=testnet --apikey=$RPC_API_KEY --startminer=$START_MINER
berry@diamond:~/sft/land-bronze-new$
I grepped my debug.log and there is not a single occurrence of the `Number exceeds 2^53-1` error in it.
My system also shows a steady increase in memory usage until it hits the physical memory limit, then a sudden drop, presumably when it crashes. I think this supports the memory-leak theory.
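Pending a root cause, one stopgap (not a fix, and the leak will still win eventually) is raising V8's old-space ceiling, whose default is roughly the ~1.4 GB visible in the GC trace above. `./bin/start` is this repo's start script; the 4096 MB value is an arbitrary example:

```shell
# Stopgap, not a fix: raise V8's old-space heap limit so the process
# survives longer before the leak exhausts it. NODE_OPTIONS permits
# --max-old-space-size, so the start script needs no edits. On a 2 GB
# droplet a smaller value plus swap is more realistic than 4096.
NODE_OPTIONS="--max-old-space-size=4096" ./bin/start
```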
I am seeing the same error on Windows. However, when run from the Git Bash environment it works, but crashes for another reason.
Interesting observation, @vedi0boy. I can confirm that running from Git Bash on Windows works. What is the error you see there?
I get this error:
C:\Program Files\nodejs\node.exe: src\async-wrap.cc:619: Assertion `(env()->current_async_id()) == (0)' failed.
and when I access the web control panel, I get these errors that pop up in the git terminal:
Executing RPC call getblockchaininfo with params []
Executing RPC call getminerinfo with params []
Executing RPC call getnewaddress with params []
Getting mined tiles
[ERROR] Error trying to execute the rpc call getminerinfo. connect ECONNREFUSED 127.0.0.1:8301
[ERROR] Error trying to execute the rpc call getnewaddress. connect ECONNREFUSED 127.0.0.1:8301
[ERROR] Error trying to execute the rpc call getblockchaininfo. connect ECONNREFUSED 127.0.0.1:8301
[ERROR] Error trying to get the tiles. connect ECONNREFUSED 127.0.0.1:8301
It seems like no server is starting on port 8301 to serve the stats and tile data. Other than that, I can access the web control panel and I think everything else is working; I just can't see my stats. But that's not the issue for this thread.
Have also seen the first one, mentioned in #45.
My Windows node running on Git Bash also shows a steady increase in memory usage like the others.
@thoragio Do you know the last commit where this wasn't happening? (The testnet v2 commit?)
I believe thoragio had the testnet v2 commit running well for a while. Assuming that's true, this compare shows the changes from that commit to the last one I tested that shows the memory leak.
Something in these changes has caused a memory leak.
There are some changes around the miner. Has anyone tested for this without the miner running? @thoragio @CJHackerz @vedi0boy
Running with the miner off still shows the memory leak.
There's a new `--loglevel=info` option on the node, so I tried running without it. Same memory usage pattern.
Notably, the node reports nothing but peer refills for a long period while memory usage rises; it's seemingly not processing blocks at all while this happens.
..
[info] Removed loader peer (104.131.97.112:2301).
[debug] Connecting to 217.63.132.166:2301.
[info] Setting loader peer (217.63.132.166:2301).
[info] Removed loader peer (217.63.132.166:2301).
[debug] Connecting to 174.138.56.173:2301.
[info] Setting loader peer (174.138.56.173:2301).
[debug] Refilling peers (7/8).
[debug] Connecting to 84.61.77.78:2301.
[debug] Refilling peers (5/8).
[debug] Connecting to 108.172.14.147:2301.
[debug] Connecting to 73.61.20.159:2301.
[debug] Connecting to 212.178.48.7:2301.
[debug] Refilling peers (5/8).
[debug] Connecting to 104.236.66.136:2301.
[debug] Connecting to 72.193.116.4:2301.
[debug] Connecting to 45.55.180.162:2301.
[debug] Refilling peers (7/8).
[debug] Connecting to 67.11.251.66:2301.
[info] Removed loader peer (174.138.56.173:2301).
[debug] Connecting to 186.138.98.37:2301.
[info] Setting loader peer (186.138.98.37:2301).
[debug] Refilling peers (6/8).
[debug] Connecting to 186.136.130.242:2301.
[debug] Connecting to 185.35.36.100:2301.
[debug] Refilling peers (7/8).
[debug] Connecting to 216.131.19.58:2301.
[debug] Refilling peers (5/8).
[debug] Connecting to 212.178.50.0:2301.
[debug] Connecting to 78.69.188.118:2301.
[debug] Connecting to 174.138.70.142:2301.
[info] Removed loader peer (186.138.98.37:2301).
[debug] Connecting to 104.236.82.120:2301.
[info] Setting loader peer (104.236.82.120:2301).
[debug] Refilling peers (7/8).
[debug] Connecting to 138.197.88.133:2301.
[info] Removed loader peer (104.236.82.120:2301).
[debug] Connecting to 67.79.169.251:2301.
[info] Setting loader peer (67.79.169.251:2301).
[debug] Refilling peers (6/8).
[debug] Connecting to 73.61.20.159:2301.
[debug] Connecting to 217.63.132.176:2301.
[debug] Refilling peers (6/8).
[debug] Connecting to 46.101.155.74:2301.
[debug] Connecting to 91.61.62.108:2301.
[debug] Refilling peers (7/8).
..
@thoragio Does yours look like this or do you see something different?
Running with the web interface off, the miner off, and without `--loglevel=info` still shows the memory leak.
@berrycoiled I can't remember the last commit where this was not happening. I did pull in the latest changes shortly after the v2 testnet started and I don't remember having this problem. Then I pulled in the changes from 12 July, and that's when I noticed the problem.
I do see log entries similar to yours above, but my node doesn't sit stuck on them for very long; maybe a second or two, and then more entries come up. As they happen, my memory usage goes up by 1 or 2 MB after a couple of the `Refilling peers` events have floated by.
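To put numbers on that growth, a small diagnostic like this (a hypothetical helper, not part of the node) could be dropped into the process to log heap figures alongside the peer events:

```javascript
// Hypothetical diagnostic, not part of the node: periodically log memory
// figures so the leak's growth rate can be measured instead of eyeballed.
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);

function logMemory() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  console.log(
    `[mem] rss=${toMB(rss)}MB heap=${toMB(heapUsed)}/${toMB(heapTotal)}MB ` +
    `external=${toMB(external)}MB`);
}

// Log once a minute; unref() keeps the timer from holding the process open.
setInterval(logMemory, 60 * 1000).unref();
```

Watching whether `heapUsed` or `external` is the number that climbs would also hint at whether the leak lives in JS objects or in native buffers.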
Note the OOM errors below that I have grepped out of the debug.log.
[30946:0715/201046.185209:FATAL:memory_linux.cc(35)] Out of memory.
[30946:0715/201046.185209:FATAL:memory_linux.cc(35)] Out of memory.
[30946:0715/201046.185209:FATAL:memory_linux.cc(35)] Out of memory.
[30330:0715/200929.946828:FATAL:memory_linux.cc(35)] Out of memory.
[30330:0715/200929.946828:FATAL:memory_linux.cc(35)] Out of memory.
[30330:0715/200929.946828:FATAL:memory_linux.cc(35)] Out of memory.
/proc/self/exe: error while loading shared libraries: cannot allocate symbol search list: Cannot allocate memory
/proc/self/exe: error while loading shared libraries: cannot allocate symbol search list: Cannot allocate memory
/proc/self/exe: error while loading shared libraries: cannot allocate symbol search list: Cannot allocate memory
[31005:0715/200952.338976:FATAL:memory_linux.cc(35)] Out of memory.
[31005:0715/201000.771296:FATAL:memory_linux.cc(35)] Out of memory.
[31005:0715/200952.338976:FATAL:memory_linux.cc(35)] Out of memory.
[31005:0715/201000.771296:FATAL:memory_linux.cc(35)] Out of memory.
[31005:0715/200952.338976:FATAL:memory_linux.cc(35)] Out of memory.
[31005:0715/201000.771296:FATAL:memory_linux.cc(35)] Out of memory.
../../third_party/tcmalloc/chromium/src/page_heap_allocator.h:74] FATAL ERROR: Out of memory trying to allocate internal tcmalloc data (bytes, object-size) 131072 48
../../third_party/tcmalloc/chromium/src/page_heap_allocator.h:74] FATAL ERROR: Out of memory trying to allocate internal tcmalloc data (bytes, object-size) 131072 48
../../third_party/tcmalloc/chromium/src/page_heap_allocator.h:74] FATAL ERROR: Out of memory trying to allocate internal tcmalloc data (bytes, object-size) 131072 48
These modules had their versions changed. I tried running with the old versions; same memory leak:
- parse-torrent: from ^5.8.2 to ^5.8.3
- webtorrent-hybrid: from ^1.0.5 to ^1.0.6
Very strange, this bug.
I found a memory_linux.cc file in Chromium, for some reason only in the Android version. But possibly it has made it into Electron; that would make this a webtorrent memory leak.
In their issue tracker they have dealt with several memory leaks: https://github.com/webtorrent/webtorrent/issues?utf8=%E2%9C%93&q=is%3Aissue%20memory%20leak
Just tried the testnet v2 commit that was previously working. It ran out of memory and crashed after less than an hour.
Possibly this is not a code problem at all; rather, something has changed on the network that the code doesn't handle well.
Some of the webtorrent memory leak issues were about large files. If larger-than-normal content has been published to the network recently, maybe it's just crashing webtorrent.
A possible direction for a solution: reconfigure webtorrent to limit its memory use, or otherwise prevent it from running out of memory.
Well, it's been holding steady for more than 2 hours now. The memory leak is not manifesting any longer, at least for the moment.
And I have changed absolutely nothing on my machine since it was last crashing consistently a couple of days ago. I literally just restarted the node (without even a reboot).
Confirmed that disabling torrent acquisition solves this. It synced the whole blockchain to ~20,000 blocks in about a minute, with <100MB memory usage.
It's just getting overloaded trying to download too many torrents at once. I think we can solve this by wrapping torrent acquisition in a queue.
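The queue idea could look something like this minimal sketch. The `TorrentQueue` name, the `add()` API, and the limit of 2 are assumptions for illustration, not the project's actual interface:

```javascript
// Minimal concurrency-limited queue sketch: at most `limit` tasks run at
// once; the rest wait their turn. Each task is a function returning a
// Promise (e.g. one torrent download).
class TorrentQueue {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.pending = [];
  }

  // Enqueue a task; resolves/rejects with the task's own result.
  add(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this._next();
    });
  }

  // Start the next pending task if we are under the concurrency limit.
  _next() {
    if (this.active >= this.limit || this.pending.length === 0)
      return;
    const { task, resolve, reject } = this.pending.shift();
    this.active++;
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        this.active--;
        this._next();
      });
  }
}
```

Something like `await queue.add(() => downloadTorrent(infoHash))` (with a hypothetical `downloadTorrent`) would then bound how many WebTorrent downloads run concurrently instead of starting them all at once.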
I looked into tweaking something to fix this. WebTorrent is awful and can't be fixed: it spawns multiple Chromium processes per torrent, and there's apparently no way around this. It's not suitable for use as a persistent seeder.
The solution is to drop WebTorrent and move to IPFS.
Long-term, WebTorrent is still eating away at memory, even though it is relatively stable compared to a few days ago.
At this point I think I will just wait until Iron Age to start mining.
There will be no mining in Iron Age. Enjoy it in the Bronze Age while it lasts!! 🎉
If there is no mining, then what is the point? It makes it seem like the devs are just trying to create fake value in a virtual world in order to make money. I guess there is no point in my mining anymore, since my progress will be lost anyway.
Mining served two purposes:
We can use Ethereum for the first point; it's way better than maintaining our own p2p network and miners. But we still need some way to decide on the second. Our idea is to make land expensive while raising money for the initial setup of the project. Open to hearing feedback on this, thanks!
I saw N.O.D.E.'s video on how you guys will start using the Ethereum network and I think it makes sense. Good luck on the project. I will keep tabs on it.
I'm closing this issue because @berrycoiled has pinpointed this as an issue with WebTorrent, not the node itself.
My node running on Linux (Ubuntu 16.04) has been consistently crashing with this error (see below) for the last couple of days. I'm running the latest master from 12 July.
Let me know if there is anything else I can provide or try in order to look into this issue further. I get about an hour or two before the node crashes, and I am running a 2 GB RAM DigitalOcean droplet (natively, not with Docker).