0racle opened this issue 6 years ago
…which have historically been around 2Gb
IIRC closer to 1.2GB on 64-bit systems and that's for building Rakudo. Your output suggests it's dying while building NQP.
I've just built master/master/master on my 32-bit Debian. During the NQP build I didn't even notice any memory use increase: total system use was around 742MB before the build and only climbed to 899MB during it, and a portion of that was being used by ZScript for running the build.
Rakudo's stage parse used 824MB of RAM, pushing total system use to 1609MB, which is nowhere near the 4GB allocation your message shows being attempted.
FWIW, there are new test failures on Windows R#2070 where test files die with exit code 5 "Access is denied" if you run too many tests at once. Dunno what sort of access that error refers to, though.
I just built NQP (/usr/bin/time perl Configure.pl --prefix=/home/dan/Source/perl6/install/ --backends=moar --make-install) and it reported ~170MB used.
@0racle Could you please do a --debug build of MoarVM, and then after it breaks in the NQP build, repeat that step under the debugger and get a backtrace, so we can see what is doing the huge allocation?
Steps:
gdb /usr/tools/perl6/bin/moar
break MVM_panic
r --libpath=src/vm/moar/stage0 src/vm/moar/stage0/nqp.moarvm --bootstrap --module-path=gen/moar/stage1 --setting-path=gen/moar/stage1 --setting=NQPCORE --target=mbc --no-regex-lib --stable-sc=stage1 --output=gen/moar/stage1/nqp.moarvm gen/moar/stage1/NQP.nqp
bt
@jnthn See gdb output below, hope I did it right. FYI this is on a virtualised CentOS 6 box. CentOS 6 only has glibc 2.12.
/usr/tools/rakudo/nqp (git)-[09d75a9...] # uname -a
Linux spawn 2.6.32-696.23.1.el6.x86_64 #1 SMP Tue Mar 13 22:44:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
/usr/tools/rakudo/nqp (git)-[09d75a9...] # ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
/usr/tools/rakudo/nqp (git)-[09d75a9...] # gdb /usr/tools/perl6/bin/moar
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/tools/perl6/bin/moar...done.
(gdb) break MVM_panic
Function "MVM_panic" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (MVM_panic) pending.
(gdb) r --libpath=src/vm/moar/stage0 src/vm/moar/stage0/nqp.moarvm --bootstrap --module-path=gen/moar/stage1 --setting-path=gen/moar/stage1 --setting=NQPCORE --target=mbc --no-regex-lib --stable-sc=stage1 --output=gen/moar/stage1/nqp.moarvm gen/moar/stage1/NQP.nqp
Starting program: /usr/tools/perl6/bin/moar --libpath=src/vm/moar/stage0 src/vm/moar/stage0/nqp.moarvm --bootstrap --module-path=gen/moar/stage1 --setting-path=gen/moar/stage1 --setting=NQPCORE --target=mbc --no-regex-lib --stable-sc=stage1 --output=gen/moar/stage1/nqp.moarvm gen/moar/stage1/NQP.nqp
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff643b700 (LWP 55956)]
Breakpoint 1, MVM_panic (exitCode=1,
messageFormat=0x7ffff77874a8 "Memory allocation failed; could not allocate %zu bytes")
at src/core/exceptions.c:821
821 MVM_NO_RETURN void MVM_panic(MVMint32 exitCode, const char *messageFormat, ...) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.el6_9.2.x86_64
(gdb) bt
#0 MVM_panic (exitCode=1,
messageFormat=0x7ffff77874a8 "Memory allocation failed; could not allocate %zu bytes")
at src/core/exceptions.c:821
#1 0x00007ffff764c21a in MVM_panic_allocation_failed (len=4294967296) at src/core/exceptions.c:835
#2 0x00007ffff767fc4d in MVM_malloc (tc=<value optimized out>, al=0x604f50, bytes=<value optimized out>)
at src/core/alloc.h:5
#3 MVM_fixed_size_alloc (tc=<value optimized out>, al=0x604f50, bytes=<value optimized out>)
at src/core/fixedsizealloc.c:194
#4 0x00007ffff7680076 in MVM_fixed_size_alloc_zeroed (tc=<value optimized out>,
al=<value optimized out>, bytes=4294967296) at src/core/fixedsizealloc.c:201
#5 0x00007ffff76aae86 in HASH_EXPAND_BUCKETS (tc=0x603a30, st=<value optimized out>, root=0x5209cf0,
data=0x5209d08, key_obj=0x311def0, value=<value optimized out>, kind=8) at src/strings/uthash.h:564
#6 HASH_ADD_TO_BKT (tc=0x603a30, st=<value optimized out>, root=0x5209cf0, data=0x5209d08,
key_obj=0x311def0, value=<value optimized out>, kind=8) at src/strings/uthash.h:620
#7 bind_key (tc=0x603a30, st=<value optimized out>, root=0x5209cf0, data=0x5209d08, key_obj=0x311def0,
value=<value optimized out>, kind=8) at src/6model/reprs/MVMHash.c:105
#8 0x00007ffff76a41c9 in MVM_repr_bind_key_o (tc=<value optimized out>, obj=<value optimized out>,
key=<value optimized out>, val=<value optimized out>) at src/6model/reprconv.c:553
#9 0x00007ffff76da25b in get_string_heap_index (tc=0x603a30, ws=0x524a1a0, strval=0x311def0)
at src/mast/compiler.c:302
#10 0x00007ffff76ddb60 in compile_frame (tc=<value optimized out>, node=0x3f1be30,
types=<value optimized out>, size=0x7fffffffdb0c) at src/mast/compiler.c:1201
#11 MVM_mast_compile (tc=<value optimized out>, node=0x3f1be30, types=<value optimized out>,
size=0x7fffffffdb0c) at src/mast/compiler.c:1646
#12 0x00007ffff76de960 in MVM_mast_to_file (tc=0x603a30, mast=0x3f1be30, types=<value optimized out>,
filename=0x10b4970) at src/mast/driver.c:75
#13 0x00007ffff765b661 in MVM_interp_run (tc=0x603a30, initial_invoke=<value optimized out>,
invoke_data=<value optimized out>) at src/core/interp.c:3177
#14 0x00007ffff7743aaa in MVM_vm_run_file (instance=0x603010, filename=<value optimized out>)
at src/moar.c:413
#15 0x0000000000401223 in main (argc=12, argv=0x7fffffffe068) at src/main.c:299
(gdb)
@0racle Yes, that's exactly it; thank you.
@samcv any ideas? I can't reproduce it, but hopefully the stack trace provides a clue.
I cannot reproduce. I even added checks to make sure the variables in the hash implementation were not overflowing, and all was fine. I then made all the hash variables (other than pointers and the hash value) 16 bits and still didn't get any overflow. I also checked whether EXPAND_BUCKETS tried to allocate more than UINT16_MAX, and it did not.
So I have no clue what the issue is. Also the SipHash commit made practically no changes to uthash.h. It was just a swap of the hash function.
When I'm back online on my laptop I will post a patch that the issue reporter can test. It will print out some details if the hash tries to allocate a large amount, and it also adds checks to ensure none of the variables overflow.
This issue is very mysterious.
@0racle please apply this patch against MoarVM's git master. Put it in the MoarVM directory and run patch -p1 < hash_debug.patch.txt. This should display some information about the state of some of the hash variables if it tries to allocate more than 1GB for the hash table (where it appears to crash in your backtrace). It will also make sure some of the important variables don't overflow (in case something weird is happening).
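(For readers following along: a check of the kind described might look roughly like the sketch below. This is illustrative only, not the actual hash_debug.patch.txt; the function name, field list, and 1GB threshold here are stand-ins for whatever the real patch uses.)

#include <stdio.h>
#include <stdint.h>

/* Illustrative sketch only -- not the real hash_debug.patch.txt. Before the
 * bucket array is grown, report the table's bookkeeping state whenever the
 * requested allocation looks implausibly large. */
#define HASH_DEBUG_LIMIT (1024UL * 1024UL * 1024UL)   /* 1GB */

static void hash_debug_report(size_t bytes, uint32_t num_buckets,
                              uint32_t num_items, uint32_t ineff_expands,
                              uint32_t noexpand) {
    if (bytes > HASH_DEBUG_LIMIT)
        fprintf(stderr,
                "Current num buckets %u, num_items %u, ineff_expands %u, "
                "noexpand %u; tried to allocate %zu bytes\n",
                num_buckets, num_items, ineff_expands, noexpand, bytes);
}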
Also, not in this patch, but in commit e3e1d0d0c I made all types in uthash use explicitly specified sizes, though this likely has nothing to do with the issue, since compiling nqp worked fine when I simulated what would happen if everything marked as "unsigned" were uint16_t.
Report back and let me know what output you get. Thank you.
@samcv MoarVM built from latest commit with patch applied...
/usr/tools/rakudo/nqp/MoarVM (git)-[master] # patch -p1 < hash_debug.patch.txt
patching file src/strings/uthash.h
patching file src/strings/uthash_types.h
Trying to build the latest NQP fails with the following output.
/usr/tools/perl6/bin/moar --libpath=src/vm/moar/stage0 src/vm/moar/stage0/nqp.moarvm --bootstrap --module-path=gen/moar/stage1 --setting-path=gen/moar/stage1 \
--setting=NQPCORE --target=mbc --no-regex-lib --stable-sc=stage1 \
--output=gen/moar/stage1/nqp.moarvm gen/moar/stage1/NQP.nqp
Current num buckets 33554432, tbl->num_items 1836, tbl->ineff_expands 0, tbl->noexpand 0
MoarVM oops: Tried to allocate 1073741824 bytes.
at gen/moar/stage2/QAST.nqp:6741 (src/vm/moar/stage0/QAST.moarvm:assemble_to_file)
from gen/moar/stage2/NQPHLL.nqp:443 (src/vm/moar/stage0/NQPHLL.moarvm:mbc)
from gen/moar/stage2/NQPHLL.nqp:1825 (src/vm/moar/stage0/NQPHLL.moarvm:execute_stage)
from gen/moar/stage2/NQPHLL.nqp:1861 (src/vm/moar/stage0/NQPHLL.moarvm:run)
from gen/moar/stage2/NQPHLL.nqp:1864 (src/vm/moar/stage0/NQPHLL.moarvm:)
from gen/moar/stage2/NQPHLL.nqp:1850 (src/vm/moar/stage0/NQPHLL.moarvm:compile)
from gen/moar/stage2/NQPHLL.nqp:1548 (src/vm/moar/stage0/NQPHLL.moarvm:eval)
from gen/moar/stage2/NQPHLL.nqp:1805 (src/vm/moar/stage0/NQPHLL.moarvm:evalfiles)
from gen/moar/stage2/NQPHLL.nqp:1695 (src/vm/moar/stage0/NQPHLL.moarvm:command_eval)
from gen/moar/stage2/NQPHLL.nqp:1654 (src/vm/moar/stage0/NQPHLL.moarvm:command_line)
from gen/moar/stage2/NQP.nqp:4128 (src/vm/moar/stage0/nqp.moarvm:MAIN)
from gen/moar/stage2/NQP.nqp:4123 (src/vm/moar/stage0/nqp.moarvm:<mainline>)
from <unknown>:1 (src/vm/moar/stage0/nqp.moarvm:<main>)
from <unknown>:1 (src/vm/moar/stage0/nqp.moarvm:<entry>)
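Rough arithmetic on those numbers (assuming the usual 16-byte uthash bucket struct on 64-bit, and that the count shown is the pre-doubling one): growing 33554432 buckets to 67108864 needs 67108864 × 16 = 1073741824 bytes, exactly the failed allocation, for a table holding only 1836 items. The 4GB attempt in the earlier backtrace would fit the same pattern after two further doublings. In other words, the table keeps doubling while staying essentially empty, which points at pathological hash collisions rather than genuine memory pressure.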
Ok, so I have some news. The reason I mentioned the version of glibc is that I know it's old, and some projects just don't support it (e.g. .NET Core is one that comes to mind). On a whim, I enabled devtoolset-2 (which I previously had to install to build something else).
scl enable devtoolset-2 $SHELL
Built NQP and it still failed. Rebuilt MoarVM under devtoolset first, then rebuilt NQP... WORKED!!
So, I have a strong suspicion it's either an issue with glibc 2.12... or an incompatibility between something in MoarVM's code base and glibc 2.12. (Alternatively, maybe there's something else devtoolset provides that's working around the issue, but ancient glibc seems like as good a culprit as any.)
From my recollection, devtoolset-2 on CentOS (and Red Hat?) uses glibc 2.14. Hopefully the output above can help you track down the issue and resolve it... but otherwise, perhaps the Rakudo toolchain may need to recommend a minimum glibc version of 2.14?
Maybe I broke something... After building Rakudo successfully, I attempted to run perl6 and got the following error:
Missing or wrong version of dependency 'gen/moar/stage2/QAST.nqp' (from 'src/Perl6/Pod.nqp')
I am going to blow away my rakudo/nqp directory, rebuild everything from scratch, and report back.
Ok, that was successful. So to reiterate: I successfully built Rakudo by first enabling the devtoolset environment and then building Rakudo with the --gen options:
>>> /usr/tools/rakudo (git)-[master] # rm -rf nqp
>>> ~ % echo $LD_LIBRARY_PATH
/usr/local/lib
>>> ~ % scl enable devtoolset-2 $SHELL
>>> ~ % echo $LD_LIBRARY_PATH
/opt/rh/devtoolset-2/root/usr/lib64:/opt/rh/devtoolset-2/root/usr/lib:/usr/local/lib
>>> /usr/tools/rakudo (git)-[master] # git pull && perl Configure.pl --prefix $TOOLS/perl6 --gen-moar --gen-nqp --backends=moar && make && make install
... SUCCESS!
While I'm glad to have a workaround... it would be nice if you could figure out what causes this massive memory allocation under glibc 2.12.
P.S. I have also confirmed that the same issue occurs on another CentOS 6 machine running on different hardware.
I will have to try and reproduce this in a VM and see what is going on. Thank you for your help so far in trying to figure out what is going on here.
Hi @samcv, I can confirm that building f466228 natively (i.e. without devtoolset) and then building NQP was successful on one of my machines. I presume it will be on the other as well. Looks like you got it!
If possible, we should try to test builds on older libcs so we can catch these types of things early. In the meantime, at least, I can be the test-bed for CentOS 6 builds :smiley:
Glad to hear it @0racle. scovit on IRC seems to think it may be caused by a GCC bug, since he can also replicate it with gcc 4.4.7 on https://godbolt.org/, so I'm not sure what devtoolset does, but it may be a compiler thing.
devtoolset provides newer versions of compilers, such as gcc:
>>> ~ % gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
>>> ~ % scl enable devtoolset-2 $SHELL
>>> ~ % gcc --version
gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
So yes, looks like GCC 4.4.7 was the issue. It's nice to read that 5504b84 was even faster and more elegant than before. Nice work! :+1:
@0racle thanks for getting back to me with more info on the devtoolset configurations. That makes it pretty clear it's a GCC 4.4.7 issue, one that was fixed in later versions. Even better, the new code is even faster than the code I had written before.
So, strings in MoarVM are hashed in a 32-bit-per-grapheme representation. The SipHash functions I wrote add 64 bits at a time until there are no full 64-bit chunks left. Then they run siphash_finish32, which takes either 0 (no graphemes remaining) or the last grapheme. This last function, I think, is where the issue was: with the buggy GCC version, the strings "" and "b" would get the same hash value, and "hi" and "hii" would get the same hash value as well. My hypothesis is that there were enough short strings for this to matter; imagine filling a hash table with hundreds of single-grapheme strings that all collide. That would eventually cause the hash table to expand until it ran out of memory.
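To make that failure mode concrete, here is a rough sketch (not MoarVM's actual code) of the chunking shape described: 32-bit graphemes are packed two at a time into 64-bit blocks, and a final step absorbs either the single leftover grapheme or 0. The mix() function below is just a placeholder for the real SipHash rounds.

#include <stdint.h>
#include <stddef.h>

/* Placeholder for the real SipHash compression rounds. */
static uint64_t mix(uint64_t state, uint64_t block) {
    state ^= block;
    state = (state << 13) | (state >> 51);
    return state * 0x9E3779B97F4A7C15ULL;
}

/* Sketch of hashing a string stored as 32-bit graphemes. */
uint64_t hash_graphemes(const uint32_t *g, size_t n) {
    uint64_t state = 0x736F6D6570736575ULL;   /* arbitrary seed */
    size_t i = 0;
    /* Absorb full 64-bit chunks: two graphemes at a time. */
    for (; i + 2 <= n; i += 2)
        state = mix(state, (uint64_t)g[i] | ((uint64_t)g[i + 1] << 32));
    /* Finish step: absorb the last grapheme, or 0 if none is left. If a
     * compiler bug makes this step drop the leftover grapheme, "" collides
     * with "b" and "hi" collides with "hii", exactly as described above. */
    state = mix(state, (i < n) ? (uint64_t)g[i] : 0);
    return state;
}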
It is always important to learn from our mistakes. Early on, before I even knew the entire issue was caused by a problem in the SipHash code, I found a fast cheat: tell the hash implementation to never expand the number of buckets when we have more buckets than items (this should NEVER occur unless the hashing implementation is broken or the hash table is under attack). Do we want to put in code which makes sure an attacker can't cause the table to expand forever, i.e. that if we have more buckets than items we don't expand?
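A minimal sketch of that kind of guard, assuming uthash-style bookkeeping fields (the names below are illustrative, not MoarVM's actual definitions):

#include <stdint.h>

/* Stand-in for the relevant table bookkeeping; not the real UT_hash_table. */
struct ht_info {
    uint32_t num_buckets;
    uint32_t num_items;
    uint32_t noexpand;
};

/* A healthy hash should never need more buckets than items, so if that
 * happens, stop expanding instead of doubling toward an out-of-memory panic. */
static int should_expand(struct ht_info *tbl) {
    if (tbl->noexpand)
        return 0;
    if (tbl->num_buckets > tbl->num_items) {
        tbl->noexpand = 1;   /* likely broken hashing or a collision attack */
        return 0;
    }
    return 1;
}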
Also, to be thorough: should we display a warning when this happens? Given that it should only occur during an attack or when hashing is totally broken, there's an argument that in the former case we should warn the user if we can detect it. For the latter, do we want a check at startup, or tests that run after MoarVM compilation to ensure things are fully functional? We currently don't have any MoarVM specific test suite.
We currently don't have any MoarVM specific test suite.
We don't, though I'd not be opposed to a make test target and getting Travis to run them. So long as we don't end up considering that an alternative to running the NQP tests (and, preferably, the Rakudo ones too) after changes, we're good.
Took me a while to track this down. I normally build Rakudo, but NQP builds started failing sometime between Rakudo commits ee9314d (OK) and 980f692 (FAIL).
I further tracked this down by rolling back MoarVM, building it, and then building NQP, until I narrowed down the bisection:
Building NQP 09d75a9 on MoarVM a50a0b1 ==> OK
Building NQP 09d75a9 on MoarVM d9a3270 ==> FAILS
Not sure if this is an issue, or whether SipHash means the RAM requirements for building Rakudo (which have historically been around 2Gb) have now increased? Are other people seeing increased memory allocation since that commit?