facebook / hhvm

A virtual machine for executing programs written in Hack.
https://hhvm.com

HHVM segfaults on startup on systems with <= 8GB RAM. Much smaller limits (eg 128MB) are fine as long as machine physically has > 8GB #8796

Closed: raxod502 closed this issue 3 years ago

raxod502 commented 3 years ago

Describe the bug

In some environments, HHVM segfaults on startup.

Standalone code, or other way to reproduce the problem

Unfortunately, this problem does not occur for me when running (in Docker) locally; it only occurs (in the same Docker image) on CircleCI, which may point to something kernel-related.

Expected behavior

$ hhvm
set_mempolicy: Operation not permitted
Nothing to do. Either pass a hack file to run, or use -m server

Actual behavior

$ hhvm
set_mempolicy: Operation not permitted
Segmentation fault (core dumped)

$ gdb hhvm
GNU gdb (Ubuntu 9.2-0ubuntu2) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hhvm...
Reading symbols from /usr/lib/debug/.build-id/c2/76c4d29ab1cbd4be2ceda99ea348432de43980.debug...
(gdb) r
Starting program: /usr/bin/hhvm 
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
set_mempolicy: Operation not permitted

Program received signal SIGSEGV, Segmentation fault.
0x0000564ade3bd2dd in std::pair<HPHP::LowPtr<HPHP::StringData const>, int>::operator= (__p=..., this=0xffffffffffffffff) at /usr/include/c++/10/bits/stl_pair.h:390
390           operator=(typename conditional<
(gdb) bt
#0  0x0000564ade3bd2dd in std::pair<HPHP::LowPtr<HPHP::StringData const>, int>::operator= (__p=..., this=0xffffffffffffffff) at /usr/include/c++/10/bits/stl_pair.h:390
#1  HPHP::arrprov::(anonymous namespace)::getTagID (tag={...}) at ./hphp/runtime/base/array-provenance.cpp:108
#2  0x0000564ade3bd3a7 in HPHP::arrprov::Tag::Tag (this=0x7ffc169c59d0, kind=HPHP::arrprov::Tag::Kind::RuntimeLocation, name=<optimized out>, line=<optimized out>)
    at /usr/include/c++/10/bits/move.h:76
#3  0x0000564ade558e11 in HPHP::arrprov::Tag::RuntimeLocation (filename=<optimized out>) at ./hphp/runtime/base/array-provenance.h:90
#4  operator() (__closure=<optimized out>) at ./hphp/runtime/base/runtime-option.cpp:1407
#5  HPHP::RuntimeOption::Load (ini=..., config=..., iniClis=std::vector of length 0, capacity 0, hdfClis=std::vector of length 0, capacity 0, messages=messages@entry=0x7ffc169c5de0, cmd="")
    at ./hphp/runtime/base/runtime-option.cpp:1407
#6  0x0000564ade510821 in HPHP::execute_program_impl (argc=<optimized out>, argv=<optimized out>) at ./hphp/runtime/base/program-functions.cpp:1728
#7  0x0000564ade513a65 in HPHP::execute_program (argc=1, argv=0x7ffc169c7b98) at ./hphp/runtime/base/program-functions.cpp:1288
#8  0x0000564ade12d4a4 in main (argc=1, argv=0x7ffc169c7b98) at ./hphp/hhvm/main.cpp:101
(gdb)

Environment

$ hh_client --version
hackc-c806e3b8fcef4fbf7135a864e4dd14b628934684-4.92.0


Additional context

It's working locally for me with kernel

Linux runtime 5.8.0-7630-generic #32~1607010078~20.04~383a644-Ubuntu SMP Thu Dec 3 19:14:47 UTC 2 x86_64 x86_64 x86_64 GNU/Linux


while it's not working on CircleCI with kernel

Linux runtime 5.4.0-1021-gcp #21-Ubuntu SMP Fri Jul 10 06:53:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

raxod502 commented 3 years ago

Hey, I found out that I can install hhvm-dbg from dl.hhvm.com to get debugging symbols. Updated the stack trace above.

vikash-itspe commented 3 years ago

After installing the latest version of HHVM (4.92), it gives a segmentation fault (core dumped). Please fix this urgently. Our code base is in Hack. OS: Ubuntu 18.04.

raxod502 commented 3 years ago

> warning: Error disabling address space randomization: Operation not permitted

... I don't suppose HHVM tries to disable address space randomization or do some other such low-level thing which CircleCI disallows at the kernel level? That would explain why it's working locally, but not on CI.

(Yes, I know that it's GDB here which is trying to disable ASLR, not HHVM. I was just inspired to suggest a possible cause based on this message.)

namanitspe commented 3 years ago

@raxod502 Any suggestions for finding a temporary solution to this issue? Posting a gdb run dump below for the same.

root@ip-172-31-26-133:/home/ubuntu# hhvm --modules
Segmentation fault (core dumped)

root@ip-172-31-26-133:/home/ubuntu# gdb hhvm
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hhvm...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/hhvm 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555562fa93d in ?? ()

(gdb) r -m server -p 8080
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/hhvm -m server -p 8080
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555562fa93d in ?? ()

lexidor commented 3 years ago

Trying to figure out whether this issue is identical to the one I am having when running Hack code on HHVM 4.87 or above in Jenkins or Travis. That one is also a segfault when invoking any script with hhvm. Does this issue fail to manifest under 4.86?

raxod502 commented 3 years ago

> @raxod502 Any suggestions for finding a temporary solution to this issue?

Unfortunately, I don't know of any yet. One idea for collecting more debugging information would be to strace hhvm and see what system calls are invoked around the time of the segfault. I haven't the faintest idea what the problem could be related to, however.

vikash-itspe commented 3 years ago

@raxod502 AWS machine:

5.4.0-1029-aws ~18.04.1-Ubuntu SMP Tue Oct 20 11:09:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

              system     HVM domU
/0/401        processor  Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
/0/1000       memory     1GiB System Memory
/0/1000/0     memory     1GiB DIMM RAM

strace of HipHop VM 4.92.0 (rel), log snippet around the segmentation fault:

mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffffffffff} ---
 +++ killed by SIGSEGV (core dumped) +++
 Segmentation fault (core dumped)

Ideally it should proceed as below (this trace is from a 4.84.0 install):

mmap(NULL, 10737418240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f62fa300000
mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f62f9b00000
munmap(0x7f62f9b00000, 8388608)         = 0
mmap(NULL, 10481664, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f62f9901000
munmap(0x7f62f9901000, 1044480)         = 0
munmap(0x7f62fa200000, 1048576)         = 0
uname({sysname="Linux", nodename="ip-172-31-44-90", ...}) = 0
readlink("/proc/self/exe", "/opt/hhvm/4.84.0/bin/hhvm", 4096) = 25
access("/opt/hhvm/4.84.0/bin/hh_single_compile", X_OK) = 0
readlink("/proc/self/exe", "/opt/hhvm/4.84.0/bin/hhvm", 4096) = 25
openat(AT_FDCWD, "/opt/hhvm/4.84.0/bin/hhvm", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0755, st_size=69657392, ...}) = 0
mmap(NULL, 69657392, PROT_READ, MAP_SHARED, 5, 0) = 0x7f62f5791000

raxod502 commented 3 years ago

Huh, that looks like a failure in initial memory allocation before even jumping into HHVM code. Perhaps the binary is compiled with ASLR disabled in the ELF metadata (readelf -a) and thus cannot be run in an ASLR-enforced environment? Or something like that. It says ENOMEM, but that could also be because it's trying to allocate memory in a region that's disallowed: the OS forces ASLR and puts the process heap in a different place than the executable expects. Something something RLIMIT_DATA, according to man 2 mmap.

fredemmott commented 3 years ago

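Also reproducible with hhvm/user-documentation:HHVM-4.93-2021-01-22-0cb6f6a on EC2, but not on my mac.

Currently problems appear to happen on:

  • CircleCI
  • TravisCI
  • AWS ElasticBeanstalk-managed t2.micro instances (@vikash-itspe, what type is yours?)

Does not happen on:

  • MacOS with plenty of RAM
  • ???

My next steps:

  • test on a non-EB EC2 t2.micro instance
  • test on an EC2 instance with an obscenely large amount of RAM but otherwise identical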

fredemmott commented 3 years ago

On MacOS at least, limiting the memory with the -m option to docker run doesn't reproduce the issue, so this is more likely down to dockerd settings/version than RAM.

fredemmott commented 3 years ago

Fails with a segfault on a t2.micro in us-west-2 on both Amazon Linux 2018.03 (AMI amzn-ami-hvm-2018.03.0.20190611-x86_64-ebs) and Ubuntu 20.10 (AMI ami-0b227db5ccaf77e94).

Both are using Docker 19.03.13 build 4484c46

fredemmott commented 3 years ago

Issue does not reproduce on a t2.2xlarge running the same ubuntu 20.10 AMI, even with --memory 128m --memory-swap 0

With a lower limit than that, it tends to OOM rather than sigsegv. This is fun.

fredemmott commented 3 years ago

# docker run --rm hhvm/hhvm:4.87-latest hhvm /dev/null; echo $?
0
# docker run --rm hhvm/hhvm:4.88-latest hhvm /dev/null; echo $?
139
# docker run --rm hhvm/hhvm:2020.12.12 hhvm /dev/null; echo $?
0
# docker run --rm hhvm/hhvm:2020.12.13 hhvm /dev/null; echo $?
139

% git log --oneline nightly-2020.12.12..nightly-2020.12.13
4347433a64 (tag: nightly-2020.12.13) Disable more watchman tests under retranslate-all
66559eb716 Elaborate contexts on a method just like fun/lambda
8fe2012012 Fix is/as/reified generics test folder names
3206afb939 Kill mt_rand hhbbc optimization
f90e2edc89 Use mmap to reserve the arrprov slab

f90e2edc89 (D25515629) both sounds relevant and matches the strace.
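To see the failure mode in isolation: the strace earlier in the thread shows an 8589934592-byte (8 GiB) anonymous mapping requested without MAP_NORESERVE, and that request returns ENOMEM. Below is a minimal C++ sketch, assuming 64-bit Linux and the default heuristic overcommit mode. `try_reserve` is a hypothetical helper for illustration, not HHVM code, and nothing in this thread states that the eventual fix uses MAP_NORESERVE.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cerrno>
#include <cstddef>

// Size of the failing request in the strace: 8 GiB.
constexpr std::size_t kSlabBytes = 8589934592ULL;

// Hypothetical helper (not HHVM code): try to reserve `bytes` of anonymous
// memory, release it again, and return 0 on success or errno on failure.
int try_reserve(std::size_t bytes, bool noreserve) {
  assert(bytes > 0);  // mmap() rejects zero-length mappings
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
  if (noreserve) {
    // Ask the kernel not to reserve swap/commit space for this mapping.
    flags |= MAP_NORESERVE;
  }
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
  if (p == MAP_FAILED) {
    return errno;  // e.g. ENOMEM, as in the strace
  }
  munmap(p, bytes);
  return 0;
}
```

On a host with less than 8 GiB of RAM plus swap, `try_reserve(kSlabBytes, false)` would be expected to return ENOMEM, while the MAP_NORESERVE variant typically succeeds. That is an expectation based on the traces here, not something measured in this thread.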

fredemmott commented 3 years ago

It also segfaults outside of docker on the same machine

fredemmott commented 3 years ago

Edit: ignore this, fails even on known good versions

On a machine with lots of RAM:

# (ulimit -v $((1024 * 1024 * 8)); hhvm /dev/null; echo $?)
139
# (ulimit -v $((1024 * 1024 * 9)); hhvm /dev/null; echo $?)
0

fredemmott commented 3 years ago

The always_assert() doesn't trigger because mmap() is returning MAP_FAILED, which isn't nullptr.
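For reference, MAP_FAILED is `(void*)-1` on Linux, which lines up with `this=0xffffffffffffffff` in the first backtrace and `si_addr=0xffffffffffffffff` in the strace. A minimal sketch of the pitfall follows; both function names are hypothetical and this is not the code in array-provenance.cpp.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper (not HHVM's code): reserve anonymous memory and
// translate mmap()'s failure sentinel into nullptr, so that callers can
// rely on an ordinary null check.
void* reserve_or_null(std::size_t bytes) {
  assert(bytes > 0);
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  // mmap() signals failure with MAP_FAILED, i.e. (void*)-1, not nullptr.
  return p == MAP_FAILED ? nullptr : p;
}

// True when a null-only check would wrongly accept `p` as a valid mapping.
bool null_check_misses(const void* p) {
  return p != nullptr && p == MAP_FAILED;
}
```

With a wrapper like this, an assertion on the returned pointer fires at the allocation site instead of the process crashing later on the first write through the bogus pointer.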

fredemmott commented 3 years ago

FB T83478260

fredemmott commented 3 years ago

It looks like we should be able to get a hotfixable-patch fairly quickly.

In the meantime, if you have control over the environment, use machines with > 8GB RAM (it's fine to limit the memory available to HHVM to a smaller amount though)
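A rough way to check whether a given host falls into the affected range is to compare its physical RAM against the 8 GiB reservation seen in the strace. This is a sketch under assumptions: it relies on the Linux/glibc `sysconf` extensions, the function names are made up for illustration, and the `<=` threshold simply mirrors the issue title rather than the kernel's actual overcommit rule, which also takes swap into account.

```cpp
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Size of the failing reservation seen in the strace: 8 GiB.
constexpr std::uint64_t kSlabBytes = 8589934592ULL;

// Physical RAM in bytes, via the Linux/glibc sysconf() extensions.
std::uint64_t physical_ram_bytes() {
  const long pages = sysconf(_SC_PHYS_PAGES);
  const long page_size = sysconf(_SC_PAGESIZE);
  assert(pages > 0 && page_size > 0);
  return static_cast<std::uint64_t>(pages) *
         static_cast<std::uint64_t>(page_size);
}

// Rough heuristic only (hypothetical helper): the thread reports crashes on
// hosts with <= 8GB of physical RAM.
bool host_looks_affected(std::uint64_t ram_bytes) {
  return ram_bytes <= kSlabBytes;
}
```

Calling `host_looks_affected(physical_ram_bytes())` inside a container normally reflects the host's RAM rather than the container's memory limit, which fits the observation that a 128MB limit is fine on a large machine.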

vikash-itspe commented 3 years ago

> Also reproducible with hhvm/user-documentation:HHVM-4.93-2021-01-22-0cb6f6a on EC2, but not on my mac.
>
> Currently problems appear to happen on:
>
>   • CircleCI
>   • TravisCI
>   • AWS ElasticBeanstalk-managed t2.micro instances (@vikash-itspe, what type is yours?)
>
> Does not happen on:
>
>   • MacOS with plenty of RAM
>   • ???
>
> My next steps:
>
>   • test on a non-EB EC2 t2.micro instance
>   • test on an EC2 instance with an obscenely large amount of RAM but otherwise identical

I tested on AWS t2.micro and t2.small instances (1 GiB and 2 GiB RAM respectively). Both segfault with a vanilla installation of HHVM 4.92 on Ubuntu 18.04 server x86_64.

vikash-itspe commented 3 years ago

> It looks like we should be able to get a hotfixable-patch fairly quickly.
>
> In the meantime, if you have control over the environment, use machines with > 8GB RAM (it's fine to limit it to a smaller amount though)

Sure, we will wait for the fix to land in the repository releases for Ubuntu machines.

@fredemmott Thank you so much for your quick response and exploration.

fredemmott commented 3 years ago

For reference, trace to the mmap() rather than the segfault:

Breakpoint 1, HPHP::arrprov::(anonymous namespace)::getRawTagStorageArray ()
    at ./hphp/runtime/base/array-provenance.cpp:85
85  ./hphp/runtime/base/array-provenance.cpp: No such file or directory.
(gdb) bt
#0  HPHP::arrprov::(anonymous namespace)::getRawTagStorageArray () at ./hphp/runtime/base/array-provenance.cpp:85
#1  0x00005555562f756c in HPHP::arrprov::(anonymous namespace)::getTagID (tag={...})
    at ./hphp/runtime/base/array-provenance.cpp:108
#2  0x00005555562f7649 in HPHP::arrprov::Tag::Tag (this=0x7fffffffd5b0,
    kind=HPHP::arrprov::Tag::Kind::RuntimeLocation, name=<optimized out>, line=<optimized out>)
    at /usr/include/c++/9/bits/move.h:74
#3  0x00005555564b0655 in HPHP::arrprov::Tag::RuntimeLocation (filename=<optimized out>)
    at ./hphp/runtime/base/array-provenance.h:90
#4  HPHP::RuntimeOption::<lambda()>::operator() (__closure=<optimized out>)
    at ./hphp/runtime/base/runtime-option.cpp:1407
#5  HPHP::RuntimeOption::Load (ini=..., config=..., iniClis=..., hdfClis=..., messages=<optimized out>,
    messages@entry=0x7fffffffd970, cmd=...) at ./hphp/runtime/base/runtime-option.cpp:1407
#6  0x0000555556462367 in HPHP::execute_program_impl (argc=<optimized out>, argv=<optimized out>)
    at /usr/include/c++/9/bits/basic_string.h:936
#7  0x0000555556464bf5 in HPHP::execute_program (argc=1, argv=0x7fffffffe6f8)
    at ./hphp/runtime/base/program-functions.cpp:1288
#8  0x0000555556017cb5 in main (argc=1, argv=0x7fffffffe6f8) at ./hphp/hhvm/main.cpp:101
(gdb)

fredemmott commented 3 years ago

We're likely to land a fix for 4.94 which isn't suitable for backporting.

For prior versions, I'll probably apply this on Monday; it is fine unless you've explicitly enabled some options related to tracking where PHP arrays (as opposed to vecs/dicts) are produced.

diff --git a/hphp/runtime/base/array-provenance.cpp b/hphp/runtime/base/array-provenance.cpp
index 46d3803dfd..ace2e395e2 100644
--- a/hphp/runtime/base/array-provenance.cpp
+++ b/hphp/runtime/base/array-provenance.cpp
@@ -57,7 +57,17 @@ using TagStorage = std::pair<LowPtr<const StringData>, int32_t>;

 static constexpr TagID kKindBits = 3;
 static constexpr TagID kKindMask = 0x7;
-static constexpr size_t kMaxTagID = (1 << (8 * sizeof(TagID) - kKindBits)) - 1;
+static constexpr size_t kMaxTagID = ((1 << (8 * sizeof(TagID) - kKindBits)) - 1)
+  // Arbitrary reduction to reduce the required memory from 8GB to 8MB to work
+  // around https://github.com/facebook/hhvm/issues/8796
+  //
+  // This limits us to 524288 arrays - but if array provenance is disabled, we
+  // only need tags for arrays created when reading configs, which will be much
+  // less than that.
+  //
+  // This is a tradeoff that means that even if you have enough RAM,
+  // ArrayProvenance can not be safely enabled for large projects.
+  / 1024;

 struct TagHashCompare {
   bool equal(TagStorage a, TagStorage b) const {
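The numbers in the patch comment can be checked with a little compile-time arithmetic. The sketch below assumes that TagID is 32 bits wide and that each TagStorage entry occupies 16 bytes; neither is shown in the diff, but both are consistent with the 8589934592-byte mmap in the strace. `kEntryBytes`, `kOldMaxTagID` and `kNewMaxTagID` are names introduced here for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumptions, not taken from the HHVM source: a 32-bit TagID and 16 bytes
// per TagStorage entry (pointer + int32_t, padded).
using TagID = std::uint32_t;
constexpr std::size_t kEntryBytes = 16;
constexpr std::size_t kKindBits = 3;

// Before the patch: 2^29 - 1.
constexpr std::size_t kOldMaxTagID =
    (std::size_t{1} << (8 * sizeof(TagID) - kKindBits)) - 1;
// After the patch: the same value divided by 1024 (integer division).
constexpr std::size_t kNewMaxTagID = kOldMaxTagID / 1024;

static_assert(kOldMaxTagID == 536870911, "2^29 - 1 tag IDs");
static_assert(kNewMaxTagID == 524287, "just under the 524288 in the comment");
// 2^29 entries of 16 bytes is exactly the 8589934592-byte mmap in the strace.
static_assert((kOldMaxTagID + 1) * kEntryBytes == 8589934592ULL, "8 GiB");
// The reduced table needs just under 8 MiB.
static_assert(kNewMaxTagID * kEntryBytes == 8388592, "about 8 MiB");
```

Under these assumptions the patch shrinks the reservation by a factor of 1024, which is the 8GB to 8MB reduction described in the patch comment.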

fredemmott commented 3 years ago

This should be fixed in 4.94.0. We're building .1 versions of the other supported/affected releases, but they need more testing to be sure this is resolved.

fredemmott commented 3 years ago

Fixed in the .1 releases: 4.88.1 through 4.93.1.