facebook / hhvm

A virtual machine for executing programs written in Hack.
https://hhvm.com

3.4.0 (official hhvm-3.4.0~trusty package) eats all memory+swap #4268

Closed tat closed 9 years ago

tat commented 9 years ago

I upgraded my aws instances (c3.large) to 3.4.0 (official packages got from http://dl.hhvm.com/ubuntu) and all of them get killed by oom-killer after eating all RAM and swap in about 5 minutes (getting about 300 requests per minute).

Is there anything I can check to track down the issue?

My server.ini:

pid = /var/run/hhvm/pid
hhvm.server.port = 9000
hhvm.server.type = fastcgi
hhvm.server.default_document = index.php
hhvm.log.use_log_file = true
hhvm.log.file = /var/log/hhvm/error.log
hhvm.repo.central.path = /var/run/hhvm/hhvm.hhbc
hhvm.resource_limit.max_socket = 10000
hhvm.log.header = true

Thanks, stefano

mklooss commented 9 years ago

Can confirm the same scenario here, but also on HHVM 3.3. We have to restart the hhvm process every 6 hours to keep the server online. We are using a dedicated server.

tat commented 9 years ago

In my case hhvm eats RAM+swap (about 5gigs in total) in about 5/7 minutes.
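
One low-effort way to put numbers on growth like this is to sample the process's resident set size from `/proc/<pid>/status`. The following is a minimal sketch, assuming a Linux host and that the pid file configured in the server.ini above (`/var/run/hhvm/pid`) contains the server's process id. The function names `parse_vm_rss_kb` and `sample_rss` are hypothetical helpers, not part of HHVM.

```python
import re
import time
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid_file="/var/run/hhvm/pid", interval_s=10, count=30):
    """Print a timestamped RSS reading every interval_s seconds.

    Assumption: pid_file holds the process id as plain text.
    """
    pid = Path(pid_file).read_text().strip()
    for _ in range(count):
        status = Path(f"/proc/{pid}/status").read_text()
        print(int(time.time()), parse_vm_rss_kb(status), "kB")
        time.sleep(interval_s)
```

Calling `sample_rss()` while the server is under load gives a series of readings that can be attached to a report, which is easier to compare across versions than a single "about 5 gigs" figure.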

mklooss commented 9 years ago

yesterday we had the same on 64 GB RAM and 8 GB SWAP in about 6 hours :/

jwatzman commented 9 years ago

@tat, this is an increase from 3.3 to 3.4? That's interesting. Can you get a heap profile for us? The process is unfortunately somewhat involved.

cc @paulbiss

@mklooss, what you're experiencing is unfortunately somewhat expected, and is a long-term issue we've been slowly looking into. It's not indicative of server instability, we just haven't optimized for a super-long-running server very much, since FB pushes twice a day. (Though 6 hours is still quite short.)

fredemmott commented 9 years ago

The admin server speaks FastCGI now, not HTTP - you'll also need to configure your webserver to give you access to it.

tat commented 9 years ago

Thanks for the feedback, I've got the admin interface working but I'm getting an error from the activate command: Error 2 in mallctl("prof.active", ...)

do you know what's the issue? where is the file supposed to be written to? /tmp ?

Here's the jemalloc-stats output I captured from the admin interface: http://pastebin.com/vBvfPiP5

btw @jwatzman 3.3 is working fine for me, RAM usage is stable at about 350MB; it has been running for days without restarts.

mklooss commented 9 years ago

jemalloc Stats: https://gist.github.com/mklooss/8091e48c4551f40d05c8 currently the HHVM Process eats ~10 GB RAM, process runs ~ 2 hours

frankh commented 9 years ago

I'm getting the same problem running a large WordPress site on HHVM. Memory usage starts at ~450MB and climbs to ~1.2GB before the process restarts every ~2 hours (not 100% sure yet whether it's OOM-killed or crashing).

This is HHVM 3.4.0 on ubuntu/trusty

jwatzman commented 9 years ago

I just cherry-picked a memory leak fix into the 3.4 branch -- can someone who's experiencing this build that branch and report back? If it fixes it, we can roll a 3.4.1 release. The issue is that if you are passing invalid arguments to some builtin functions, such that the builtin raises a warning, we leak a small amount of memory each time -- and it looks new in 3.4. If your PHP app generates a lot of warnings from builtins, then this could easily be your bug :)

Thanks for the feedback, I've got the admin interface working but I'm getting an error from the activate command: Error 2 in mallctl("prof.active", ...)

do you know what's the issue? where is the file supposed to be written to? /tmp ?

I don't, sorry -- @fredemmott, @paulbiss, can either of you advise better?

jwatzman commented 9 years ago

can someone who's experiencing this build that branch and report back? If it fixes it, we can roll a 3.4.1 release.

I went ahead and built a deb for trusty with this patch: http://dl.hhvm.com/ubuntu/hhvm_3.4.1-devtest~trusty_amd64.deb You can manually install that so you don't have to build HHVM yourself; let me know if it works better.

denji commented 9 years ago

configure:

./configure -DENABLE_SSP=ON -DDEBUG_MEMORY_LEAK=ON -DDEBUG_APC_LEAK=ON

  -DDEBUG_APC_LEAK=ON|OFF : Allow easier debugging of apc leaks : Default: OFF
  -DDEBUG_MEMORY_LEAK=ON|OFF : Allow easier debugging of memory leaks : Default: OFF
  -DENABLE_SSP=ON|OFF : Enabled GCC/LLVM stack-smashing protection : Default: OFF

levixie commented 9 years ago

@jwatzman which change did you cherry-pick? I only see a doc update. Thanks

jwatzman commented 9 years ago

https://github.com/facebook/hhvm/commit/edf53c1f7b9e1e2195a4d012b1792dd4d087137c is the relevant cherry-pick. It does look like only a doc update, but AIUI we have a script that parses that file (in particular, lines of the form of the one changed) to generate a bunch of data about opcode semantics, and the change is thus relevant. It confused me as well until it was explained to me this morning :-P

levixie commented 9 years ago

Thank you! We are building hhvm ourselves because we need some specific version of lib. I will pick the change and try it out to see how it goes

frankh commented 9 years ago

Thanks for the patch and build, I'm trying it out now but unfortunately it looks like it's still leaking memory.

There are no warning/errors in my hhvm log so it doesn't look like this was the cause of the leak for me.

staabm commented 9 years ago

maybe you are using create_function ? it seems this one is leaky, too - https://github.com/facebook/hhvm/issues/4250

paulbiss commented 9 years ago

@staabm: that's been leaky for awhile, we're looking for a leak that was recently introduced

jwatzman commented 9 years ago

Spent most of the morning looking at this. I wasn't able to reproduce it with the "representative WordPress" install from https://github.com/hhvm/oss-performance, unfortunately. However, I was able to reproduce the heap profiling failure, and can help you get us a heap profile. It's a little messy.

  server {
    listen 8091 default_server;
    access_log            /dev/shm/hhvm-nginxCnchqi/admin-access.log main;
    client_body_temp_path /dev/shm/hhvm-nginxCnchqi/admin-client_temp;
    proxy_temp_path       /dev/shm/hhvm-nginxCnchqi/admin-proxy_temp;
    fastcgi_temp_path     /dev/shm/hhvm-nginxCnchqi/admin-fastcgi_temp;
    uwsgi_temp_path       /dev/shm/hhvm-nginxCnchqi/admin-uwsgi_temp;
    scgi_temp_path        /dev/shm/hhvm-nginxCnchqi/admin-scgi_temp;

    location / {
      fastcgi_pass 127.0.0.1:8093;
      include fastcgi_params;
    }
  }

SiebelsTim commented 9 years ago

@jwatzman Put this in the wiki or somewhere! :+1:

jwatzman commented 9 years ago

Yeah, good idea, will do if this ends up producing useful results :)

jwatzman commented 9 years ago

Have any of you that are experiencing this been able to get any more info? Just confirming that the 3.4.1-devtest deb linked above does or does not help would be useful -- and if it doesn't help, a heap dump as above would be even more useful. This is going to eventually hit human timeout which would be unfortunate, since it seems to be a real issue -- but since we can't repro it, we need more info to track it down :(

liayn commented 9 years ago

I'll install the devtest build on the live server now. Let's hope.

liayn commented 9 years ago

hm, apt-get keeps nagging me that a newer version is available... how can I avoid that?

jwatzman commented 9 years ago

You can directly download the deb and then sudo dpkg --install path/to/deb.

liayn commented 9 years ago

that's what I did. It replaced the installed hhvm, but now apt-get reports that updates are available and that triggers reporting systems and that triggers mails....

jwatzman commented 9 years ago

Can you just silence that for a little while? The package is deliberately built out-of-band, since it's unclear if it will help. (Though it's signed with the same GPG key as the official ones, so you can tell it does come from us.) I don't know which reporting system you are using, so I can't tell you how to shut it up; you could try commenting out the HHVM repo in /etc/apt/sources.list or /etc/apt/sources.list.d/, wherever it is.

pjv commented 9 years ago

I can't risk messing around with my production environment - i am getting paid by my clients to provide a performant, stable platform for their sites - but would a list of wordpress plugins be useful to you guys?

my hosting environment is based on easyengine which is a tool that can spin up a wordpress stack with baseline best practices on a fresh ubuntu with a couple commands in a couple minutes. you could provision a VPS in 10 or 15 minutes that duplicates in most respects the environment that I and many others use for serving wordpress with nginx / mysql. if you are not seeing memory leaks under load with your representative install, it's probably one or another plugin that is causing it. i can give you a list of the plugins in use on my server.

liayn commented 9 years ago

First report after 16 hours: No troubles so far!

pjv commented 9 years ago

First report after 16 hours: No troubles so far!

@liayn : are you saying that the memory use by HHVM under load is stable with the build that @jwatzman provided?

liayn commented 9 years ago

@pjv At least the server hasn't killed the processes due to memory shortage so far, which was happening regularly before. But we will keep watching this for at least a full week. Let's see how things develop. I'll report back here.

jwatzman commented 9 years ago

Sounds promising, let me know how it continues to go!

jwatzman commented 9 years ago

And yeah @pjv that list would be useful, as well as any info you can get on how to set up the config, just in case -- but I'm hopeful I won't need to try to set that up and @liayn will just confirm that things are good :)

pjv commented 9 years ago

@jwatzman agree with you that i hope that @liayn's server just keeps working and you can ignore this, but here is the list of the plugins in use on several different sites i am managing on my server that are all running on HHVM. Some of them are premium ($) plugins, but most are not.

I have some work-arounds coded into my nginx conf for some of these where I know that HHVM chokes on them - for example, the woocommerce checkout page and some functions in mailpoet (both of which involve establishing SSL connections which i gather is a known issue for HHVM). For those requests I send it to php-fpm instead of HHVM.

Here's the list:

Akismet - https://wordpress.org/plugins/akismet/
Anti Feed-Scraper Message - https://wordpress.org/plugins/anti-feed-scraper-message/
Better User Profile Fields - https://wordpress.org/plugins/better-user-profile-fields/
Biographical Info Paragraphed - https://wordpress.org/plugins/biographical-info-paragraphed/
Cloudflare - https://wordpress.org/plugins/cloudflare/
Comment Image - https://wordpress.org/plugins/comment-image/
Comment Image Embedder - https://wordpress.org/plugins/wordpress-comment-images/
Configure SMTP - https://wordpress.org/plugins/configure-smtp/
Contact Form 7 - https://wordpress.org/plugins/contact-form-7/
Display Widgets - https://wordpress.org/plugins/display-widgets/
Easy Facebook Like Box - https://wordpress.org/plugins/easy-facebook-likebox/
Easy Google Fonts - https://wordpress.org/plugins/easy-google-fonts/
Featured Authors Widget - https://wordpress.org/plugins/featured-authors-widget/
FT Signature Manager - https://wordpress.org/plugins/ft-signature-manager/
Google XML Sitemaps - https://wordpress.org/plugins/google-sitemap-generator/
IgniteWoo Updater - http://ignitewoo.com/
Jetpack - https://wordpress.org/plugins/jetpack/
jQuery Colorbox - https://wordpress.org/plugins/jquery-colorbox/
Limit Login Attempts - https://wordpress.org/plugins/limit-login-attempts/
MailPoet Newsletters - https://wordpress.org/plugins/wysija-newsletters/
MailPoet Newsletters Premium - http://www.mailpoet.com/
MailPoet WooCommerce Add-on - https://wordpress.org/plugins/mailpoet-woocommerce-add-on/
mPress Custom Feed Excerpts - https://wordpress.org/plugins/mpress-custom-feed-excerpts/
Multiple Packages for WooCommerce - https://wordpress.org/plugins/multiple-packages-for-woocommerce/
Nginx Helper - http://rtcamp.com/nginx-helper/
Olimometer - https://wordpress.org/plugins/olimometer/
PG Slide Out Tabs - http://plugingreat.com/slide-out-tabs/
Prioritize Hooks - https://wordpress.org/plugins/prioritize-hooks/
Revolution Slider - http://www.themepunch.com/codecanyon/revolution_wp/
Quick Adsense - https://wordpress.org/plugins/quick-adsense/
Quotes Collection - https://wordpress.org/plugins/quotes-collection/
Rating Widget - https://wordpress.org/plugins/rating-widget/
Search and Replace - https://wordpress.org/plugins/search-and-replace/
SI Captcha Anti-spam - https://wordpress.org/plugins/si-captcha-for-wordpress/
Thin Out Revisions - https://wordpress.org/plugins/thin-out-revisions/
TinyMCE Advanced - https://wordpress.org/plugins/tinymce-advanced/
Tippy - https://wordpress.org/plugins/tippy/
WooCommerce - excelling eCommerce - https://wordpress.org/plugins/woocommerce/
WooCommerce - ShipStation Integration - http://www.woothemes.com/products/shipstation-integration/
WooCommerce - Store Exporter - https://wordpress.org/plugins/woocommerce-exporter/
WooCommerce Advanced Free Shipping - https://wordpress.org/plugins/woocommerce-advanced-free-shipping/
WooCommerce Customizer - https://wordpress.org/plugins/woocommerce-customizer/
WooCommerce FedEx Shipping - http://woothemes.com/woocommerce
WooCommerce Gift Certificates Pro - http://ignitewoo.com/
WooCommerce PayPal Pro (Classic and PayFlow Editions) Gateway - http://woothemes.com/woocommerce
WooCommerce Print Invoice & Delivery Note - https://wordpress.org/plugins/woocommerce-delivery-notes/
WooCommerce USPS Shipping - http://woothemes.com/
WooCommerce WishLists - http://woothemes.com/
WooThemes Helper - http://woothemes.com/
WP fail2ban - https://wordpress.org/plugins/wp-fail2ban/
WPLOOK Twitter Follow Button (new) - https://wordpress.org/plugins/wplook-twitter-follow-button-new/
wpMandrill - https://wordpress.org/plugins/wpmandrill/
WP User Avatar - https://wordpress.org/plugins/wp-user-avatar/

liayn commented 9 years ago

For completeness, our list of WP plugins:

So the only overlapping plugin I see is google-sitemap-generator

jwatzman commented 9 years ago

Thanks for the info. Probably won't focus on the google sitemap generator -- it could easily be a bug tickled by several different plugins. But hoping things stay stable, at least as much as they were with 3.3 :)

liayn commented 9 years ago

Ok bad news guys. today 12:45 (UTC+1) hhvm was killed again. :-( So I can't really tell if the dev-version made a slight improvement or not.

I'm very certain that Wordpress triggers lots of E_NOTICE errors (although reporting and logging of these is disabled).

pjv commented 9 years ago

ok, apologies in advance to all the php-heads... i just googled E_NOTICE and everything i read adds up to furthering my loathing of php.

on the other hand, if somehow or another HHVM has an issue with E_NOTICE errors and that is the major source of the memory leak we are seeing, that seems like a pretty specific area to look at.

jwatzman commented 9 years ago

If the notices are about incorrect types being passed to builtin functions, then the patch in the package I posted above will almost certainly fix the problem.

@liayn it sounds like the package I provided at least makes the problem less bad. There are several known memory leaks in HHVM right now, but they're hard to fix. They've been there forever though, so nothing should have gotten worse in 3.4 as far as we know. Did the process at least run as long as it used to with 3.3?

liayn commented 9 years ago

I can't tell you exactly how long it was running with 3.3, since we had it restarting every night due to some problems we experienced. These problems were not traceable as well. The service was still running, but it didn't respond anymore. No log entries, nothing. We had to play around with the various JIT memory settings, but I must really admit that documentation of these settings is simply not existent, which is actually embarrassing for such important settings. It would be really helpful to at least understand what each of these options does and how they influence each other and what values are reasonable (best practice) and what is just insane.

jwatzman commented 9 years ago

I can't tell you exactly how long it was running with 3.3, since we had it restarting every night due to some problems we experienced.

OK, so it sounds like we're back to being no worse than 3.3 was. We realize restarting every night isn't a great situation and is something we hope to fix, but we definitely don't want to seriously regress from that in 3.4.

We've got a couple more fixes in the pipe for 3.4.1, I'll tag it and roll packages for it when those land.

We had to play around with the various JIT memory settings, but I must really admit that documentation of these settings is simply not existent

Yep, this is something we'd like to improve. If you want to add your own findings to https://github.com/facebook/hhvm/wiki/INI-Settings it would be appreciated!

pjv commented 9 years ago

We had to play around with the various JIT memory settings, but I must really admit that documentation of these settings is simply not existent

Yep, this is something we'd like to improve. If you want to add your own findings to https://github.com/facebook/hhvm/wiki/INI-Settings it would be appreciated!

Seriously?

Some engineer at FB has to have actually coded these various JIT memory options and ought to be able to burp up at least some kind of minimal orientation on what they do and how they interact and what kind of values might be sane rather than depending on end-users who have no alternative other than literal trial and error or reading through and trying to interpret the source code.

jwatzman commented 9 years ago

Since you mentioned playing with the settings, I hoped that you would be able to contribute both information on them as well as the more valuable perspective on tuning them outside Facebook -- a perspective we don't have. If you don't, that's fine, we'll find someone on our end to write up the info at some point. (I certainly don't know the details well enough to write canonical docs on them!)

liayn commented 9 years ago

"Playing" in our case means: raise the values arbitrarily until hhvm survives 24h without crashing. :-(

denji commented 9 years ago

If the same page of the same is likely partly to break even this problem

yunasc commented 9 years ago

I installed 3.4.1dev.

I still have memleak, albeit it's slower. Before it was about 50-90mb mem leak per request, now it's 15-30mb per request leak.
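
A per-request figure like this can be estimated more robustly by fitting a line through several readings instead of comparing two of them. The sketch below is one way to do that, assuming you have recorded pairs of (requests served so far, RSS in MB) yourself; `leak_per_request_mb` is a hypothetical helper and the fit is a plain least-squares slope, not anything HHVM provides.

```python
def leak_per_request_mb(samples):
    """Least-squares slope of RSS (MB) against requests served.

    samples: list of (requests_served, rss_mb) pairs, with at least
    two distinct request counts.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Spread of the request counts; zero means every sample has the same x.
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need samples at different request counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx
```

A slope that stays clearly above zero over many requests points to steady growth, while a slope that flattens out suggests warm-up (for example JIT compilation) and not a leak.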

tat commented 9 years ago

@jwatzman finally I was able to try out your new build and to take the dumps you asked; unfortunately the leak is still there, it took just a few minutes to consume all memory and to get killed by the kernel (3.7GB, this is running on aws).

I took 6 dumps. When I took the sixth, the hhvm process was consuming more than 90% of all RAM; I tried to take a 7th, but the kernel killed hhvm before I could take the dump.

The version I tried is the package you built, downloaded from http://dl.hhvm.com/ubuntu/hhvm_3.4.1-devtest~trusty_amd64.deb (md5 e4bdf866279c8178378172d3d56e2187).

$ hhvm --version
HipHop VM 3.4.1-dev (rel)
Compiler: heads/HHVM-3.4-0-g7c1919f48c8d432c4987fc93afe6f4367389934c
Repo schema: a4f8ff8de012b16bcbb3d2ecb1ebf09917154854
Extension API: 20140829

I'm sending you the dump and prof files via email now.

Thanks.

jwatzman commented 9 years ago

Thanks! We got the jemalloc dumps and the information from them looks potentially quite useful. I'm sending them around internally and will let you know when I hear anything.

swtaarrs commented 9 years ago

@tat We've been staring at the dumps but haven't been able to figure out if it's a leak or just something in the jit that's using a lot of memory. Could you set TRACE=printir:1 and run again? This will create a fairly large file in /tmp/hphp.log containing some debug output from the jit that might be helpful. It will contain enough information for us to reconstruct the PHP code that was running, so make sure you're ok with that before giving us the file.

You can also try using the hhvm.jit_max_region_instrs ini option to control the size of the region we give to the jit. It defaults to 1000, maybe try 100 or 500 and see if that helps?

bertmaher commented 9 years ago

Try setting hhvm.jit_pgo_hot_only=true. We changed the default on that from 3.3 to 3.4, and it may be causing the JIT to use more memory while it's optimizing (none of that memory should leak, but if it spikes at the wrong time it could OOM hhvm).

tat commented 9 years ago

I'll try that asap and let you know.

A thing I've noticed that may be helpful for you is that the amount of ram is increasing as time passes (as expected) but at one point it really starts to grow faster and faster till it gets oom'd. Something like it takes 5 minutes to get to 50% and 1 minute to get to 100%; when that happens the cpu is heavily used as well (cpu usage is low otherwise).
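
The pattern described here (about 5 minutes to reach 50%, then about 1 minute to reach 100%) can be checked mechanically from a series of readings. This is a rough sketch, assuming samples of (minutes elapsed, memory usage) sorted by time and with strictly increasing timestamps; the helper names and the `factor` threshold are hypothetical, and treating usage as starting from 0 is a simplification.

```python
def growth_rates(samples):
    """Growth per minute between consecutive (minutes, usage) samples."""
    rates = []
    for (t0, m0), (t1, m1) in zip(samples, samples[1:]):
        rates.append((m1 - m0) / (t1 - t0))
    return rates


def is_accelerating(samples, factor=2.0):
    """True if the latest growth rate is at least `factor` times the first."""
    rates = growth_rates(samples)
    if len(rates) < 2 or rates[0] <= 0:
        return False
    return rates[-1] >= factor * rates[0]


# The timings from the comment above, as percent of RAM used.
observed = [(0, 0), (5, 50), (6, 100)]
print(growth_rates(observed))
print(is_accelerating(observed))
```

With those timings the second interval grows five times faster than the first, which looks more like a spike than a constant per-request leak and would be consistent with the JIT-related suggestions above.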