facebook / hhvm

A virtual machine for executing programs written in Hack.
https://hhvm.com

Reconsider jemalloc narenas tuning #7515

Open tstarling opened 7 years ago

tstarling commented 7 years ago

Since commit 7798145279de60f285173d9a2fd4ca9025b32db8 (Oct 2012), HHVM overrides jemalloc's number of arenas (narenas), setting it to 1. The default is 4 times the number of CPUs. In WMF bug T151702 we are considering an overload incident which involved requests doing json_decode() calls on large JSON blobs with high concurrency. The json-c library called malloc() very frequently. Contention on one of the bin locks within the single jemalloc arena limited CPU usage to about 50%. Benchmarking suggests that the default would have worked just fine.

[Figure: HHVM narenas tuning]

For a less demanding workload, I'm sure that setting it to 1 provides a small benefit in memory usage, and perhaps a CPU usage benefit would be detectable with careful testing. But I think this kind of workload-dependent tuning is best done in site configuration. The default should have reasonable performance over a wide variety of workloads, rather than pathologically limiting concurrency.

We are using HHVM 3.12.7. The benchmark data was collected on a server with 40 CPUs and 64GB of memory.
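
As a rough illustration of the arena counts discussed above, the following Python sketch compares the narenas:1 override with the default of 4 times the number of CPUs on a 40-CPU machine. It is a back-of-the-envelope model, not jemalloc's actual assignment logic: the function names are hypothetical, it assumes threads are spread evenly across arenas, and it assumes one busy request thread per CPU.

```python
import math


def default_narenas(ncpus: int) -> int:
    """Default arena count as described in this report: 4 times the CPU count."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return 4 * ncpus


def max_threads_per_arena(nthreads: int, narenas: int) -> int:
    """Worst-case number of threads sharing one arena (and its bin locks).

    Assumption: threads are spread evenly across arenas.
    """
    if nthreads < 0 or narenas < 1:
        raise ValueError("nthreads must be >= 0 and narenas >= 1")
    return math.ceil(nthreads / narenas)


NCPUS = 40        # CPU count of the benchmark server in this report
NTHREADS = NCPUS  # assumption: one busy request thread per CPU

for label, narenas in (
    ("HHVM override", 1),
    ("jemalloc default", default_narenas(NCPUS)),
):
    shared = max_threads_per_arena(NTHREADS, narenas)
    print(f"{label}: narenas={narenas}, up to {shared} thread(s) per arena")
```

Under these assumptions, the override puts all 40 threads behind a single arena's locks, while the default of 160 arenas leaves at most one thread per arena. That is consistent with the bin-lock contention described above, although real behaviour depends on how the allocator actually assigns threads to arenas.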

paulbiss commented 7 years ago

cc @jasone

jazzdan commented 7 years ago

This happened to us a year ago, and we brought it up with @jasone via email. His explanation at the time:

> At Facebook we configured jemalloc with only one automatic arena starting a couple of years ago because for our workloads at the time, we measured no significant decrease in throughput, but a significant decrease in memory usage thanks to fragmentation reduction. In general I think this is a dangerous default because of the potential for issues like the one you (Etsy) hit, and as it happens we are currently running experiments to determine whether narenas:1 is still appropriate for Facebook workloads.

To which @jwatzman replied:

> Unless Jason thinks otherwise... let's wait a couple weeks, for the lockdown experiments on this to conclude; if we end up using a value more reasonable outside of FB than just "1", we may not want to add a knob for this, since it's so so specialized and we already have too many knobs.

Perhaps those experiments illuminated a reason for keeping it set to 1?

jasone commented 7 years ago

My fuzzy recollection is that increasing the number of automatic arenas had little impact on Facebook workload throughput. For Facebook, narenas:1 works well, but IMO it's not ideal as a generic default.