Investigate disk-writes of apc mmap

cargomedia / puppet-packages

UNMAINTAINED. Reusable puppet modules for Debian

MIT License

7 stars 16 forks source link

Investigate disk-writes of apc mmap #763

Closed njam closed 9 years ago

njam commented 9 years ago

Using disk-backed mmapping for acp causes increased disk writes, which leads to a considerable performance impact.

Performance impact documented in: https://github.com/cargomedia/puppet-cargomedia/issues/612 screen shot 2014-11-06 at 19 33 21 (new relic transaction)

I didn't expect setting mmap_file_mask to change anything in the actual behaviour. @ppp0 @kris-lab let's research this! I'm moving this to ready because it has quite an impact.

kris-lab commented 9 years ago

Very interesting. Looks like high-IO is a common problem http://blog.vec.com/2012/05/29/io-problem-solved/

njam commented 9 years ago

So a solution would be to mount /tmp on a memory-backed filesystem (tmpfs). Maybe even mount a apcu-specific directory just for the mmapping?

ppp0 commented 9 years ago

Good discussion about the difference shm vs mmap (http://stackoverflow.com/questions/21311080/linux-shared-memory-shmget-vs-mmap). Take away: as a rule use POSIX-style shmget, for more "power" use SysV-style memory mapping, but in the end (for us as end user) it does not matter.

My take away and suggestion: proceed as in the link provided by @kris-lab (http://blog.vec.com/2012/05/29/io-problem-solved/) and set apc.mmap_file_mask=/apc.shm.XXXXXX

Per default (something I wasn't aware of either), /tmp is not mounted as tmpfs (see /etc/defaults/tmpfs unless the root partition is read-only or full:

[...]
# mount /tmp as a tmpfs.  Defaults to no; set to yes to enable (/tmp
# will be part of the root filesystem if disabled).  /tmp may also be
# configured to be a separate mount in /etc/fstab.
#RAMTMP=no
[...]

I would keep the settings as-is (the defaults). These give /run/shm a size of 20% of the (virtual) memory and use swapping if needed.

# Size limits.  Please see tmpfs(5) for details on how to configure
# tmpfs size limits.
#TMPFS_SIZE=20%VM
#RUN_SIZE=10%
#LOCK_SIZE=5242880 # 5MiB
#SHM_SIZE=
#TMP_SIZE=

# Mount tmpfs on /tmp if there is less than the limit size (in kiB) on
# the root filesystem (overriding RAMTMP).
#TMP_OVERFLOW_LIMIT=1024

On www3 and www4 we have:

root@www3:~$ df -h
Filesystem                                              Size  Used Avail Use% Mounted on
[...]
tmpfs                                                   6.5G     0  6.5G   0% /run/shm
[...]

On www5 3.4G... (20% of 16GB)

I am not completely sure we need the same free space for the map file as we allocate (apc.shm_size=4096M) - What do you guys think/know?

Here another explanation about memory overcommitting: http://stackoverflow.com/questions/4803152/mmap-fails-when-length-is-larger-than-4gb/4803978#4803978

Our settings are overcommit: yes, using heuristics ratio: 50 (Probably not optimal?)

cap> cat /proc/sys/vm/overcommit_memory
 ** [out :: www4 (10.55.40.160)] 0
 ** [out :: www5 (10.55.40.138)] 0
 ** [out :: www3 (10.55.40.166)] 0

cap> cat /proc/sys/vm/overcommit_ratio
 ** [out :: www4 (10.55.40.160)] 50
 ** [out :: www3 (10.55.40.166)] 50
 ** [out :: www5 (10.55.40.138)] 50

njam commented 9 years ago

Why this path /apc.shm.XXXXXX - is this some magic which makes it reside on /run/shm? Or should it rather be /run/shm/apc.shm.XXXXXX?

Regarding the size limitation: Relying on the system's settings for that seems a bit quirky. If/when we change the apcu-max-memory we need to think about changing the system settings too. If some other service is using it we might run into more problems. How about mounting a special directory from within the apcu-module (which has exactly the size we allocate to apcu)?

ppp0 commented 9 years ago

Comment from https://github.com/krakjoe/apcu/blob/aba5bf3019cc7b522fa7627c0762d4a7c8a6ab40/apc_mmap.c#L88 onward:

  /*
     * If the filemask contains .shm we try to do a POSIX-compliant shared memory
     * backed mmap which should avoid synchs on some platforms.  At least on
     * FreeBSD this implies MAP_NOSYNC and on Linux it is equivalent of mmap'ing
     * a file in a mounted shmfs.  For this to work on Linux you need to make sure
     * you actually have shmfs mounted.  Also on Linux, make sure the file_mask you
     * pass in has a leading / and no other /'s.  eg.  /apc.shm.XXXXXX
     * On FreeBSD these are mapped onto the regular filesystem so you can put whatever
     * path you want here.
     */

njam commented 9 years ago

Ok interesting :)

What about the size limitations? Is it worth mounting a separate shmfs? Will this reserve RAM and make it unusable for other processes?

ppp0 commented 9 years ago

To what I understand, if the memory is being used, it won't be usable for other processes (see also: overcommit) - In cases there is shortage, swapping will start... I wouldn't overcommit too much (now: max 50%) - or disable overcommit at all!

We might not have an optimal shm_size / number of segments... May be it is worth splitting the 4GB we now allocate at once into 128 chunks @ 32kB ? There is a kernel vm setting about max segment size shmmax which is now at 33554432 (32768 kB = 32 MB)

Afaiu, if you mount a tmpfs arbitrarily, this will create a Ramdisk and then, mmap will treat it as a file system, not as POSIX shared mem? (This is what I get from apc_sma_api_init https://github.com/krakjoe/apcu/blob/aba5bf3019cc7b522fa7627c0762d4a7c8a6ab40/apc_sma.c#L271 and apc_mmap https://github.com/krakjoe/apcu/blob/aba5bf3019cc7b522fa7627c0762d4a7c8a6ab40/apc_mmap.c#L115)

ppp0 commented 9 years ago

Following the discussion let's revert mmap using the file system to anonymous mmap and reduce the segment size to a value that can cope with the maximum usage we've observed over time

njam commented 9 years ago

2014-11-12: Using anonymous mmapping as of https://github.com/cargomedia/puppet-cargomedia/pull/648 Server time: 2:47 (CST) New Relic time: 9:47 (UTC+1)

njam commented 9 years ago

So far segfaults look okay - one incident after config change, on www3:

cap> with web,search,wowza zgrep 'SIGSEGV' /var/log/php5-fpm/php5-fpm.log* | head -n 5
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log:[09-Nov-2014 19:36:47] WARNING: [pool www] child 12838 exited on signal 11 (SIGSEGV) after 1006.619989 seconds from start
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log:[10-Nov-2014 11:55:09] WARNING: [pool www] child 10991 exited on signal 11 (SIGSEGV) after 4609.901976 seconds from start
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log:[2014-11-11 04:11:38] wooowoo SIGSEGV test
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log:[2014-11-11 04:11:02] wooowoo SIGSEGV test
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log:[2014-11-11 04:11:03] wooowoo SIGSEGV test
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log:[11-Nov-2014 10:33:59] WARNING: [pool www] child 6055 exited on signal 11 (SIGSEGV) after 1648.327924 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log:[11-Nov-2014 21:04:01] WARNING: [pool www] child 1202 exited on signal 11 (SIGSEGV) after 1247.821408 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.1:[04-Nov-2014 05:40:49] WARNING: [pool www] child 19939 exited on signal 11 (SIGSEGV) after 4646.542078 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.1:[07-Nov-2014 05:27:55] WARNING: [pool www] child 28671 exited on signal 11 (SIGSEGV) after 499.111754 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.2.gz:[26-Oct-2014 18:37:41] WARNING: [pool www] child 16849 exited on signal 11 (SIGSEGV) after 1277.708970 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log:[13-Nov-2014 11:48:15] WARNING: [pool www] child 31690 exited on signal 11 (SIGSEGV) after 2948.968636 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.1:[04-Nov-2014 14:07:53] WARNING: [pool www] child 28313 exited on signal 11 (SIGSEGV) after 3131.504725 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.2.gz:[29-Oct-2014 08:19:18] WARNING: [pool www] child 3065 exited on signal 11 (SIGSEGV) after 1063.774189 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.2.gz:[29-Oct-2014 08:19:18] WARNING: [pool www] child 29946 exited on signal 11 (SIGSEGV) after 1668.301395 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.2.gz:[29-Oct-2014 08:19:18] WARNING: [pool www] child 29499 exited on signal 11 (SIGSEGV) after 5352.010290 seconds from start

Response time is steady again - ajax_toggleFriend: screen shot 2014-11-14 at 08 30 42

Disk writes are lower again - www5: screen shot 2014-11-14 at 08 32 29

I will monitor segfaults further over the weekend, and if everything stays as it is now I'll think we can close this.

njam commented 9 years ago

We have individual SIGSEGVS exits on all servers:

cap> with web,search,wowza zgrep 'SIGSEGV' /var/log/php5-fpm/php5-fpm.log* | head -n 5
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.1:[11-Nov-2014 10:33:59] WARNING: [pool www] child 6055 exited on signal 11 (SIGSEGV) after 1648.327924 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.1:[11-Nov-2014 21:04:01] WARNING: [pool www] child 1202 exited on signal 11 (SIGSEGV) after 1247.821408 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.1:[15-Nov-2014 20:37:22] WARNING: [pool www] child 9333 exited on signal 11 (SIGSEGV) after 1849.094657 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.1:[15-Nov-2014 20:37:22] WARNING: [pool www] child 12104 exited on signal 11 (SIGSEGV) after 1434.129498 seconds from start
 ** [out :: www4.example.com (10.55.40.160)] /var/log/php5-fpm/php5-fpm.log.1:[15-Nov-2014 20:37:22] WARNING: [pool www] child 19017 exited on signal 11 (SIGSEGV) after 300.308251 seconds from start
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log.1:[09-Nov-2014 19:36:47] WARNING: [pool www] child 12838 exited on signal 11 (SIGSEGV) after 1006.619989 seconds from start
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log.1:[10-Nov-2014 11:55:09] WARNING: [pool www] child 10991 exited on signal 11 (SIGSEGV) after 4609.901976 seconds from start
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log.1:[2014-11-11 04:11:38] wooowoo SIGSEGV test
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log.1:[2014-11-11 04:11:02] wooowoo SIGSEGV test
 ** [out :: www5.example.com (10.55.40.138)] /var/log/php5-fpm/php5-fpm.log.1:[2014-11-11 04:11:03] wooowoo SIGSEGV test
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.1:[13-Nov-2014 11:48:15] WARNING: [pool www] child 31690 exited on signal 11 (SIGSEGV) after 2948.968636 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.1:[15-Nov-2014 20:37:22] WARNING: [pool www] child 11637 exited on signal 11 (SIGSEGV) after 1083.931758 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.1:[15-Nov-2014 20:37:22] WARNING: [pool www] child 14372 exited on signal 11 (SIGSEGV) after 842.945317 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.1:[15-Nov-2014 20:37:22] WARNING: [pool www] child 27852 exited on signal 11 (SIGSEGV) after 2907.207408 seconds from start
 ** [out :: www3.example.com (10.55.40.166)] /var/log/php5-fpm/php5-fpm.log.1:[15-Nov-2014 20:37:22] WARNING: [pool www] child 13085 exited on signal 11 (SIGSEGV) after 4593.038602 seconds from start

As we also had this when using mmap_file_mask I think we can close this. @ppp0 @kris-lab please reopen if you disagree.