famzah / popen-noshell

A much faster popen() and system() implementation for Linux

Each popen_noshell process uses 8MB of RAM from parent? #9

heshiming closed this issue 9 years ago

heshiming commented 9 years ago

I'm trying to write something for an OpenWRT-based router. Calling popen_noshell is definitely faster than popen, though, as expected, it comes at a cost. I discovered that each process opened this way uses about 8 MB of the parent process's RAM. It is not a leak, but the memory stays in use for as long as the process is open. What is the reason, and is there any way to lower it?

heshiming commented 9 years ago

I discovered POPEN_NOSHELL_STACK_SIZE, which is set to exactly 8 MB. I can see that it is memory set aside for the clone() call. Could you explain the logic behind this value? In my particular case I'm executing just a ping (ping -c 1 -W1 somewhere.com), and I found it okay to shrink that size down to just 1024 bytes.

famzah commented 9 years ago

Hi,

Each process must have a stack: https://en.wikipedia.org/wiki/Call_stack

The man page of clone() says: Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process. The calling process must therefore set up memory space for the child stack and pass a pointer to this space to clone().

This answers the question of "why" we need to allocate memory of size POPEN_NOSHELL_STACK_SIZE for the child process: the calling (parent) process has to provide the child's stack.
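To make the man page's requirement concrete, here is a minimal sketch of a parent allocating a stack and handing it to clone(). It is not the library's actual code: run_with_clone, child_main, the 64 KiB CHILD_STACK_SIZE and the CLONE_VM | CLONE_VFORK flag combination are all assumptions chosen for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative size only; this is NOT the library's POPEN_NOSHELL_STACK_SIZE. */
#define CHILD_STACK_SIZE (64 * 1024)

/* Runs in the child, on the stack that the parent allocated. */
static int child_main(void *arg)
{
    char *const *argv = arg;
    execvp(argv[0], argv);
    _exit(127); /* reached only if exec failed */
}

/* Hypothetical helper: run argv via clone() and return its exit status,
 * or -1 if the child could not be created or did not exit normally. */
static int run_with_clone(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so clone() gets a
     * pointer to the TOP of the allocated region. CLONE_VM shares memory
     * with the parent (hence the separate stack); CLONE_VFORK suspends the
     * parent until the child has called exec or _exit. */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, (void *)argv);

    int status = 0;
    int ok = (pid != -1) && (waitpid(pid, &status, 0) == pid);

    /* The child has exec'd or exited by now, so its stack is no longer used. */
    free(stack);

    if (!ok || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The whole CHILD_STACK_SIZE region belongs to the parent's address space for as long as the child needs it, which is why the memory shows up in the parent's usage.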

The other question "how much" we should allocate is more interesting :) Wikipedia says the following about the reason for having a stack: A call stack is used for several related purposes, but the main reason for having one is to keep track of the point to which each active subroutine should return control when it finishes executing.

Therefore, you are right: if we don't call subroutines inside the child process, and we actually don't, then we can set the stack size much lower. I am not aware of the absolute minimum, but my tests show that we should allocate at least 1024 bytes (1 KiB).

I chose the default value for the stack size based on the current Linux default, which is 8 MB. You can see your own default limit by executing ulimit -s. 8 MB is too much memory on an embedded device, so you are right to try to shrink this value. I went with the Linux default size because I haven't tested the library on an embedded device, and because my use cases involve systems where the processes use much more memory.
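For reference, the same limit can be read from inside a C program with getrlimit(). This is only a sketch; stack_limit_kib is a hypothetical helper name, and it reports the soft limit in KiB, the unit that ulimit -s uses in bash.

```c
#include <sys/resource.h>

/* Hypothetical helper: return the soft stack limit in KiB,
 * or -1 if the limit is unlimited or cannot be read. */
static long stack_limit_kib(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long)(rl.rlim_cur / 1024);
}
```

On a system with the 8 MB default mentioned above, this would return 8192.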

I'd be happy if you shared any additional results from using popen-noshell on an embedded device.

heshiming commented 9 years ago

Thank you very much for the explanation. So far a 1024-byte stack size has worked fine on my Netgear WNDR3700v4 router with OpenWRT. Now that the memory problem is solved, I can tell you that the performance improvement is pretty substantial.

I ran a small program that pings 8 hosts simultaneously. When this program is not running, the typical 15-minute load average of the router is roughly 0.4. When the regular popen / pclose are used, it jumps to 1.5 or even 2.0. But when your popen_noshell is used, it stays at about 0.45 or 0.5. The measurement is not very accurate, but the gain is definitely huge. Hats off to you!

EDIT: the idle load could be as low as 0.2.

nicowilliams commented 7 years ago

Using vfork() would make this go away...
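A minimal sketch of that idea, assuming a hypothetical helper named run_with_vfork (this is not code from the library): with vfork() the child borrows the parent's stack while the parent is suspended, so no separate stack has to be allocated.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv via vfork() and return its exit status,
 * or -1 if the child could not be created or did not exit normally. */
static int run_with_vfork(char *const argv[])
{
    pid_t pid = vfork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: it runs on the parent's stack, so it must do nothing
         * except exec or _exit. */
        execvp(argv[0], argv);
        _exit(127); /* reached only if exec failed */
    }

    /* Parent: resumes once the child has exec'd or exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The trade-off is that the child must not modify any state or return from the calling function before it execs, since it shares the parent's memory and stack.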