Original comment by Graham.Dumpleton@gmail.com
on 10 Aug 2007 at 9:56
Original comment by Graham.Dumpleton@gmail.com
on 17 Sep 2007 at 5:44
An inactivity timeout is usually going to be counter-productive. If the
application is really idle, then the only resource it is consuming is memory,
and the operating system will swap the process to disk when memory is low.
Implementing this timeout is just an inefficient way of duplicating the
functionality of swap [1][2]. The next time the process gets started, it will
have to read in the program, rebuild the heap, re-cache all its resources, etc.
The virtual memory (VM) system can usually do this much more efficiently, and
at a finer granularity, than the application can. In particular, in a shared
hosting environment, CPU is usually the most constrained resource. The VM
system can swap processes in and out with almost no CPU utilization; starting
and stopping Python processes is much more taxing for the CPU. Applications and
application resources are likely to be located on heavily loaded NFS and
database servers, whereas swap is local; thus, swapping should significantly
reduce I/O contention.
The most effective things that mod_wsgi can do are:
* Use a "most frequently used" process and thread allocation mechanism to avoid
  unnecessarily swapping in idle processes and to allow applications to reuse
  writable memory segments.
* Use sharable memory instead of writable memory whenever possible.
* Base process termination/reloading on heap fragmentation. It is hard to
  measure heap fragmentation, but we can use the existing "maximum-requests"
  mechanism as a heuristic. This will reduce the number of writable,
  non-sharable pages touched per request by maximizing locality of reference.
  Ensuring that the garbage collector is run often should help as well.
* Allow each process to define its own Python library path to minimize the
  number of libraries loaded per process [3].
[1] http://www.ukuug.org/events/spring2007/programme/varnish_tech.pdf
the "Fight the Kernel" slides in particular
[2] http://varnish.projects.linpro.no/wiki/ArchitectNotes.
[3] http://wingolog.org/archives/2007/11/27/reducing-the-footprint-of-python-applications
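For illustration, the last two suggestions in the list above could look
roughly like the following with mod_wsgi's daemon mode. This is only a sketch:
the group name "app1", the /srv/www/app1 paths, and the numeric values are
hypothetical, and it assumes the maximum-requests and python-path options of
WSGIDaemonProcess are available in the mod_wsgi version in use.

```apache
# Sketch only: "app1" and the /srv/www/app1 paths are hypothetical.
# Recycle each daemon process after 1000 requests as a rough stand-in for
# measuring heap fragmentation, and give this group its own Python path.
WSGIDaemonProcess app1 processes=2 threads=15 maximum-requests=1000 python-path=/srv/www/app1/lib

WSGIScriptAlias /app1 /srv/www/app1/app.wsgi
<Location /app1>
    # Run the application in the daemon process group defined above.
    WSGIProcessGroup app1
</Location>
```

The request count is a guess rather than a measured value; a suitable number
would depend on how quickly a given application's heap degrades.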
Original comment by brian@briansmith.org
on 9 Jan 2008 at 4:05
Your comments are valid, but the 'inactivity-timeout' option isn't really in
there as a means of trying to compete with swap. It is there for cases where
people are trying to run as many applications as possible in a VPS with a small
allocated memory limit across all processes. The idea is that, for infrequently
used applications, the inactivity timeout allows the memory of the application
to be completely reclaimed so something else can use it. Relying on swap
doesn't generally reduce the allocated memory that VPS providers count when
calculating whether someone is using too much memory.
Later on, when this can be combined with transient processes, the whole process
can also be shut down when not required, i.e., just like FastCGI systems do
now. That way, in a commodity web hosting type of setup, they aren't stuck with
a process for every user application even if it is not being used.
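As a rough sketch of the use case described above, an infrequently used
application might be given a small daemon process group with an inactivity
timeout. The group name "lowtraffic", the paths, and the 300-second value are
hypothetical and chosen only for illustration.

```apache
# Sketch only: "lowtraffic" and the paths below are hypothetical.
# Restart the daemon process once it has been idle for 300 seconds, so the
# memory held by the loaded application is given back until the next request.
WSGIDaemonProcess lowtraffic processes=1 threads=5 inactivity-timeout=300

WSGIScriptAlias /lowtraffic /srv/www/lowtraffic/app.wsgi
<Location /lowtraffic>
    WSGIProcessGroup lowtraffic
</Location>
```

The daemon process itself still exists after the timeout fires; only a
transient process mechanism would remove it entirely.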
Original comment by Graham.Dumpleton@gmail.com
on 9 Jan 2008 at 10:28
Graham, I understand what you are saying. But infrequently used applications
are exactly the type that benefit the most from swapping vs. killing and
restarting: they are using ZERO physical memory. Terminating them actually
loads them into memory so they can run signal handlers and exit routines,
immediately (but temporarily) increasing the amount of physical memory and CPU
time they use. Then, when they are re-loaded, they use up WAY more CPU and RAM
in order to re-initialize everything, which will end up causing a *lot* of
swapping of other applications if memory is really tight. That in turn causes
less-frequently-used applications to have much higher latency than they would
if they were swapped out, while making them seem to take up more resources than
they really do.
In my experience as a customer, web hosting providers kill *active* processes
that are consuming too much CPU, and sometimes applications that are consuming
WAY too much physical memory, not swapped-out ones that are consuming nothing.
In fact, they often ban them for a long time period (hours or days) or
indefinitely until they are modified to consume fewer resources. That is
because the processes that got killed will usually just immediately restart and
hog the system's resources again.
They do seem to have cron jobs that kill based purely on time intervals too,
but (1) given enough swap, those robots are counter-productive for the reasons
I explained above, and (2) those cron jobs will still work fine with mod_wsgi
the way mod_wsgi is already implemented. The admin's existing scripts and cron
jobs work for *any* process (FastCGI, mod_wsgi daemons, or GCC or some other
command-line application), whereas mod_wsgi can really only manage itself.
It is possible that the admin is using a flawed metric (not per-process
resident memory) to measure processes, or that the admin is following a
"swapping is bad for performance" philosophy that is only valid for
applications that attempt to reserve huge swaths of RAM to implement their own
virtual memory systems (e.g. Oracle and Squid). But, in those cases, any *real*
memory sharing optimizations that mod_wsgi makes will be undone by whatever
measures the administrator takes to avoid swapping.
Original comment by brian@briansmith.org
on 16 Jan 2008 at 2:26
We still seem to be in part talking about two different things.
Can we leave this until we start talking about mod_wsgi architecture changes on
the mailing list? :-)
Also, the aspect of transient processes which is actually more important than
inactivity timeouts and being able to shut them down is the ability for them to
be created on demand the first time they are required. This is vitally
important in commodity web hosting, as providers can't be dependent on
restarting Apache just to reconfigure mod_wsgi to know about a new user who may
want a daemon process to run WSGI applications.
Original comment by Graham.Dumpleton@gmail.com
on 16 Jan 2008 at 5:35
Now not likely to be done in version 3.0.
Original comment by Graham.Dumpleton@gmail.com
on 11 Apr 2008 at 5:25
How about changing the syntax of the "WSGIProcessGroup" directive to something
like:
WSGIProcessGroup [ options of WSGI*Process ]
So, for example:
<Directory /home/*/public/>
WSGIProcessGroup user=%{ENV:USER} ...
...
</Directory>
This would create a new process for each user the first time a request triggers
the directive. Once the process is created, subsequent requests will reuse it.
Is this possible in the Apache model?
Original comment by gdam...@gmail.com
on 16 Mar 2009 at 6:09
Being able to dynamically bring up new daemon processes for users with
additional configuration is in part what the intent of this whole exercise is.
The intent was to have the parameterisable bit defined in the options of
WSGIDaemonProcess (WSGITransientProcess), but the idea of having that driven by
something associated with the WSGIProcessGroup directive is interesting and
something I hadn't thought of.
Original comment by Graham.Dumpleton@gmail.com
on 16 Mar 2009 at 9:19
FWIW, where HTTP_USER was used in original examples, should have been
REMOTE_USER.
Original comment by Graham.Dumpleton@gmail.com
on 16 Feb 2010 at 5:45
When using AuthBasicProvider LDAP on Apache 2.2.14 and mod_wsgi 2.8.2,
REMOTE_USER did not work. Instead, the following worked:
WSGIProcessGroup %{ENV:AUTHENTICATE_UID}
We have to define a WSGIDaemonProcess upfront for every user we have in our
database, but memory swapping and the "inactivity-timeout" option combined
hopefully shouldn't make this a problem anymore. Given that transient processes
might not be around for a while, is this the proper way to solve the problem?
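A minimal sketch of the setup described above, with one daemon process group
declared up front per user and the group selected from the authenticated user
ID. The user names "alice" and "bob", the directory path, and the timeout value
are hypothetical, and the LDAP authentication directives are omitted.

```apache
# Sketch only: user names, path, and timeout are hypothetical.
# One daemon process group per known user, declared when Apache starts.
WSGIDaemonProcess alice processes=1 threads=5 inactivity-timeout=300
WSGIDaemonProcess bob processes=1 threads=5 inactivity-timeout=300

<Directory /srv/www/app>
    # Pick the process group whose name matches the authenticated user ID.
    WSGIProcessGroup %{ENV:AUTHENTICATE_UID}
</Directory>
```

A request from a user with no matching WSGIDaemonProcess entry would have no
group to be routed to, so the list has to be regenerated and Apache restarted
whenever users are added.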
Original comment by rosen.di...@gmail.com
on 4 Mar 2012 at 3:59
Nice dream. If ever decide to pursue this, will create new issue in github
issue tracker instead.
Original comment by Graham.Dumpleton@gmail.com
on 12 Nov 2014 at 10:51
Original issue reported on code.google.com by
Graham.Dumpleton@gmail.com
on 10 Aug 2007 at 6:53