es-que / cpuset

Automatically exported from code.google.com/p/cpuset
GNU General Public License v2.0

'cset shield' limits multi-processor systems to a single NUMA node #9

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Use a multi-socket (not just multi-core) machine that has multiple NUMA 
nodes (e.g. any multi-socket Intel system from Nehalem onward).
2. cset shield --cpu=1-7 --kthread=on
3. Start one or more processes that together allocate slightly more than half 
of the machine's memory.

What is the expected output? What do you see instead?
The processes should continue running successfully.  Instead, the OOM killer 
starts killing things.

What version of the product are you using? On what operating system?
SLES 11

cset --version
cset: Cpuset (cset) 1.5.0

I also checked trunk, and the problem still appears to be present there.

Please provide any additional information below.

The problem is in commands/shield.py:

def make_shield(cpuspec, kthread):
    memspec = '0' # FIXME: for numa, we probably want a more intelligent scheme

It should probably be set instead to the contents of 
/sys/devices/system/node/online, which is "0-1" on a dual-socket Nehalem and 
"0" on a single-socket system.
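
A minimal sketch of that idea, not a tested patch against cset.  The helper 
name detect_memspec is hypothetical (it does not exist in cset), and falling 
back to '0' when the sysfs file is missing or empty is my assumption, chosen 
to keep the current behaviour on systems that do not expose it.

```python
# Path taken from this report; 'detect_memspec' is a hypothetical helper,
# not an existing cset function.
NODE_ONLINE_PATH = '/sys/devices/system/node/online'


def detect_memspec(path=NODE_ONLINE_PATH, default='0'):
    """Return the online NUMA node list as a string (e.g. '0-1').

    Falls back to `default` if the file cannot be read or is empty
    (assumption: this preserves the existing hard-coded behaviour).
    """
    try:
        with open(path) as f:
            spec = f.read().strip()
    except (IOError, OSError):
        return default
    return spec or default
```

make_shield could then use memspec = detect_memspec() in place of the 
hard-coded '0'.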

I think AMD went to NUMA/on-chip memory controllers long before Intel, so this 
has probably been an issue there even longer.

Original issue reported on code.google.com by charlesa...@gmail.com on 4 Oct 2011 at 9:38

GoogleCodeExporter commented 8 years ago
Thanks for the report.  This will be fixed soon.  For now, the workaround is 
to use the set and proc commands to set up a shield manually (as shown in the 
tutorial).  The set command has a --mem option that lets you specify the 
memory node(s) to use for that set.  It also has a --mem_exclusive option 
that stops remote memory allocations, if desired.

Original comment by tsariou...@gmail.com on 15 Feb 2012 at 5:54

GoogleCodeExporter commented 8 years ago

Original comment by tsariou...@gmail.com on 15 Feb 2012 at 5:58