image allocation breaks cpu server

GoogleCodeExporter commented 9 years ago

enabling vesa graphics on my cpu server and using it for more than just a few 
modest windows breaks the system. the first symptom is the message "no free 
mount rpc buffer". 

sometimes other messages appear, "no memory for allocb" or "iallocb: no memory 
66592/61841408". both of these are much less common than the rpc message.

quite often the rpc message only comes up a few times and you can get rid of it 
by closing a window. sometimes the rpc message floods the console and there's 
nothing you can do. almost always when any of these messages appears, something 
else stops working too, whether it's opening a window, starting a new shell, or 
running any command at all. sometimes even ctrl-alt-del does nothing. i have 
only once seen the machine carry on after the "iallocb" message.

the machine in question is my r400, with 2gb ram, 2gb swap, and is a 
cpu/fs/auth server. fs is cwfs. recently i added *imagemaxmb=512 to plan9.ini. 
the display is 1440x900x32. typical usage is a few small windows like clock and 
stats, sometimes page open showing man -P in a window fitted to the page size, 
a big sub-rio (1440x768 ish), acme within the sub-rio, sometimes page or mothra 
also within the sub-rio. also equis is typically running (1088x768+borders), 
and/or vncv (1440x768ish again). 

running both equis and vncv is likely to trigger it very quickly, but i can 
trigger it without either; just using page and acme. mothra is perhaps the most 
likely to set it off of the all-native plan 9 programs. in fact, compared to 
plan 9 programs, using opera in equis seems to actually be a more practical way 
to view images and text than using page and acme. opera will stall for many 
seconds at a time and sometimes even stop responding altogether, but it will 
display far more than page before it or the system crashes.

*imagemaxmb=512 has made the crashes *more* likely if it's made any difference 
at all. i don't recall seeing "no free mount rpc buffer" at all before i added 
this. plan9.ini(8) states that without this option, image memory defaults to 
all available ram and it's got 2gb ram and the same swap, so wtf is going on?

besides all this, it's very very common for page to report "readimage: 
allocimage: image memory allocation failed" for moderate images (1000x1000 or 
less). on my hardware this is absolutely batshit insane! this is rather random 
so i don't think it's an obsolete size check.

Original issue reported on code.google.com by tereniao...@gmail.com on 30 Jul 2013 at 5:59

GoogleCodeExporter commented 9 years ago

not surprising at all.

a value of 512MB for *imagemaxmb= is insane because the kernel only has a 256MB
virtual memory window. the real problem is that the cpu kernel uses a heuristic
that limits kernel memory to a fixed amount (64MB + size of page tables). note
that image memory is within that space! this will be enough for a few rio
windows, but is not practical for graphics heavy stuff. as a workaround, i
disabled that heuristic now (see commit r49af4f09cf64) when *imagemaxmb= is
specified.
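
to put rough numbers on "a few rio windows", here is a minimal sketch. it assumes each window or image costs about width*height*depth/8 bytes of draw memory and ignores allocator overhead and any extra copies a program keeps. the helper names are made up for illustration, and the 64MB used in the note below is only the fixed part of the heuristic described above.

```c
#include <assert.h>
#include <stdint.h>

/* bytes for one uncompressed image; assumes depth is a whole number of bytes */
uint64_t
image_bytes(uint32_t width, uint32_t height, uint32_t depth)
{
	assert(depth % 8 == 0);
	return (uint64_t)width * height * (depth / 8);
}

/* how many images of that size fit in a budget given in bytes */
uint64_t
images_that_fit(uint64_t budget, uint32_t width, uint32_t height, uint32_t depth)
{
	uint64_t one = image_bytes(width, height, depth);

	assert(one > 0);
	return budget / one;
}
```

under those assumptions a full 1440x900x32 screen is 5184000 bytes, so only 12 of them fit in 64MB (67108864 bytes), and every window close to screen size takes several MB of that.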

Original comment by cinap_le...@felloff.net on 2 Aug 2013 at 2:04

Changed state: NeedsTesting

GoogleCodeExporter commented 9 years ago

i updated kernel & all to get the change, and set a more reasonable 
*imagemaxmb=128, but this has made no difference. image memory is still capped 
at 68.4MB exactly as before the changes.

cpu% cat /dev/swap
2064351232 memory
4096 pagesize
30196 kernel
125929/473796 user
0/160000 swap
19010080/71750016 kernel malloc
49697920/71750016 kernel draw
cpu% hoc
71750016/1024^2
68.4261474609
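
the same conversion can be done for every "used/max name" line at once. a minimal sketch in portable C (not Plan 9 C), assuming the line format shown in the output above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef struct Poolstat Poolstat;
struct Poolstat {
	unsigned long long used;
	unsigned long long max;
	char name[64];
};

/* parse one "used/max name" line; returns 1 on success, 0 otherwise */
int
parsepool(const char *line, Poolstat *p)
{
	assert(line != NULL && p != NULL);
	return sscanf(line, "%llu/%llu %63[^\n]", &p->used, &p->max, p->name) == 3;
}

/* bytes to MB, using 1MB = 1024*1024 bytes as in the hoc line above */
double
tomb(unsigned long long bytes)
{
	return bytes / (1024.0 * 1024.0);
}
```

for the "kernel draw" line this gives about 47.4MB used of 68.4MB. lines without a slash, like "4096 pagesize", are rejected. the "user" and "swap" lines have the same shape but count pages, not bytes, so they should not be passed to tomb.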

Original comment by tereniao...@gmail.com on 10 Aug 2013 at 2:17

GoogleCodeExporter commented 9 years ago

i've tested this change and it works. 

    if(cpuserver) {
        if(userpcnt < 10)
            userpcnt = 70;
        kpages = conf.npage - (conf.npage*userpcnt)/100;

        /*
         * Hack for the big boys. Only good while physmem < 4GB.
         * Give the kernel fixed max + enough to allocate the
         * page pool.
         * This is an overestimate as conf.upages < conf.npages.
         * The patch of nimage is a band-aid, scanning the whole
         * page list in imagereclaim just takes too long.
         */
>>      if(getconf("*imagemaxmb") == 0)
        if(kpages > (64*MB + conf.npage*sizeof(Page))/BY2PG){
            kpages = (64*MB + conf.npage*sizeof(Page))/BY2PG;
            conf.nimage = 2000;
            kpages += (conf.nproc*KSTACK)/BY2PG;
        }
    } else {

the >> marked line is the one that allows the same kernel/user split allocation
as with the terminal kernel. please make really sure this is the code you're
actually running; add a debug print in that routine to confirm it.
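
to see what the extra check changes, here is a standalone sketch of the same arithmetic in portable C. it is not the kernel code: kpages_for and its parameters are hypothetical stand-ins for conf.npage, sizeof(Page), conf.nproc, KSTACK and the getconf test, the real values depend on the kernel build, and the conf.nimage assignment is left out.

```c
#include <assert.h>

enum {
	MB = 1024*1024,
	BY2PG = 4096,	/* page size, as reported by /dev/swap earlier in the thread */
};

/*
 * kernel pages for a cpu server, mirroring the snippet above.
 * pagesz stands in for sizeof(Page), kstack for KSTACK;
 * imagemaxmbset is nonzero when *imagemaxmb is in plan9.ini.
 */
unsigned long long
kpages_for(unsigned long long npage, int userpcnt, unsigned long long pagesz,
	unsigned long long nproc, unsigned long long kstack, int imagemaxmbset)
{
	unsigned long long kpages, cap;

	assert(userpcnt >= 0 && userpcnt <= 100);
	if(userpcnt < 10)
		userpcnt = 70;
	kpages = npage - (npage*userpcnt)/100;

	if(imagemaxmbset)
		return kpages;	/* heuristic skipped: plain percentage split */

	/* fixed 64MB plus room for the page pool, then the kernel stacks */
	cap = (64ULL*MB + npage*pagesz)/BY2PG;
	if(kpages > cap)
		kpages = cap + (nproc*kstack)/BY2PG;
	return kpages;
}
```

taking npage as 503992 (2064351232/4096 from the /dev/swap output earlier in this thread) with userpcnt unset, the plain split comes to 151198 kernel pages, while the capped result depends on the values assumed for sizeof(Page), conf.nproc and KSTACK.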

Original comment by cinap_le...@felloff.net on 25 Aug 2013 at 7:53

GoogleCodeExporter commented 9 years ago

Aye, my mistake. My new-kernel script wasn't building the kernel before 
install. I fixed that, and tested by setting *imagemaxmb=192 and opening a lot 
of gifs in separate windows. Used draw memory was then 191MB, which is close 
enough for me. Thanks!

Original comment by tereniao...@gmail.com on 26 Aug 2013 at 1:05

Changed state: Fixed
