Closed bhuntsman closed 3 years ago
How to debug:
Thanks.
Ok, here's what we have running:
$ ps -ef | grep sam
ben 9502924 10354924 0 14:34:11 pts/0 0:00 samterm samterm (devdraw)
ben 10354924 15663122 120 14:34:11 pts/0 0:03 samterm
ben 14549002 7667848 0 14:34:11 pts/0 0:00 sam
ben 15663122 14549002 0 14:34:11 pts/0 0:00 samterm
$
I assume you want the (devdraw) process. On AIX, we have only have dbx, not gdb. But here we go:
$ dbx -a 9502924
Waiting to attach to process 9502924 ...
Successfully attached to devdraw.
warning: Directory containing devdraw could not be determined.
Apply 'use' command to initialize source path.
Type 'help' for help.
reading symbolic information ...
stopped in read at 0xd0122fcc ($t2)
0xd0122fcc (read+0x28c) 80410014 lwz r2,0x14(r1)
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
$t1 wait running 93323343 k no sys
>$t2 run running 72941713 k no sys read
(dbx) where
read(??, ??, ??) at 0xd0122fcc
serveproc(v = 0x20039598), line 146 in "srv.c"
procmain(p = 0x2005d9e8), line 384 in "thread.c"
startprocfn(v = 0x2003b968), line 99 in "pthread.c"
(dbx) thread 1
thread state-k wchan state-u k-tid mode held scope function
$t1 wait running 93323343 k no sys
(dbx) where
read(??, ??, ??) at 0xd0122fcc
serveproc(v = 0x20039598), line 146 in "srv.c"
procmain(p = 0x2005d9e8), line 384 in "thread.c"
startprocfn(v = 0x2003b968), line 99 in "pthread.c"
(dbx)
Just for some more info, and curiosity, lets see what the other two are up to... PID 15663122, per above:
$ dbx -a 15663122
Waiting to attach to process 15663122 ...
Successfully attached to samterm.
warning: Directory containing samterm could not be determined.
Apply 'use' command to initialize source path.
Type 'help' for help.
reading symbolic information ...
stopped in read.read at 0xd0122fcc ($t1)
0xd0122fcc (read+0x28c) 80410014 lwz r2,0x14(r1)
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
>$t1 run running 83886173 k no sys read
(dbx) where
read.read(??, ??, ??) at 0xd0122fcc
_threadsetupdaemonize(), line 151 in "daemonize.c"
p9main(argc = 1, argv = 0x2ff22298), line 840 in "thread.c"
main.main(argc = 1, argv = 0x2ff22298), line 10 in "main.c"
(dbx)
And PID 10354924:
$ dbx -a 10354924
Waiting to attach to process 10354924 ...
Successfully attached to samterm.
warning: Directory containing samterm could not be determined.
Apply 'use' command to initialize source path.
Type 'help' for help.
reading symbolic information ...
stopped in _event_sleep at 0xd0544924 ($t2)
0xd0544924 (_event_sleep+0x584) 80410014 lwz r2,0x14(r1)
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
$t1 run running 78643213 u no sys _string
>$t2 run blocked 54657225 k no sys _event_sleep
$t3 wait 0xf1000e0000b014c8 running 79626241 k no sys
$t4 wait 0xf1000e0000b128c8 running 90243225 k no sys
$t5 wait 0xf1000a05d0356e50 running 71041093 k no sys
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0544924
_event_wait(??, ??) at 0xd054544c
_cond_wait_local(??, ??, ??) at 0xd0553e38
_cond_wait(??, ??, ??) at 0xd0554444
pthread_cond_wait(??, ??) at 0xd05550dc
_procsleep(r = 0x20029a40), line 60 in "pthread.c"
procscheduler(p = 0x20029948), line 441 in "thread.c"
_threadswitch(), line 259 in "thread.c"
channel.chanalt(a = 0x20045980), line 306 in "channel.c"
_chanop(c = 0x20029868, op = 1, p = 0x20045a28, canblock = 1), line 326 in "channel.c"
chansend(c = 0x20029868, v = 0x20045a28), line 334 in "channel.c"
mouse._ioproc(arg = 0x20029438), line 63 in "mouse.c"
procmain(p = 0x20029948), line 384 in "thread.c"
startprocfn(v = 0x20028e58), line 99 in "pthread.c"
(dbx) thread 1
thread state-k wchan state-u k-tid mode held scope function
$t1 run running 78643213 u no sys _string
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0544924
_event_wait(??, ??) at 0xd054544c
_cond_wait_local(??, ??, ??) at 0xd0553e38
_cond_wait(??, ??, ??) at 0xd0554444
pthread_cond_wait(??, ??) at 0xd05550dc
_procsleep(r = 0x20029a40), line 60 in "pthread.c"
procscheduler(p = 0x20029948), line 441 in "thread.c"
_threadswitch(), line 259 in "thread.c"
channel.chanalt(a = 0x20045980), line 306 in "channel.c"
_chanop(c = 0x20029868, op = 1, p = 0x20045a28, canblock = 1), line 326 in "channel.c"
chansend(c = 0x20029868, v = 0x20045a28), line 334 in "channel.c"
mouse._ioproc(arg = 0x20029438), line 63 in "mouse.c"
procmain(p = 0x20029948), line 384 in "thread.c"
startprocfn(v = 0x20028e58), line 99 in "pthread.c"
(dbx) thread 3
thread state-k wchan state-u k-tid mode held scope function
$t3 wait 0xf1000e0000b014c8 running 79626241 k no sys
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0544924
_event_wait(??, ??) at 0xd054544c
_cond_wait_local(??, ??, ??) at 0xd0553e38
_cond_wait(??, ??, ??) at 0xd0554444
pthread_cond_wait(??, ??) at 0xd05550dc
_procsleep(r = 0x20029a40), line 60 in "pthread.c"
procscheduler(p = 0x20029948), line 441 in "thread.c"
_threadswitch(), line 259 in "thread.c"
channel.chanalt(a = 0x20045980), line 306 in "channel.c"
_chanop(c = 0x20029868, op = 1, p = 0x20045a28, canblock = 1), line 326 in "channel.c"
chansend(c = 0x20029868, v = 0x20045a28), line 334 in "channel.c"
mouse._ioproc(arg = 0x20029438), line 63 in "mouse.c"
procmain(p = 0x20029948), line 384 in "thread.c"
startprocfn(v = 0x20028e58), line 99 in "pthread.c"
(dbx) thread 4
thread state-k wchan state-u k-tid mode held scope function
$t4 wait 0xf1000e0000b128c8 running 90243225 k no sys
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0544924
_event_wait(??, ??) at 0xd054544c
_cond_wait_local(??, ??, ??) at 0xd0553e38
_cond_wait(??, ??, ??) at 0xd0554444
pthread_cond_wait(??, ??) at 0xd05550dc
_procsleep(r = 0x20029a40), line 60 in "pthread.c"
procscheduler(p = 0x20029948), line 441 in "thread.c"
_threadswitch(), line 259 in "thread.c"
channel.chanalt(a = 0x20045980), line 306 in "channel.c"
_chanop(c = 0x20029868, op = 1, p = 0x20045a28, canblock = 1), line 326 in "channel.c"
chansend(c = 0x20029868, v = 0x20045a28), line 334 in "channel.c"
mouse._ioproc(arg = 0x20029438), line 63 in "mouse.c"
procmain(p = 0x20029948), line 384 in "thread.c"
startprocfn(v = 0x20028e58), line 99 in "pthread.c"
(dbx) thread 5
thread state-k wchan state-u k-tid mode held scope function
$t5 wait 0xf1000a05d0356e50 running 71041093 k no sys
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0544924
_event_wait(??, ??) at 0xd054544c
_cond_wait_local(??, ??, ??) at 0xd0553e38
_cond_wait(??, ??, ??) at 0xd0554444
pthread_cond_wait(??, ??) at 0xd05550dc
_procsleep(r = 0x20029a40), line 60 in "pthread.c"
procscheduler(p = 0x20029948), line 441 in "thread.c"
_threadswitch(), line 259 in "thread.c"
channel.chanalt(a = 0x20045980), line 306 in "channel.c"
_chanop(c = 0x20029868, op = 1, p = 0x20045a28, canblock = 1), line 326 in "channel.c"
chansend(c = 0x20029868, v = 0x20045a28), line 334 in "channel.c"
mouse._ioproc(arg = 0x20029438), line 63 in "mouse.c"
procmain(p = 0x20029948), line 384 in "thread.c"
startprocfn(v = 0x20028e58), line 99 in "pthread.c"
(dbx)
I hope that provides some helpful information! Thank you!
dbx is showing the same stack for all the threads, so it's unfortunately not too helpful. There's only one thread with that particular stack (mouse._ioproc), I'm sure.
One stab in the dark would be to try
export font=/lib/font/bit/lucm/unicode.9.font
before starting sam.
That may also have been an artifact of dbx not behaving the way I expected. I tried setting font per your suggestion, but that did not change the behavior. However, I also pulled in all the latest changes, and used the dbx "where all" command. Here we go:
$ ps -ef | grep sam
ben 17694836 20185276 120 21:33:48 pts/0 0:51 samterm
ben 19005588 17694836 0 21:33:48 pts/0 0:00 samterm samterm (devdraw)
ben 19071186 7667848 0 21:33:48 pts/0 0:00 sam
ben 19267606 18088064 0 21:34:46 pts/2 0:00 grep sam
ben 20185276 19071186 0 21:33:48 pts/0 0:00 samterm
devdraw (PID 19005588):
$ dbx -a 19005588
Waiting to attach to process 19005588 ...
Successfully attached to devdraw.
warning: Directory containing devdraw could not be determined.
Apply 'use' command to initialize source path.
Type 'help' for help.
reading symbolic information ...
stopped in read at 0xd0122fcc ($t2)
0xd0122fcc (read+0x28c) 80410014 lwz r2,0x14(r1)
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
$t1 wait running 83689597 k no sys
>$t2 run running 93192207 k no sys read
(dbx) where all
Thread $t1
warning: Thread is in kernel mode, not all registers can be accessed.
.() at 0x5c012200
x11-screen.select(__fds = 6, __readlist = 0x2ff1bf30, __writelist = 0x2ff1df30, __exceptlist = 0x2ff1ff30, __timeout = (nil)), line 244 in "time.h"
xloop(), line 301 in "x11-screen.c"
gfx_main(), line 274 in "x11-screen.c"
threadmain(argc = 2, argv = 0x2ff22268), line 85 in "srv.c"
threadmainstart(v = (nil)), line 813 in "thread.c"
procmain(p = 0x20037b48), line 384 in "thread.c"
p9main(argc = 3, argv = 0x2ff22264), line 871 in "thread.c"
main(argc = 3, argv = 0x2ff22264), line 10 in "main.c"
Thread $t2
read(??, ??, ??) at 0xd0122fcc
serveproc(v = 0x20038748), line 146 in "srv.c"
procmain(p = 0x2005cb98), line 384 in "thread.c"
startprocfn(v = 0x2003ab18), line 99 in "pthread.c"
(dbx)
PID 20185276:
$ dbx -a 20185276
Waiting to attach to process 20185276 ...
Successfully attached to samterm.
warning: Directory containing samterm could not be determined.
Apply 'use' command to initialize source path.
Type 'help' for help.
reading symbolic information ...
stopped in read.read at 0xd0122fcc ($t1)
0xd0122fcc (read+0x28c) 80410014 lwz r2,0x14(r1)
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
>$t1 run running 90374233 k no sys read
(dbx) where all
Thread $t1
read.read(??, ??, ??) at 0xd0122fcc
_threadsetupdaemonize(), line 151 in "daemonize.c"
p9main(argc = 1, argv = 0x2ff22298), line 840 in "thread.c"
main.main(argc = 1, argv = 0x2ff22298), line 10 in "main.c"
(dbx)
PID 17694836:
$ dbx -a 17694836
Waiting to attach to process 17694836 ...
Successfully attached to samterm.
warning: Directory containing samterm could not be determined.
Apply 'use' command to initialize source path.
Type 'help' for help.
reading symbolic information ...
stopped in _event_sleep at 0xd0544924 ($t2)
0xd0544924 (_event_sleep+0x584) 80410014 lwz r2,0x14(r1)
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
$t1 run running 78184693 u no sys _string
>$t2 run blocked 78381249 k no sys _event_sleep
$t3 wait 0xf1000e0000ad68c8 running 91160609 k no sys
$t4 wait 0xf1000e00001b6cc8 running 81461303 k no sys
$t5 wait 0xf1000a05d0356580 running 78643293 k no sys
(dbx) where all
Thread $t1
_string(dst = 0x200294c8, pt = (...), src = 0x20028ef8, sp = (...), f = 0x20029ec8, s = "", r = 0x100520d0, len = 16777213, clipr = (...), bg = (nil), bgp = (...), op = SoverD), line 84 in "string.c"
string.string(dst = 0x200294c8, pt = (...), src = 0x20028ef8, sp = (...), f = 0x20029ec8, s = "new"), line 13 in "string.c"
paintitem(m = 0x200294c8, menu = 0x20003158, textr = (...), off = 0, i = 0, highlight = 0, save = (nil), restore = (nil)), line 99 in "menuhit.c"
menupaint(m = 0x200294c8, menu = 0x20003158, textr = (...), off = 0, nitemdrawn = 6), line 134 in "menuhit.c"
menuhit(but = 3, mc = 0x20029568, menu = 0x20003158, scr = (nil)), line 239 in "menuhit.c"
menu3hit(), line 131 in "menu.c"
threadmain(argc = 1, argv = 0x2ff22298), line 144 in "main.c"
threadmainstart(v = (nil)), line 813 in "thread.c"
procmain(p = 0x200181d8), line 384 in "thread.c"
p9main(argc = 1, argv = 0x2ff22298), line 871 in "thread.c"
main.main(argc = 1, argv = 0x2ff22298), line 10 in "main.c"
Thread $t2
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0544924
_event_wait(??, ??) at 0xd054544c
_cond_wait_local(??, ??, ??) at 0xd0553e38
_cond_wait(??, ??, ??) at 0xd0554444
pthread_cond_wait(??, ??) at 0xd05550dc
_procsleep(r = 0x20029770), line 60 in "pthread.c"
procscheduler(p = 0x20029678), line 441 in "thread.c"
_threadswitch(), line 259 in "thread.c"
channel.chanalt(a = 0x20047980), line 306 in "channel.c"
_chanop(c = 0x20029598, op = 1, p = 0x20047a28, canblock = 1), line 326 in "channel.c"
chansend(c = 0x20029598, v = 0x20047a28), line 334 in "channel.c"
mouse._ioproc(arg = 0x20029568), line 63 in "mouse.c"
procmain(p = 0x20029678), line 384 in "thread.c"
startprocfn(v = 0x20029458), line 99 in "pthread.c"
Thread $t3
warning: Thread is in kernel mode, not all registers can be accessed.
.() at 0x5c011c90
readn(f = 5, av = 0x20066514, n = 4), line 12 in "readn.c"
_drawrecv(mux = 0x20028ce8, canblock = 1, vp = 0x20066570), line 177 in "drawclient.c"
drawrecv(mux = 0x20028ce8), line 198 in "drawclient.c"
_muxrecv(mux = 0x20028ce8, canblock = 1, vp = 0x20066618), line 98 in "io.c"
muxrpc(mux = 0x20028ce8, tx = 0x20029458), line 141 in "mux.c"
displayrpc(d = 0x20018bf8, tx = 0x200666d0, rx = 0x20066870, freep = (nil)), line 271 in "drawclient.c"
_displayrdkbd(d = 0x20018bf8, r = 0x20066a50), line 337 in "drawclient.c"
keyboard._ioproc(arg = 0x20029b88), line 32 in "keyboard.c"
procmain(p = 0x2004a278), line 384 in "thread.c"
startprocfn(v = 0x20029458), line 99 in "pthread.c"
Thread $t4
warning: Thread is in kernel mode, not all registers can be accessed.
.() at 0x5c011c90
hostproc(arg = 0x20029c98), line 295 in "plan9.c"
procmain(p = 0x20069128), line 384 in "thread.c"
startprocfn(v = 0x20029d38), line 99 in "pthread.c"
Thread $t5
warning: Thread is in kernel mode, not all registers can be accessed.
.() at 0x5c011c90
extproc(argv = 0x20008eb0), line 131 in "plan9.c"
procmain(p = 0x20087fd8), line 384 in "thread.c"
startprocfn(v = 0x20029dd8), line 99 in "pthread.c"
(dbx)
Hopefully that's a little better...
Sorry for the delayed response. I don't understand the thread t1 that is stuck "running" in _string. Maybe it is in an infinite loop of some kind. It is in a loop calling cachechars but it is supposed to make progress on each iteration. Maybe add some debugging prints inside _string or at the top of cachechars to see what the calls look like?
Sorry for leaving this hanging for so long. Something is definitely looping and not making progress. I put some print statements in src/libdraw/font.c in cachechars:
int
cachechars(Font *f, char **ss, Rune **rr, ushort *cp, int max, int *wp, char **subfontname)
{
int i, th, sh, h, ld, w, rw, wid, nc;
char *sp;
Rune r, *rp, vr;
ulong a;
Cacheinfo *c, *tc, *ec;
/* Begin AIX debugging */
print("Font: %s\n", f->name);
print("ss: %s\n", ss);
print("rr: %S\n", rr);
print("cp: %d\n", cp);
print("max: %d\n", max);
print("wp: %d\n", wp);
print("subfontname: %s\n", subfontname);
/* End AIX debugging */
...
I run sam again and get the following output repeated forever:
Font: *default*
ss: Ao
rr: <nil>
cp: 804395824
max: 100
wp: 804395816
subfontname:
Adding a print after the call to cachechars indicates that it always returns 0 each time as well. Perhaps not a thread issue after all?
Thank you!
This is interesting. I think the problem is with cachechars returning 0. I inserted the following two print statements to the end of the cachechars function in src/libdraw/font.c:
Found:
wid += c->width;
c->age = f->age;
cp[i] = h;
i++;
print("\n--- i: %d\n", i);
}
if(ss)
*ss = sp;
else
*rr = rp;
*wp = wid;
print("cachecars returning: %d\n", i);
return i;
When I run sam, cachechars returns 1 a few times, but as soon as I right-click or type a character, I just get endless "cachechars returning: 0".
I did some more digging, comparing a step-by-step run on Linux against a step-by-step on AIX, and tracked the issue down to src/libdraw/string.c:80:
while((*s || *r) && len){
For example, if I start sam, attach a debugger to the samterm process, and set a breakpoint on cachechars, and then type "a" in the sam window, I can see that we end up calling cachechars twice. The second call is for our "a". This call works correctly, and in cachechars sp gets incremented by the value of w, which is 1 in this case. Now, *sp = "".
On Linux, we get to the while statement above, and the statement evaluates to false and we move on to line 146:
return pt;
On AIX, the statement evaluates to true and we enter the loop. Now we end up passing garbage arguments to cachechars and it always returns 0 and we end up looping forever.
The following change to string.c resolves the issue for AIX, but causes a segfault on other platforms:
while((*s || *rptr) && len){
Not sure this is the best way to resolve this, but something like this seems to be a solution that allows AIX to work without breaking anything anywhere else:
diff --git a/src/libdraw/string.c b/src/libdraw/string.c
index 5ba5016a..4f08b1b8 100644
--- a/src/libdraw/string.c
+++ b/src/libdraw/string.c
@@ -81,7 +81,11 @@ _string(Image *dst, Point pt, Image *src, Point sp, Font *f, char *s, Rune *r, i
}else
rptr = &r;
sf = nil;
+#if defined(__AIX__)
+ while((*s || *rptr) && len){
+#else
while((*s || *r) && len){
+#endif
max = Max;
if(len < max)
max = len;
I will submit a pull request with this change.
On AIX, devdraw seems to hang any time it would draw a character on the screen. This occurs in any program that uses devdraw, including src/libdraw/test.c. In sam, the hang can be triggered by typing a character in the command window, or displaying the menu. Upon right-clicking, the following is displayed:
I enabled some of the debugging statements in devdraw and libdraw, and I can see that the window still responds to mouse position events, window resizes, and even receives character input, but only a limited quantity after. Approximately 3-4 additional keyboard events get processed before devdraw totally locks up and stops responding to any events.
This has always been an issue on AIX, but I haven’t been very interested in the GUI programs until recently.
I could use some intuition as to if this sounds like a problem with the font code, X11 draw code, or perhaps even libthread.