swt2c opened 13 years ago
Hrm. Looks like what's happening is that normally acroread actually sticks around as long as the plugin does, but for some reason the wrapped one dies sooner. I suspect something is causing it to crash, and nppdf.so can't handle that. Not sure yet what.
Update: Yeah, acroread is segfaulting in XCreatePixmap somewhere.
Hmm, so is this really the same as Issue #1, then? (Plugin crashes but wrapper does not.)
I don't believe so. The viewer hasn't crashed. Without nspluginwrapper, there are two processes involved: the browser/nppdf.so process and a slave acroread process. The acroread process is internal to nppdf.so. Neither the browser nor nspluginwrapper is aware of its existence. That process is supposed to stay alive for the duration of the plugin's lifetime. With nspluginwrapper, the browser process (or your browser's plugin process for modern browsers) and the nppdf.so process are split yet again. Something about how the viewer process talks to nppdf.so is causing acroread to crash.
While this is confusing nppdf.so (the plugin itself), it doesn't appear to be enough to crash it. The fix is not to crash the wrapper; as far as the wrapper is concerned, the plugin hasn't crashed. The fix is to figure out what about nspluginwrapper is breaking acroread and fix that. My suspicion is that it's related to acroread using the ancient Xt-based stuff. The current implementation is pretty weird, and I've been meaning to rework it to match Mozilla's implementation, which I know works (#2). But I'm not yet sure exactly what's going on that upsets acroread.
(Tangentially, Mozilla's emulation of Xt is also imperfect in faking parts of the event loop. But since no browser is natively Xt-based anymore, I think emulating Mozilla is the right thing to do.)
Well, whatever is causing acroread to crash may be present in Firefox, too. When I run the same test (view PDF, hit back button) on a 32-bit machine, acroread goes defunct there as well. The difference there is that Firefox (or nppdf.so?) seems to figure out that the old acroread is useless and spawns off another one when you attempt to view another PDF.
Out of curiosity, how did you get acroread to dump a core? I set ulimit -c to unlimited before starting Firefox, but that didn't seem to make acroread kick out a core.
Really? Huh, that's interesting. I think when I tried it on a 32-bit Firefox, acroread stayed alive. I'll have to look into that more closely. (Though I might not have a whole lot of time for it in the coming week; end of summer internship, flight back, etc. Somewhat hectic time.)
Oh, I just attached gdb to it before it died.
Is this something that can be handled in nspluginwrapper, and is that planned, or is it something that Adobe needs to fix?
Is there anything the community can help with, short of patches, which seem to require too much understanding for someone outside the project to easily supply?
It works without nspluginwrapper, so this is definitely something to fix in nspluginwrapper. It's planned inasmuch as I intend to fix it. But I don't actually know yet why it's crashing, so I can't give you much more than that. (I have been able to reproduce it, as noted earlier.)
I'm not sure I can think of much to do (or I'd have requested it), short of patches or diagnosing the problem. Honestly, you overstate the complexity here. I didn't know anything about nspluginwrapper or NPAPI before taking on the project. (I had been involved in a browser before, but only on the network stack, not the plugin implementation.) What understanding I do have came from source-diving, documentation-hunting, and a desire to watch Flash videos without being interrupted by crashes all the time. :-)
I have been working on sorting this one out for quite some time, as it really annoys me. :-) From what I've gathered, this doesn't seem to be related to the Xt event loop handling. Here's why: the current Firefox code does not support running Xt plugins out-of-process, but there is a patch out there that allows it. With this patch applied, the acroread zombie problem occurs when running nppdf.so out-of-process in a 32-bit browser. The patch uses much of the same Xt loop handling code as the in-process implementation, so as far as I can tell, the problem has something to do with nppdf.so running in a process separate from the main browser process.
I figured out how to turn on debugging in the nppdf.so plugin. I'm currently trying to find someone from Adobe who can help me interpret the output. It looks as if the nppdf.so plugin doesn't realize that the acroread process has exited, so it tries to contact it and fails. When run in-process, it seems to handle this just fine.
Thanks for looking into this. I actually totally forgot about it and haven't looked at it for a while. :-/ (I have to admit, my interest in nspluginwrapper waned significantly after the 64-bit Flash went stable. At some point I need to make a new release with the build fixes that have accumulated, and then we'll see what happens with it. There are more things I want to do to it, but my motivation's gone to other projects of late.)
Good catch on the out-of-process thing! I didn't notice that Firefox was running nppdf.so in-process. I was thinking it was actually the Xt embedding code, since acroread was managing to segfault in XCreatePixmap. In the normal case, I think acroread should be long-lived and not crash, so my theory was that we set up the Xt container differently from Firefox and confused nspluginwrapper. But the Firefox thing does suggest there may be more interesting things going on. (And it doesn't bode well for a fix in nspluginwrapper, since I can't just mimic Firefox.) Random thought: perhaps there needs to be an XSync somewhere. Although I would have expected an X11 error from that, not a segfault.
I agree the event loop is probably fine. nspluginwrapper's is way more complicated than it really should be, but it should be basically functional (and the Mozilla one is equally hacky).
(If you do contact someone at Adobe, maybe you could request that the plugin switch to using XEmbed instead of Xt. Browsers don't use Xt natively anymore anyway.)
Yes, I figured that was why you went kind of quiet. :-)
I know that when you looked at this, you were seeing acroread segfault, but in 99% of the times I've reproduced this problem, acroread exits cleanly. Its behavior seems to be that if it isn't being used (displaying a PDF) for 30 seconds, it exits. The thing is, when you ask nppdf.so to view a new PDF, it is supposed to launch a new copy of acroread. But when run under nspluginwrapper, it fails to realize that the old copy went away.
What further makes me believe this is not really nspluginwrapper doing something wrong is that if I simply kill acroread while running under nspluginwrapper, nppdf.so can't successfully start a new one. However, when running nppdf.so in-process, if I kill acroread, nppdf.so can restart it just fine.
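The behavior expected here (reuse the helper while it's alive, launch a fresh one once it has exited or been killed) can be sketched in plain POSIX C. This is not nppdf.so's code, which we can't see: struct helper, spawn_helper(), helper_alive() and helper_ensure_running() are hypothetical names, the child is a stand-in that just sleeps instead of exec'ing acroread, and the sketch assumes the host reaps the helper itself with waitpid(WNOHANG).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for a helper process such as acroread. */
struct helper {
    pid_t pid;        /* -1 when no helper is running */
    int spawn_count;  /* how many times a helper has been launched */
};

/* Stand-in for exec'ing acroread: the child idles for idle_seconds and
 * then exits on its own, mimicking the 30-second idle exit. */
static pid_t spawn_helper(unsigned idle_seconds)
{
    pid_t pid = fork();
    if (pid == 0) {
        sleep(idle_seconds);
        _exit(0);
    }
    return pid; /* -1 if fork failed */
}

/* True if the helper is still running. If it has exited (cleanly or via
 * kill -9), it is reaped here so it never lingers as a zombie. */
static bool helper_alive(struct helper *h)
{
    if (h->pid <= 0)
        return false;
    pid_t r = waitpid(h->pid, NULL, WNOHANG);
    if (r == 0)
        return true;   /* still running */
    h->pid = -1;       /* exited (r == pid) or not our child (r == -1) */
    return false;
}

/* Called before showing a new document: reuse a live helper, else respawn. */
static bool helper_ensure_running(struct helper *h, unsigned idle_seconds)
{
    if (helper_alive(h))
        return true;
    h->pid = spawn_helper(idle_seconds);
    if (h->pid < 0)
        return false;
    h->spawn_count++;
    return true;
}
```

The key design point is that the liveness check happens at the moment a new document needs the helper, so a helper that died while nobody was looking is simply replaced rather than contacted.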
Yes, if I manage to contact someone from Adobe, I'll be sure to ask that they re-implement using XEmbed. Well, that and make a 64-bit version. :-) I'm not getting my hopes up, though.
Hrm, interesting. Maybe there are two problems, then? I've never had to wait the 30 seconds for the child to die before running into problems. I'll have to find time to look at things again and verify that I'm actually seeing the segfault, and reliably so.
It could be interesting to figure out exactly how nppdf finds out that the acroread child has died. If they're setting the SIGCHLD handler and using, say, XtAppAddSignal/XtNoticeSignal, I could easily believe something in glib or so is tripping them up, possibly combined with a broken Xt event loop bridge. (Though, as hacky as nspw's is, I think it's about as functional as Mozilla's? Mozilla only ever polls, while nspw tries badly to sniff about the struct and avoid polling sometimes... I'd like to kill that code because I'm pretty sure it doesn't save on polling.) UNIX signal handling is a complete nightmare, and there really isn't any better way (short of polling waitpid) to be notified of a child's death.
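For reference, the SIGCHLD pattern speculated about here looks roughly like the sketch below. In Xt, XtAppAddSignal() registers a callback and XtNoticeSignal() is the async-signal-safe call a handler uses to schedule it; this sketch substitutes a plain volatile sig_atomic_t flag for those two calls so it runs without Xt. It is an assumption that nppdf.so works this way at all, and all names are hypothetical. What it illustrates is the fragility: the SIGCHLD disposition is process-wide, so any other library in the same process that installs its own handler, or reaps children first, can make this bookkeeping miss the exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set from the signal handler; only async-signal-safe work happens there. */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;   /* plays the role of XtNoticeSignal() */
}

/* Install the handler. This replaces any SIGCHLD handler another
 * library in the process may have set, and vice versa. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Run later from the main/event loop, like the callback registered with
 * XtAppAddSignal(). Reaps every exited child; returns how many it reaped. */
static int process_child_exits(void)
{
    int reaped = 0;
    if (!child_exited)
        return 0;
    child_exited = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

The handler does nothing but set a flag because almost nothing else is safe inside a signal handler; the real work is deferred to the event loop, which is exactly why a stalled or torn-down event loop bridge would also delay noticing the child's death.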
How are you reproducing the problem, then? All I ever do is view a PDF, hit the home button (which goes to a blank page for me), and then wait.
Your idea about them using XtAppAddSignal / XtNoticeSignal is an interesting one.
Incidentally, at one point while trying to figure out this problem, I did try to rip out the nspw Xt code and replace it with the Mozilla code. Unfortunately, I never got it to work. Mainly the stuff that happens in create_window()...
(I'm just viewing a PDF and pressing back. acroread crashes without fail on my machine. Latest stable acroread and firefox, Ubuntu maverick, and nspluginwrapper from master.)
Success! I think I know what's going on with it not noticing the death. The current code, for some reason, tries to turn off the Xt event loop compatibility code when there aren't any plugin instances active. Probably because the Xt event loop bridge involves a 40Hz timer, so the original author tried to minimize those. (Go read get_appcontext_input_count_offset and resist the urge to gouge your eyes out.)
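A rough model of what a timer-driven event loop bridge does, and why tearing it down matters: it wakes up on a timer (40 Hz, i.e. every 25 ms) and dispatches whatever input sources are ready, in the spirit of XtAppAddInput(). Everything here (struct bridge, bridge_add_input(), bridge_tick(), and the active flag standing in for xt_source_destroy()) is hypothetical and uses poll() instead of Xt; it is not nspluginwrapper's implementation. The point is only that once the bridge stops ticking, a readable descriptor, such as an exit pipe reaching EOF, never gets its callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define BRIDGE_HZ 40
#define BRIDGE_PERIOD_MS (1000 / BRIDGE_HZ)  /* 25 ms per tick */
#define MAX_WATCHES 16

typedef void (*input_cb)(int fd, void *data);

/* Hypothetical stand-in for Xt's table of input sources. */
struct bridge {
    int active;                     /* cleared on teardown (cf. xt_source_destroy) */
    size_t n;                       /* number of registered watches */
    struct pollfd fds[MAX_WATCHES];
    input_cb cbs[MAX_WATCHES];
    void *data[MAX_WATCHES];
};

/* Register a callback for when fd becomes readable (cf. XtAppAddInput). */
static int bridge_add_input(struct bridge *b, int fd, input_cb cb, void *data)
{
    if (b->n == MAX_WATCHES)
        return -1;
    b->fds[b->n].fd = fd;
    b->fds[b->n].events = POLLIN;
    b->fds[b->n].revents = 0;
    b->cbs[b->n] = cb;
    b->data[b->n] = data;
    b->n++;
    return 0;
}

/* One timer tick: wait at most one period, then dispatch ready inputs.
 * Returns the number of callbacks run. An inactive bridge runs none. */
static int bridge_tick(struct bridge *b)
{
    int ran = 0;
    if (!b->active || b->n == 0)
        return 0;
    if (poll(b->fds, (nfds_t)b->n, BRIDGE_PERIOD_MS) <= 0)
        return 0;
    for (size_t i = 0; i < b->n; i++) {
        if (b->fds[i].revents & (POLLIN | POLLHUP)) {
            b->cbs[i](b->fds[i].fd, b->data[i]);
            ran++;
        }
    }
    return ran;
}

/* Example callback: drain what's readable and count invocations. */
static void count_and_drain(int fd, void *data)
{
    char buf[64];
    if (read(fd, buf, sizeof buf) >= 0)
        ++*(int *)data;
}
```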
Tell me if commenting out the lines

    if (!plugin->use_xembed)
        xt_source_destroy();

in npw-viewer.c fixes it for you. This allows it to respawn acroread, but doesn't fix the segfault for me; acroread still crashes all the time. If it works for you, I'll try to simplify this event loop bridge and make it match Mozilla's better. Judging from the -exitPipe 4 on the acroread command line, I think nppdf.so listens for the child's exit by sharing a pipe and waiting for EOF.
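If the guess about -exitPipe is right, the mechanism would look something like this in plain POSIX C: the parent keeps the read end of a pipe, the child inherits the write end, and when the child exits for any reason (including kill -9) the kernel closes that end, so the parent sees EOF. Whether nppdf.so does exactly this is an assumption drawn from the command line; the function names are hypothetical, and the child is a stand-in that sleeps rather than running acroread and never writes to the pipe (a child that does write would need its data drained before EOF shows up).

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Launch a child that inherits the write end of an "exit pipe". The pipe
 * exists only so that its write end is closed when the child exits. */
static pid_t spawn_with_exit_pipe(int *read_fd, unsigned lifetime_s)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);          /* child keeps only the write end */
        /* A real launcher would exec here, presumably passing the write
         * end's fd number on the command line; this stand-in just idles. */
        sleep(lifetime_s);
        _exit(0);
    }
    close(p[1]);              /* parent must drop its copy, or EOF never comes */
    *read_fd = p[0];
    return pid;
}

/* Non-blocking check: true once every write end of the pipe has closed. */
static bool exit_pipe_closed(int read_fd)
{
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
    if (poll(&pfd, 1, 0) <= 0)
        return false;         /* nothing to report yet */
    char c;
    return read(read_fd, &c, 1) == 0;   /* 0 bytes read means EOF */
}
```

Note what this does not do: noticing EOF doesn't reap the child, so unless someone also calls waitpid() it stays a zombie.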
Yeah, I also have a branch that rips it all out and replaces it with gtk2xtbin.c, but I haven't gotten mine working yet either.
Unfortunately, commenting out the xt_source_destroy doesn't change anything for me, nppdf still fails to spawn a new acroread copy. (I'm pretty sure I tried that at one point, earlier, too. :-))
Odd. Now that I try it following your instructions, it fixes things for me if I press back, but not if I press home. In that case acroread still segfaults, and nppdf.so doesn't notice it. I'm so confused...
Me too :-)
Played with my gtk2xtbin branch some more. I finally got it to work, at least partly. It's a mess and needs some cleanup, but I do get a plugin visible.
https://github.com/davidben/nspluginwrapper/tree/gtk2xtbin
Unfortunately, keyboard focus doesn't work, although it does in the old code. So there's at least one thing to be fixed. Also, this didn't actually resolve the zombie problem. The good news is that I've since been unable to get acroread to segfault? But that had already gotten harder to reproduce since the xt_source_destroy change, so I'm not sure when it was actually fixed. It also doesn't make any sense that the viewer's event loop not responding would cause acroread to segfault.
Also, I tried with a 32-bit Firefox and noticed one weird thing: although it handled a dead acroread fine, it never reaped the old zombie! Which is extremely weird. I think this confirms it uses -exitPipe to notice the child's death (it's got to be something like that... if it used waitpid, the zombie would be reaped). Another thing of note: when I forcibly killed acroread, it failed to launch a new one. I bet it's not expecting acroread to crash, and instead waits for it to send some message just before exiting. Possibly it'd be useful to strace the guy. Also, judging from /proc, the browser's end of the exit pipe does not get closed until a new plugin is launched, which is odd. This suggests that perhaps it doesn't notice the old one expired until you need a new one. Can you confirm these on your end?
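One way to check the "handled the death but never reaped the zombie" observation directly is to read the process state from /proc. A Linux-specific sketch with hypothetical helper names: a child that has exited shows state Z in /proc/&lt;pid&gt;/stat and stays that way, however its parent learned of the exit, until the parent calls waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the one-letter state from /proc/<pid>/stat ('R', 'S', 'Z', ...),
 * or 0 if the entry doesn't exist (e.g. the process was already reaped). */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Format is "pid (comm) state ..."; comm may contain spaces or
     * parentheses, so find the last ')' instead of splitting on spaces. */
    char *rp = strrchr(buf, ')');
    if (rp == NULL || rp[1] != ' ')
        return 0;
    return rp[2];
}

/* Reap pid if it has already exited; true if this call reaped it. */
static bool reap_if_exited(pid_t pid)
{
    return waitpid(pid, NULL, WNOHANG) == pid;
}
```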
Unfortunately, this isn't enough to explain the failure because gdb says acroread exits gracefully after 30 seconds in nspluginwrapper, and yet it still fails to launch a new one.
I have been testing primarily with 32-bit Firefox (mainly so I can see how nppdf works when running in-process), so that probably explains why my results are slightly different. I'll re-confirm my results tonight.
Yes, I can basically confirm your results above. Although one difference, maybe? When I 'kill -9 acroread' while running under a 32-bit Firefox, nppdf is able to respawn a new acroread successfully.
The -exitPipe thing may indeed be the way the two processes communicate. I did some stracing, and when acroread exits, it seems to send this to the pipe:

    25420 write(3, "400 : Acroread Quitting\n", 24) = 24

However, as you mentioned above, nppdf.so doesn't seem to read it until you try to open a new PDF:

    25403 read(8, "400 : Acroread Quitting\n", 1024) = 24

The odd thing is that the message does appear to get read both in the "good" case under 32-bit Firefox and under nspluginwrapper, so I'm not sure this tells us anything yet.
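A small sketch of reading and parsing that message. Only one sample line is known ("400 : Acroread Quitting\n", 24 bytes per the strace above), so the general "&lt;code&gt; : &lt;text&gt;" format, the assumption that codes are non-negative, and the names parse_status_line()/read_status() are all hypothetical. It also illustrates the timing point: bytes the child writes sit in the pipe buffer, even after the writer has closed its end, until the reader gets around to calling read().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse "<code> : <text>\n". Returns the (assumed non-negative) code and
 * copies <text>, minus the newline, into text; returns -1 on mismatch. */
static int parse_status_line(const char *line, char *text, size_t text_len)
{
    int code;
    int consumed = 0;
    if (text_len == 0)
        return -1;
    /* %n records how many characters matched; it stays 0 if ':' is missing. */
    if (sscanf(line, "%d : %n", &code, &consumed) != 1 || consumed == 0)
        return -1;
    const char *rest = line + consumed;
    size_t len = strcspn(rest, "\n");
    if (len >= text_len)
        len = text_len - 1;          /* truncate to fit the caller's buffer */
    memcpy(text, rest, len);
    text[len] = '\0';
    return code;
}

/* Read whatever is queued on the pipe and parse the first status line.
 * Data written before the writer exited is still delivered here. */
static int read_status(int fd, char *text, size_t text_len)
{
    char buf[1024];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n <= 0)
        return -1;                   /* error, or EOF with nothing queued */
    buf[n] = '\0';
    return parse_status_line(buf, text, text_len);
}
```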
Do you think that nppdf is calling XtAppAddInput on the -exitPipe file descriptor? Could npw be causing it to lose track of the file descriptor somehow?
There appears to be an issue with the Adobe Reader plugin. It seems that the plugin will exit when it is no longer in use. After this point, the browser instance will not be able to view PDFs anymore unless you go and kill off the nspluginwrapper instance.
A more concrete example (this is using Firefox 5 with Ubuntu Natty):
1) View a PDF in the browser.
2) Hit the back button.
3) Wait 30 seconds... acroread becomes a zombie.
4) Hit the forward button... you won't be able to view the PDF again.
At this point, if you kill -9 the nspluginwrapper instance and reload, you can view PDFs again.
Should nspluginwrapper perhaps be exiting itself when acroread does?