Closed: dazarewicz closed this issue 7 years ago.
Printing here with the OMNI driver did not crash the browser, but it caused 100% CPU usage as soon as printing of the document started. Killing the browser did not bring the CPU load down, but killing pmview did.
@wztest what does pmview have to do with firefox?
I tested version 24.8.1 on 7 systems. One seems to print OK; the other 6 do not. When attempting to print, FF quietly exits with no trap file and no popuplog entry.
I was able to capture a process dump for some of the failures. The process dump appears to show that FF has decided to exit (for whatever reason) and encounters an error while exiting.
The next step would be to figure out why FF has chosen to exit.
Additional information: All of my systems print using the LASERJET.DRV printer driver since that is the kind of printers I have. I also tested FF using the OMNI and PSCRIPT drivers. The OMNI driver has the same result in that FF quietly exits. With limited testing on one system, FF seems to be able to print using the PSCRIPT driver.
Thanks for the info, David. I need to run my own tests to see what is going on and how much of this I can reproduce. I will let you know how it goes.
24.8.1 B2, printing a single page from the above URL to my PSCRIPT (30.800) works successfully.
24.8.1 B2, printing a single page from the above URL to my LASERJET (30.792) works successfully. FYI, same printer as above, but using the LASERJET emulation...
The latest Firefox Beta2 release with the OMNI printer driver also quits when trying to print the page, and again there is nothing in the popuplog.
With the new 24.8.1 beta 3 I now get a trap file. The trap is the same as I have seen in the debugger with the previous version. I don't see any way to attach a file to this ticket so here is a link: http://88watts.net/x/008C_01.TRP
FWIW - I had this problem (crash when trying to print) with my OMNI printer. Printing worked with PSCRIPT via CUPS to the same printer. I was able to repair it by deleting the OMNI printer object AND the OMNI print driver object (I did not delete the driver files), then reinstalling the driver and creating a new printer object.
@dazarewicz did you try what DavidMcKenna suggested? Just so we are not chasing a ghost.
You mean deleting and recreating the printer object? That didn't make any sense since I had just created the printer object. However, I did this today...
Create printer object using LASERJET. Try printing -> crash.
Delete printer object. Reboot.
Create printer object using LASERJET. Try printing -> crash.
Why did this just close itself?
David, thanks for testing. By the way, did Firefox 17 print? Or which was the last Firefox that printed OK? With that information we may find something faster.
I still had version 10.0.12 installed. It trapped when printing: http://88watts.net/x/004C_01.TRP I'll find 17 and try it.
I tried FF 17.0.5 and it won't even run long enough to load a web page.
I tried installing a test printer HP LaserJet 6L that outputs to a file (I don't have any real laser printer around) using the LASERJET driver version 30.827. All works smoothly here. However, if I set up an Omni printer (e.g. Epson ActionLaser 1600 in PCL5, same driver version), I get a similar crash:
https://gist.github.com/dmik/dd62f5ca41e6fc901a14
David, could you try this driver version too to see if it makes any difference for you?
Another question: where do you get things like pmgpi.sym?
On 02/24/15 02:13 PM, Dmitriy Kuminov wrote:
Another question: where do you get things like pmgpi.sym?
On the eCS2.2b2 ISO it is in \os2image\sym_1\os2\dll along with most other sym files. Might be worth testing FF4 as well. The Cairo backend didn't change between 4 and 10 and I believe even 24. Dave
Right, I knew it's somewhere in the ISO. Corrected my log. It doesn't shed more light though.
LASERJET.DRV version 30.827 is the version I am already using.
Test by simply setting the printer queue to "hold" instead of "release". This allows the app to print normally, but the queue never actually sends anything to the printer. No need to print to a file.
I will have to dig around to find version 4. That was a long time ago.
On 02/24/15 08:53 PM, dazarewicz wrote:
I will have to dig around to find version 4. That was a long time ago.
They should be on ftp.netlabs.org/incoming/mozilla or possibly ftp.netlabs.org/mozilla.
I couldn't find any version 4.
version 3.0.7: only prints to a PDF, so N/A
version 3.5.8: only prints to a PDF, so N/A
version 3.6.4: only prints to a PDF, so N/A
version 8.0.1: crashes when printing
version 9.0.1: crashes when printing
version 10.0.2: crashes when printing
All tests were run with a fresh install of FF into an empty directory, after deleting the profile directory, and with no plugins. FF was run on a system with the current libc/gcc DLLs, not the older DLLs that were available at the time.
I went through my old backup disks and found some old versions.
version 1.5.0.1: prints fine, no problems
version 2.0.0.9: prints fine, no problems
version 2.0.0.12: prints fine, no problems
version 3.0: only prints to a PDF, so N/A
On 02/24/15 10:16 PM, dazarewicz wrote:
I couldn't find any version 4.
Now that I think of it, I believe Walter uploaded it to Mozilla's FTP site. FF4 was the first version with the Cairo print surface, so there is no point testing older versions; prior to FF3 we didn't use Cairo and used the normal OS/2 printing mechanism. Prior to FF 10.0.12 the only dependency was libc. IIRC, 10.0.2 and the earlier 10.0.x builds were built with the wrong widget code, so the problem is likely in the gfx code. Judging by some of the trp reports I've seen, I suspect there are still problems in the Cairo code, but I'm not knowledgeable enough to say for sure. I did some testing but couldn't crash any version. Start Firefox with -profilemanager to create a test profile rather than deleting your profile.
After some printf work, I see that it always crashes inside a call to GpiSavePS in _cairo_os2_printing_surface_show_glyphs() in cairo-os2-printing-surface.c. Each of these GpiSavePS calls is paired with a subsequent GpiRestorePS call, so it's surely not an overflow due to deep nesting. I'm playing around to see what could influence that. Maybe some specific font makes it crash, since on simpler pages (like http://ya.ru) it doesn't crash (I'm using David's example page to trigger it).
David, can you check if printing http://ya.ru works for you?
That page prints OK, however there really is nothing on it. Just a couple of images.
This confirms that the problem is somehow related to the complexity of the printed page, specifically the amount of text, and is most likely font-related. Somehow, drawing a lot of font glyphs for printing through GPI screws up PM.
Some more info. It crashes after printing 40-60 glyphs, each time on a different one. I can't find any consistent pattern so far. Rich's patches to cairo seem to make it use GPI primitives to draw font glyphs on an HPS. My guess is that some GPI/PM limit is hit when doing this and it screws up PM. This may also be related to some matrix transformation.
I don't expect that it will be easy to find where exactly cairo enters the unsafe zone and, moreover, it's not guaranteed that it can be worked around w/o losing functionality... GPI is a VERY old part of OS/2 and perhaps one of the most bogus ones. All modern apps (e.g. Qt) do all 2d drawing on their own and only use raster features of GPI. That may explain why we don't see problems like that with other apps.
Also, printing in modern apps is done through CUPS (which uses PS which is sent directly to the printer), so no GPI primitives involved as well. And CUPS printing works great from Firefox too.
Reviewing old trp reports I found this,
I can print from FF v. 4.0.1 but not from v. 6.0 and 4.0.2pre. I get exception reports from v. 6.0 and both versions of 4.0.2pre, and they all points to PMMERGE.DLL. The reports are attached.
It comes with a trp report similar to David's, though this one includes ft2lib:
Exception Report - created 2011/07/25 22:30:07
Firefox v4.0.2pre - build 20110702073850
OS2/eCS Version: 2.45
Physical Memory: 3071 mb
Virt Addr Limit: 1536 mb
Exceptq Version: 7.10 (Mar 1 2011)
Exception C0000005 - Access Violation
Process: D:\DOWNLOAD\FIREFOX\FIREFOX-20110702\FIREFOX-20110702\FIREFOX.EXE
PID: 5B (91)
TID: 01 (1)
Priority: 200
Filename: C:\OS2\DLL\PMMERGE.DLL
Address: 005B:1E8A2E56 (0004:00072E56)
Cause: Attempted to read from 05620562 (not a valid address)
Failing Instruction
1E8A2E4D MOV EDI, [EBP-0x10] (8b7d f0)
1E8A2E50 OR EDI, EDI (0bff)
1E8A2E52 JZ 0x1e8a2e61 (74 0d)
1E8A2E54 MOV EAX, EDI (8bc7)
1E8A2E56 >MOV EDI, [EDI] (8b3f)
1E8A2E58 CALL 0x1e86bd30 (e8 d38efcff)
1E8A2E5D OR EDI, EDI (0bff)
1E8A2E5F JNZ 0x1e8a2e54 (75 f3)
Registers
EAX : 05620562 EBX : FFFFFFFF ECX : 0001005B EDX : 13E8B240
ESI : 0200002F EDI : 05620562
ESP : 0012F084 EBP : 0012F0A0 EIP : 1E8A2E56 EFLG : 00210202
CS : 005B CSLIM: FFFFFFFF SS : 0053 SSLIM: FFFFFFFF
DS : 0053 ES : 0053 FS : 150B GS : 0000
EAX : not a valid address
EBX : not a valid address
ECX : read/exec memory at 0001:0000005B in FIREFOX
EDX : read/write memory at 000C:0000B240 in PMMERGE
ESI : uncommitted memory allocated by LIBC063
EDI : not a valid address
Stack Info for Thread 01
Size Base ESP Max Top
00100000 00130000 -> 0012F084 -> 0011B000 -> 00030000
Call Stack
EBP Address Module Obj:Offset Nearest Public Symbol
Trap -> 1E8A2E56 PMMERGE 0004:00072E56
0012F0A0 1E8A2DEE PMMERGE 0004:00072DEE
0012F0C4 1E8A58E3 PMMERGE 0004:000758E3
140218FC 00000001 Invalid address: 00000001
Labels on the Stack
ESP Address Module Obj:Offset Nearest Public Symbol
0012F0A4 1E8A2DEE PMMERGE 0004:00072DEE
0012F0C8 1E8A58E3 PMMERGE 0004:000758E3
0012F0D4 1E86C3FA PMMERGE 0004:0003C3FA
0012F110 1FE844C4 PMGPI 0003:000244C4
0012F128 1FE849C0 PMGPI 0003:000249C0
0012F130 1FE40208 PMGPI 0001:00000208
0012F134 00010003 FIREFOX 0001:00000003 between __text + 3 and _main - 141 (in {standard input} and nsBrowserApp.o)
0012F138 0001000A FIREFOX 0001:0000000A between __text + A and _main - 13A (in {standard input} and nsBrowserApp.o)
0012F144 1FE61396 PMGPI 0003:00001396
0012F154 1FE47D54 PMGPI 0001:00007D54
0012F160 1FE47D8B PMGPI 0001:00007D8B
0012F164 00010003 FIREFOX 0001:00000003 between __text + 3 and _main - 141 (in {standard input} and nsBrowserApp.o)
0012F168 0001000A FIREFOX 0001:0000000A between __text + A and _main - 13A (in {standard input} and nsBrowserApp.o)
0012F174 1FE61396 PMGPI 0003:00001396
0012F188 1FE6E8E0 PMGPI 0003:0000E8E0
0012F18C 1FE6E81C PMGPI 0003:0000E81C
0012F190 1FE6E940 PMGPI 0003:0000E940
0012F194 1FE6E904 PMGPI 0003:0000E904
0012F1A4 1FE8459E PMGPI 0003:0002459E
0012F1B8 1FE849C0 PMGPI 0003:000249C0
0012F1C0 1FE90053 PMGPI 0003:00030053
0012F1CC 1FFC176E DOSCALL1 0002:0000176E
0012F1FC 1FFC12AA DOSCALL1 0002:000012AA
0012F210 1F031181 FT2LIB 0001:00011181
...
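Read as C, the failing loop at 1E8A2E54..1E8A2E5F looks like a walk over a singly linked list whose "next" pointer is the first DWORD of each node. The instruction at 1E8A2E56 faulted reading 05620562, the value in EDI, so it must be a load through EDI (most likely MOV EDI, [EDI]); the loop exits when EDI reaches 0 and calls the routine at 1E86BD30 once per node. That fits a "next" field that had been overwritten with non-pointer data. The sketch below is only an interpretation under those assumptions: struct node, walk_list and the visit callback are hypothetical names, and what the PMMERGE routine at 1E86BD30 does (and whether EAX is its argument) is unknown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout: the loop only touches the first DWORD,
 * which it treats as the pointer to the next node. */
struct node {
    struct node *next;
    /* ...unknown payload... */
};

/* C rendering of the loop (assumed):
 *   EDI = head; while (EDI) { EAX = EDI; EDI = [EDI]; call 1E86BD30; }
 * The next pointer is read before the callback runs, the usual shape of
 * a loop that may free each node. A garbage "next" value is loaded
 * without trouble and only faults when the following iteration
 * dereferences it, which matches EAX == EDI == 05620562 at the trap. */
static size_t walk_list(struct node *head,
                        void (*visit)(struct node *, void *), void *ctx)
{
    size_t visited = 0;
    struct node *cur = head;

    assert(visit != NULL);
    while (cur != NULL) {
        struct node *this_node = cur;  /* MOV EAX, EDI */
        cur = cur->next;               /* the load that faulted */
        visit(this_node, ctx);         /* CALL 0x1e86bd30 (argument assumed) */
        visited++;
    }
    return visited;
}
```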
Firefox 4.0.1 should be retested.
I can't load GitHub to edit my last message, so to follow up: there are a few similar FF6 printer crashes, with both the OMNI and LASERJET drivers. One user suggested setting Printer-specific format under the queue options (HP LaserJet 4L) as a workaround. Firefox 4.0.1 was distributed with eCS 2.1 and is on the ISO. If you use 4.0.1, places.sqlite will be marked corrupt; after testing, just move places.sqlite.corrupt back to places.sqlite to fix it. Probably unrelated, but printing to PDF and PS also broke between FF4 and FF6, with the output becoming garbled as if a buffer had been overrun.
OK, I recreated the HP LaserJet 6L and reproduced the crash with my build of Firefox 24.8.1 by printing http://os2news.warpstock.org/. Changing the setting as described above didn't help. Testing with FF 4.0.1 does not crash, so this is a regression between FF4 and FF6. We never had a build of FF5, because right after FF4 Mozilla removed the code path for disabling IPC, and Walter patched FF6. Should I try to find the regression by bisecting the build? FF4 built in under an hour, but applying the patches...
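Bisecting means repeatedly testing the revision halfway between the last known-good build (FF4) and the first known-bad one (FF6), so a range of N revisions needs only about log2(N) test builds; a range of 1000 revisions needs at most 10. A minimal sketch of the search loop, assuming the candidate revisions are in chronological order and that the crash, once introduced, stays. The crashes callback is a hypothetical stand-in for "build this revision with the OS/2 patches and try to print".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate supplied by the tester: nonzero if the build at
 * this index crashes when printing. In practice this is a manual step. */
typedef int (*crashes_fn)(size_t build_index, void *ctx);

/* Find the first bad build among builds[0..n-1], assuming builds[0] is
 * known good, builds[n-1] is known bad, and the behaviour flips once. */
static size_t first_bad_build(size_t n, crashes_fn crashes, void *ctx)
{
    size_t good, bad;

    assert(n >= 2 && crashes != NULL);
    good = 0;
    bad = n - 1;                       /* invariant: good < bad */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (crashes(mid, ctx))
            bad = mid;                 /* regression is at or before mid */
        else
            good = mid;                /* regression is after mid */
    }
    return bad;
}
```

Mercurial's built-in hg bisect (with --good, --bad and --skip) does the same bookkeeping over the repository history; --skip is useful for revisions where the OS/2 patches don't apply or build.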
Dave, you are doing an important thing here. Of course finding the exact change would help greatly.
I guess the first thing is to review the changes between the patchsets. If anyone is interested, I've uploaded Rich's original v4 patches and Walter's v6 patches to netlabs incoming/mozilla. It may just be a simple rebasing error. ff40-patches.zip ff6patches.zip
Mozilla just closed https://bugzilla.mozilla.org/show_bug.cgi?id=682952 (FF 6 crashes for most of the pages I try to print). It has some trp reports and rehashes this issue. Rich seems to have concluded that the problem was in Presentation Manager. We were held back by the lack of a debug build back then. It's still strange that FF4 works. My build of FF4 has the 100% CPU / no connection issue. This also happened with libc064 (the first release) and, IIRC, GCC 4.5.1.
We also think it might be a PM problem. David, did you try Printer-specific format, as mentioned in the Mozilla ticket above?
It's interesting that you say it might be a PM problem. If that were the case, I would expect to see problems in other applications but I do not. FF is the only app with problems. PMMail, PMView, Mesa, Describe, PMFax, Lucide, Acrobat, editors, and all other PM programs print just fine.
Setting Printer specific format has no effect. It still crashes with this set.
David, do you know which of the applications you listed use GPI primitives to rasterize vector fonts? Perhaps none. I guess that even those that print text in vector format use GpiCharString and friends. I think Netscape/Mozilla used it as well back then (in the pre-4.x era), and that worked.
I'm just being devil's advocate. From a user point of view, the user says, "I have all these dozens of apps that all print OK, and one app that doesn't. Something must be broken in that one app". The user doesn't care about which function the app uses, they just see it as the app is broken. This is especially true when it used to work, then a new version came out and it was broken.
From a developer point of view, if you truly believe that the primitive you are using is broken, why do you insist on using it when obviously there is another way that works? Alternatively, perhaps the primitive is not really broken, perhaps it is being used incorrectly (or not as intended). I don't really know. I'm just offering some alternate points of view. I haven't looked at the code and it has been years since I have worked on this type of code in a PM app and I don't remember which functions I used.
Being a devil's advocate, you are of course right :) But that's another story. We have a workaround for that scenario that works really well ("Use CUPS").
As for the dev POV, the whole purpose of switching font rendering from GPI to cairo was to significantly improve output quality. IIRC, it is known that GPI sucks in that area functionality-wise (it doesn't have good support for TrueType and dozens of other things). This switch required using GpiMove/GpiLine directly to draw font glyphs. Somehow, when doing this, we push some GPI limit and it crashes. Yes, it is possible that we do something GPI doesn't expect, but in my opinion it's still a failure of PM: a good API doesn't crash, regardless of the input. Moreover, there may be a workaround that avoids the crash while still giving acceptable results, but as I already said, it looks non-trivial to find ATM. I will keep looking, though.
It would be interesting to count how many primitives are called before crashing... maybe we hit some magic number like 32K or 64K or ...
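A minimal way to get that number, assuming each GpiMove/GpiLine call made by the cairo glyph code can be routed through a small wrapper: count the calls per primitive, log the running totals every 1024 calls, and compare the total against the 16-bit limits (32767 signed, 65535 unsigned). note_primitive(), total_primitives() and past_16bit_limit() are hypothetical helpers, not part of any OS/2 header; only the idea of counting comes from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical instrumentation: in a real build, a wrapper around each
 * GpiMove and GpiLine call would call note_primitive() first. */
enum prim { PRIM_MOVE, PRIM_LINE, PRIM_KINDS };

static unsigned long g_calls[PRIM_KINDS];

static unsigned long total_primitives(void)
{
    unsigned long sum = 0;
    int i;
    for (i = 0; i < PRIM_KINDS; i++)
        sum += g_calls[i];
    return sum;
}

/* 0: at most 32767, 1: past the signed 16-bit maximum (32767),
 * 2: past the unsigned 16-bit maximum (65535). */
static int past_16bit_limit(unsigned long n)
{
    if (n > 65535UL)
        return 2;
    if (n > 32767UL)
        return 1;
    return 0;
}

static void note_primitive(enum prim kind)
{
    unsigned long total;

    assert(kind < PRIM_KINDS);
    g_calls[kind]++;
    total = total_primitives();
    /* Log often enough that the last line before the trap is close to the
     * real count, and flush because the process may die on the next call. */
    if (total % 1024 == 0) {
        fprintf(stderr, "primitives: move=%lu line=%lu total=%lu limit=%d\n",
                g_calls[PRIM_MOVE], g_calls[PRIM_LINE], total,
                past_16bit_limit(total));
        fflush(stderr);
    }
}
```

If the last logged total before the trap varies from run to run (as the 40-60 glyph observation suggests), a simple call-count limit becomes less likely.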
OK, I got FFv4 built and working with the FFv4 patches (the problem was bsd_select in nsprpub). Printing seemed to work fine. I then rebased the printing patches from FFv6 and rebuilt; printing still worked fine. So it seems to be a regression in the Mozilla code rather than in Rich's patches, which did evolve quite a bit. I'll see if I can get FFv5 built next, though that is going to mean rebasing the IPC patches :(
@dryeo how do you come to the conclusion it's Mozilla? Do you want to say that you have a build of Firefox where printing works with the current set of cairo patches from Rich? Which build is that then? (exactly, in terms of mercurial commits).
@dryeo You were also testing the OMNI driver?
On 03/06/15 03:00 AM, Dmitriy Kuminov wrote:
@dryeo how do you come to the conclusion it's Mozilla? Do you want to say that you have a build of Firefox where printing works with the current set of cairo patches from Rich? Which build is that then? (exactly, in terms of mercurial commits).
What I did was rebuild FFv4.01 with the original ffv4 patches + rm_tcpip40hdrs, KOMH's fix_no_connection and disabled bsd-select (needed to actually connect and fix 100% CPU). Printing worked. I then replaced the cairo and printing patches with the ones from FFv6 (about +-8 revisions ahead according to Rich's naming), removed the PORTRAIT ? LANDSCAPE part as it wasn't implemented yet and printing still worked. I then built FFv6 Aurora (revision cf0a29826586) with the FFv6 patches and printing crashed in PMMERGE, see https://gist.github.com/dryeo/9d949391db1bac128355 for trp and log. Note, I created both the omni and laserjet printer drivers referred to in this issue and used a new profile, something was still weird as the FFv4 test only allowed me to pick the laserjet and the FFv6 only allowed the omni driver and according to the log actually used the null driver. I'm currently rebuilding FFv4 and will retest with only one driver at a time installed. I also want to do some intermediate builds but have had a shortage of time. The only real change I made to adapt the FFv6 patches to FFv4 was this quick hack, so perhaps the addition of Landscape vs portrait had something to do with the crash.
diff --git a/widget/src/os2/nsDeviceContextSpecOS2.cpp b/widget/src/os2/nsDeviceContextSpecOS2.cpp
--- a/widget/src/os2/nsDeviceContextSpecOS2.cpp
+++ b/widget/src/os2/nsDeviceContextSpecOS2.cpp
@@ -553,19 +553,20 @@
mPrintSettings->GetPaperWidth(&width);
mPrintSettings->GetPaperHeight(&height);
width *= POINTS_PER_INCH_FLOAT;
height *= POINTS_PER_INCH_FLOAT;
PRInt32 orientation;
mPrintSettings->GetOrientation(&orientation);
newSurface = new(std::nothrow) gfxPSSurface(stream, gfxSize(width, height));
if (newSurface)
  static_cast<gfxPSSurface*>(newSurface.get())->SetDPI(double(mXDpi), double(mYDpi));
}
//** Native
else {
  char *filePath = nsnull;
OK retested with both the omni (epson) and laserjet drivers referred to in this issue. Couldn't remember or find the revision I used last for FFv4 so used GECKO_2_1_BASE which gave me FFv4B13pre. Same results with FFv6 crashing (I have the trps and logs). These are built with the same environment as FFv10.0.12 (GCC 4.4.6) except libc066 now.
This is getting really complicated. A 'new' build of Firefox6 works with old patches applied but fails when using new Firefox 6+ patches. The good thing is that no Firefox5 is required when Version 6 was working ok. Why is Firefox using a different driver than the one that is set in the printer object?
Not sure why FF used a different driver than the one set in the printer object, except that I had a few set up. Eventually I used prndrv.exe to clean up. I also tested removing the patch from bug#624699 (landscape-printing-fixes), which didn't help. The Cairo versions are the same (1.6.4), so I assume the problem is in the patches.
With FF31, printing still crashes here. Here's the meaningful part of the call stack I see in the TRP:
EBP Address Module Obj:Offset Nearest Public Symbol
-------- --------- -------- ------------- -----------------------
Trap -> 1E94460C PMMERGE 0004:0011460C between CalculateBBox + 18 and GetCodePageObject - 74
0012F2B0 1E94BD10 PMMERGE 0004:0011BD10 between FinishSubpath + 9C and FillPath32 - E0
0012F2DC 1E8A4E52 PMMERGE 0004:00074E52 between CloseFigure32 + 186 and SavePath32 - EE
0012F31C 1FE41484 PMGPI 0003:00021484 between FullCloseFigure + 5C and FullCharStringPos - 2C
with this failing instruction:
_____________________________________________________________________
Exception C0000005 - Access Violation
_____________________________________________________________________
Process: D:\CODING\MOZILLA\MASTER-BUILD\DIST\BIN\FIREFOX.EXE
PID: E4EB (58603)
TID: 01 (1)
Priority: 200
Filename: C:\OS2\DLL\PMMERGE.DLL 04/10/2007 18:26:01 1,270,275
Address: 005B:1E94460C (0004:0011460C)
Cause: Attempted to read from 00000028
(not a valid address)
_____________________________________________________________________
Failing Instruction
_____________________________________________________________________
1E9445FE OR EDX, EDX (0bd2)
1E944600 JZ 0x1e94467d (0f84 77000000)
1E944606 MOV ECX, [EAX+0x1c] (8b48 1c)
1E944609 LEA EBX, [ECX+0x28] (8d59 28)
1E94460C >MOV EAX, [EBX] (8b03)
1E94460E MOV [EBP-0xc], EAX (8945 f4)
1E944611 MOV EAX, [EBX+0x4] (8b43 04)
1E944614 MOV [EBP-0x8], EAX (8945 f8)
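The failing instruction and the Cause line pin down what went wrong: LEA EBX, [ECX+0x28] followed by a fault reading 00000028 means ECX was 0, i.e. the pointer loaded by MOV ECX, [EAX+0x1c] was NULL, and the code then tried to copy two consecutive DWORDs at offsets 0x28 and 0x2c of the missing structure. The C sketch below reproduces that arithmetic. The structure layouts are hypothetical, chosen only to match the offsets in the disassembly (the real PMMERGE structures are not documented), and uint32_t stands in for a 32-bit pointer as on OS/2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: only the offsets (0x1c, 0x28, 0x2c) come from the
 * disassembly; field names and everything else are made up. */
struct inner {
    uint8_t unknown[0x28];
    int32_t a;  /* MOV EAX, [EBX]     -> stored to [EBP-0xc] */
    int32_t b;  /* MOV EAX, [EBX+0x4] -> stored to [EBP-0x8] */
};

struct outer {
    uint8_t  unknown[0x1c];
    uint32_t inner_ptr;  /* MOV ECX, [EAX+0x1c]: 32-bit pointer to struct inner */
};

/* Address read by the faulting MOV EAX, [EBX], where EBX = ECX + 0x28. */
static uint32_t faulting_read_address(uint32_t ecx)
{
    return ecx + (uint32_t)offsetof(struct inner, a);
}
```

Since ECX + 0x28 equals 0x28 only when ECX is 0, a NULL inner pointer is the only reading consistent with this trap; the code tests EDX for zero just before, but not the pointer it loads into ECX.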
And here is the console output BTW:
DIVE is disabled - Panorama's shadow-buffer is enabled
1436445330606 addons.update-checker WARN Update manifest for {972ce4c6-7e08-4474-a285-3208198ce6fd} did not contain an updates property
gfxOS2Surface for print - DC= 10000dc PS= 3415a50 w= 4676 h= 6784 preview= 0
gfxOS2Surface::GetPS - mSurfType= 3 mPS= 3415a50
gfxOS2Surface::EndPage - mSurfType= 3
gfxOS2Surface::EndPage - mSurfType= 3
gfxOS2Surface::EndPage - mSurfType= 3
gfxOS2Surface::EndPage - mSurfType= 3
~gfxOS2Surface for print - DC= 10000dc PS= 3415a50 w= 4676 h= 6784
gfxOS2Surface for print - DC= 10000dc PS= 3265a50 w= 2337 h= 3387 preview= 0
gfxOS2Surface::GetPS - mSurfType= 3 mPS= 3265a50
gfxOS2Surface::EndPage - mSurfType= 3
Creating E4EB_01.TRP
RWSCLI08: RwsExitListProc - termination code= 0
Using Firefox 24.3.0. The printer I am using is a generic printer using the LASERJET.DRV driver. As a test case I am trying to print http://os2news.warpstock.org. Firefox crashes almost immediately. No TRP file is created; however, this is in the popuplog:
07-02-2014 17:26:40 SYS3175 PID 0048 TID 0001 Slot 008c E:\FIREFOX\FIREFOX.EXE c0000005 1e8a2db3 P1=00000001 P2=1847409c P3=XXXXXXXX P4=XXXXXXXX
EAX=1847405c EBX=00000001 ECX=00000000 EDX=184a9450 ESI=00000001 EDI=184a9450
DS=0053 DSACC=d0f3 DSLIM=5fffffff
ES=0053 ESACC=d0f3 ESLIM=5fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:1e8a2db3 CSACC=d0df CSLIM=5fffffff
SS:ESP=0053:0012fe14 SSACC=d0f3 SSLIM=5fffffff
EBP=0012fe30 FLG=00010206
PMMERGE.DLL 0004:00072db3
It is interesting that this same crash occurred when trying to print with version 10; however, in that case a TRP file was created.