bitwiseworks / mozilla-os2

Mozilla for OS/2 and OS/2-based systems

Printing causes crash #75

Closed: dazarewicz closed this issue 7 years ago

dazarewicz commented 10 years ago

Using Firefox 24.3.0. The printer I am using is a generic printer using the LASERJET.DRV driver. As a test case I am trying to print http://os2news.warpstock.org. Firefox crashes almost immediately. No TRP file is created; however, this is in the popuplog:

07-02-2014 17:26:40 SYS3175 PID 0048 TID 0001 Slot 008c E:\FIREFOX\FIREFOX.EXE c0000005 1e8a2db3 P1=00000001 P2=1847409c P3=XXXXXXXX P4=XXXXXXXX
EAX=1847405c EBX=00000001 ECX=00000000 EDX=184a9450 ESI=00000001 EDI=184a9450
DS=0053 DSACC=d0f3 DSLIM=5fffffff
ES=0053 ESACC=d0f3 ESLIM=5fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:1e8a2db3 CSACC=d0df CSLIM=5fffffff
SS:ESP=0053:0012fe14 SSACC=d0f3 SSLIM=5fffffff
EBP=0012fe30 FLG=00010206

PMMERGE.DLL 0004:00072db3

It is interesting that this same crash occurred when trying to print with version 10; however, in that case a TRP file was created.

wztest commented 10 years ago

Printing here with the OMNI driver did not crash the browser, but it caused 100% CPU usage as soon as it started printing the document. Killing the browser did not bring the CPU load down, but killing PMView did.

SilvanScherrer commented 10 years ago

@wztest, what does PMView have to do with Firefox?

dazarewicz commented 10 years ago

I tested version 24.8.1 on 7 systems. Of the 7 systems tested, one seems to print OK; the other 6 do not. When attempting to print, FF quietly exits with no trap file and no popuplog entry.

I was able to capture a process dump for some of the failures. The process dump appears to show that FF has decided to exit (for whatever reason) and encounters an error while exiting.

The next step would be to figure out why FF has chosen to exit.

dazarewicz commented 10 years ago

Additional information: All of my systems print using the LASERJET.DRV printer driver since that is the kind of printers I have. I also tested FF using the OMNI and PSCRIPT drivers. The OMNI driver has the same result in that FF quietly exits. With limited testing on one system, FF seems to be able to print using the PSCRIPT driver.

dmik commented 10 years ago

Thanks for the info, David. I need to perform my own tests to see what is going on and how much of this I can reproduce. I will let you know how it goes.

dspiatkowski commented 10 years ago

24.8.1 B2, printing a single page from the above URL to my PSCRIPT (30.800) works successfully.

dspiatkowski commented 10 years ago

24.8.1 B2, printing a single page from the above URL to my LASERJET (30.792) works successfully. FYI, same printer as above, but using the LASERJET emulation...

wztest commented 9 years ago

The latest Firefox Beta 2 release also quits with the OMNI printer driver when trying to print the page, and again there is nothing in the popuplog.

dazarewicz commented 9 years ago

With the new 24.8.1 beta 3 I now get a trap file. The trap is the same as I have seen in the debugger with the previous version. I don't see any way to attach a file to this ticket so here is a link: http://88watts.net/x/008C_01.TRP

DavidMcKenna commented 9 years ago

FWIW, I had this problem (crash when trying to print) with my OMNI printer. Printing worked with PSCRIPT via CUPS to the same printer. I was able to fix it by deleting the OMNI printer object AND the OMNI print driver object (I did not delete the driver files), then reinstalling the driver and creating a new printer object.

SilvanScherrer commented 9 years ago

@dazarewicz, did you try what DavidMcKenna suggested? Just so we aren't chasing a ghost.

dazarewicz commented 9 years ago

You mean deleting and recreating the printer object? That didn't make any sense since I had just created the printer object. However, I did this today...

1. Create a printer object using LASERJET. Try printing -> crash.
2. Delete the printer object.
3. Reboot.
4. Create a printer object using LASERJET. Try printing -> crash.

dazarewicz commented 9 years ago

Why did this just close itself?

SilvanScherrer commented 9 years ago

David, thanks for testing. By the way, did Firefox 17 print? Or which was the last Firefox version that printed OK? With that information we might find something faster.

dazarewicz commented 9 years ago

I still had version 10.0.12 installed. It trapped when printing: http://88watts.net/x/004C_01.TRP I'll find 17 and try it.

dazarewicz commented 9 years ago

I tried FF 17.0.5 and it won't even run long enough to load a web page.

dmik commented 9 years ago

I tried installing a test printer HP LaserJet 6L that outputs to a file (I don't have any real laser printer around) using the LASERJET driver version 30.827. All works smoothly here. However, if I set up an Omni printer (e.g. Epson ActionLaser 1600 in PCL5, same driver version), I get a similar crash:

https://gist.github.com/dmik/dd62f5ca41e6fc901a14

David, could you try this driver version too to see if it makes any difference for you?

dmik commented 9 years ago

Another question: where do you get things like pmgpi.sym?

dryeo commented 9 years ago

On 02/24/15 02:13 PM, Dmitriy Kuminov wrote:

Another question: where do you get things like pmgpi.sym?

On the eCS 2.2b2 ISO it is in \os2image\sym_1\os2\dll, along with most of the other sym files. It might be worth testing FF4 as well. The Cairo backend didn't change between 4 and 10 and, I believe, even through 24.
Dave

dmik commented 9 years ago

Right, I knew it was somewhere on the ISO. I've corrected my log, but it doesn't shed any more light.

dazarewicz commented 9 years ago

LASERJET.DRV version 30.827 is the version I am already using.

Test by simply setting the printer queue to "hold" instead of "release". This allows the app to print normally, but the queue never actually sends anything to the printer. No need to print to a file.

I will have to dig around to find version 4. That was a long time ago.

dryeo commented 9 years ago

On 02/24/15 08:53 PM, dazarewicz wrote:

I will have to dig around to find version 4. That was a long time ago.

They should be on ftp.netlabs.org/incoming/mozilla or possibly ftp.netlabs.org/mozilla.

dazarewicz commented 9 years ago

I couldn't find any version 4.

- version 3.0.7: only prints to a PDF, so N/A
- version 3.5.8: only prints to a PDF, so N/A
- version 3.6.4: only prints to a PDF, so N/A
- version 8.0.1: crashes when printing
- version 9.0.1: crashes when printing
- version 10.0.2: crashes when printing

All tests were run with a fresh install of FF into an empty directory, after deleting the profile directory, and with no plugins. FF was run on a system with the current libc/gcc DLLs, not the older DLLs that were available at the time.

dazarewicz commented 9 years ago

I went through my old backup disks and found some old versions.

- version 1.5.0.1: prints fine, no problems
- version 2.0.0.9: prints fine, no problems
- version 2.0.0.12: prints fine, no problems
- version 3.0: only prints to a PDF, so N/A

dryeo commented 9 years ago

On 02/24/15 10:16 PM, dazarewicz wrote:

I couldn't find any version 4.

Now that I think of it, I believe Walter uploaded it to Mozilla's FTP site. FF4 was the first version with the Cairo print surface, so there is no point testing anything older; prior to FF3 we didn't use Cairo at all and used the normal OS/2 printing mechanism. Prior to FF 10.0.12 the only dependency was libc. IIRC, 10.0.2 and the earlier 10.0.x releases were built with the wrong widget code, so the problem is likely in the gfx code. Judging by some of the TRP reports I've seen, I suspect there are still problems in the cairo code, but I'm not knowledgeable enough to say for sure. I did some testing but couldn't crash any version. Start Firefox with -profilemanager to create a test profile rather than deleting your profile.

dmik commented 9 years ago

After some printf work I see that it always crashes inside a call to GpiSavePS in _cairo_os2_printing_surface_show_glyphs() in cairo-os2-printing-surface.c. Each of these GpiSavePS calls is paired with a subsequent GpiRestorePS call, so it's surely not an overflow due to deep nesting. I'm playing around to see what could influence that. Maybe some specific font makes it crash, since on simpler pages (like http://ya.ru) it doesn't crash (I'm using David's example page to reproduce it).

David, can you check if printing http://ya.ru works for you?
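The argument that deep nesting is ruled out relies on every GpiSavePS call being matched by a restore before the next save. A minimal sketch of how that could be checked at runtime, assuming hypothetical hooks `ps_on_save()` and `ps_on_restore()` are called right after each save and each restore in the printing surface. These names and the tracker type are invented for illustration; they are not part of cairo or GPI, and the code below does not call GPI itself.

```c
#include <stdio.h>

/* Hypothetical bookkeeping for presentation-space save/restore pairs.
   In a real build, ps_on_save() would run right after each GpiSavePS()
   call in the printing surface and ps_on_restore() right after the
   matching restore call. */
typedef struct {
    int depth;      /* saves not yet restored */
    int max_depth;  /* deepest nesting observed */
    int unmatched;  /* restores issued with nothing saved */
} ps_save_tracker;

static void ps_on_save(ps_save_tracker *t)
{
    t->depth++;
    if (t->depth > t->max_depth)
        t->max_depth = t->depth;
}

static void ps_on_restore(ps_save_tracker *t)
{
    if (t->depth == 0) {
        t->unmatched++;  /* restore without a save would be a real bug */
        fprintf(stderr, "restore without matching save\n");
        return;
    }
    t->depth--;
}

/* Simulates the pattern described above: one save/restore pair per
   show_glyphs call, never nested. */
static ps_save_tracker simulate_paired_calls(int show_glyphs_calls)
{
    ps_save_tracker t = {0, 0, 0};
    for (int i = 0; i < show_glyphs_calls; i++) {
        ps_on_save(&t);
        /* ... glyph drawing would happen here ... */
        ps_on_restore(&t);
    }
    return t;
}
```

If a real run ever reported `max_depth` above 1 or a nonzero `unmatched`, the nesting theory would be back on the table; with strictly paired calls, as in the simulation, the depth returns to 0 after every glyph run.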

dazarewicz commented 9 years ago

That page prints OK, however there really is nothing on it. Just a couple of images.

dmik commented 9 years ago

This confirms that the problem is somehow related to the complexity of the printed page, specifically the amount of text, and is most likely font-related. Somehow, drawing a lot of font glyphs through GPI for printing screws up PM.

dmik commented 9 years ago

Some more info: it crashes after printing 40-60 glyphs, on a different glyph each time. I can't find any consistent pattern so far. Rich's patches to cairo seem to make it use GPI primitives to draw font glyphs on an HPS. My guess is that some GPI/PM limit is hit when doing this and it screws up PM. This may also be related to some matrix transformation.

I don't expect it will be easy to find where exactly cairo enters the unsafe zone and, moreover, it's not guaranteed that it can be worked around without losing functionality... GPI is a VERY old part of OS/2 and perhaps one of the buggiest. All modern apps (e.g. Qt) do all 2D drawing on their own and only use the raster features of GPI. That may explain why we don't see problems like this with other apps.

Also, printing in modern apps is done through CUPS (which uses PS that is sent directly to the printer), so no GPI primitives are involved there either. And CUPS printing works great from Firefox too.

dryeo commented 9 years ago

Reviewing old TRP reports, I found this:

I can print from FF v. 4.0.1 but not from v. 6.0 and 4.0.2pre. I get exception reports from v. 6.0 and both versions of 4.0.2pre, and they all points to PMMERGE.DLL. The reports are attached.

with a TRP report similar to David's, though it includes FT2LIB:

Exception Report - created 2011/07/25 22:30:07

Firefox v4.0.2pre - build 20110702073850

OS2/eCS Version: 2.45
# of Processors: 1
Physical Memory: 3071 mb
Virt Addr Limit: 1536 mb
Exceptq Version: 7.10 (Mar 1 2011)

Exception C0000005 - Access Violation

Process:  D:\DOWNLOAD\FIREFOX\FIREFOX-20110702\FIREFOX-20110702\FIREFOX.EXE
PID:      5B (91)
TID:      01 (1)
Priority: 200

Filename: C:\OS2\DLL\PMMERGE.DLL
Address:  005B:1E8A2E56 (0004:00072E56)
Cause:    Attempted to read from 05620562 (not a valid address)

Failing Instruction

1E8A2E4D  MOV EDI, [EBP-0x10]  (8b7d f0)
1E8A2E50  OR  EDI, EDI         (0bff)
1E8A2E52  JZ  0x1e8a2e61       (74 0d)
1E8A2E54  MOV EAX, EDI         (8bc7)
1E8A2E56 >MOV EDI, [EDI]
1E8A2E58  CALL 0x1e86bd30      (e8 d38efcff)
1E8A2E5D  OR  EDI, EDI         (0bff)
1E8A2E5F  JNZ 0x1e8a2e54       (75 f3)

Registers

EAX : 05620562   EBX  : FFFFFFFF   ECX : 0001005B   EDX  : 13E8B240
ESI : 0200002F   EDI  : 05620562
ESP : 0012F084   EBP  : 0012F0A0   EIP : 1E8A2E56   EFLG : 00210202
CS  : 005B       CSLIM: FFFFFFFF   SS  : 0053       SSLIM: FFFFFFFF
DS  : 0053       ES   : 0053       FS  : 150B       GS   : 0000

EAX : not a valid address
EBX : not a valid address
ECX : read/exec memory at 0001:0000005B in FIREFOX
EDX : read/write memory at 000C:0000B240 in PMMERGE
ESI : uncommitted memory allocated by LIBC063
EDI : not a valid address

Stack Info for Thread 01

  Size       Base        ESP         Max         Top
00100000   00130000 -> 0012F084 -> 0011B000 -> 00030000

Call Stack

  EBP     Address    Module    Obj:Offset     Nearest Public Symbol
Trap  ->  1E8A2E56   PMMERGE   0004:00072E56
0012F0A0  1E8A2DEE   PMMERGE   0004:00072DEE
0012F0C4  1E8A58E3   PMMERGE   0004:000758E3
140218FC  00000001   Invalid address: 00000001

Labels on the Stack

  ESP     Address    Module    Obj:Offset     Nearest Public Symbol
0012F0A4  1E8A2DEE   PMMERGE   0004:00072DEE
0012F0C8  1E8A58E3   PMMERGE   0004:000758E3
0012F0D4  1E86C3FA   PMMERGE   0004:0003C3FA
0012F110  1FE844C4   PMGPI     0003:000244C4
0012F128  1FE849C0   PMGPI     0003:000249C0
0012F130  1FE40208   PMGPI     0001:00000208
0012F134  00010003   FIREFOX   0001:00000003  between __text + 3 and _main - 141 (in {standard input} and nsBrowserApp.o)
0012F138  0001000A   FIREFOX   0001:0000000A  between __text + A and _main - 13A (in {standard input} and nsBrowserApp.o)
0012F144  1FE61396   PMGPI     0003:00001396
0012F154  1FE47D54   PMGPI     0001:00007D54
0012F160  1FE47D8B   PMGPI     0001:00007D8B
0012F164  00010003   FIREFOX   0001:00000003  between __text + 3 and _main - 141 (in {standard input} and nsBrowserApp.o)
0012F168  0001000A   FIREFOX   0001:0000000A  between __text + A and _main - 13A (in {standard input} and nsBrowserApp.o)
0012F174  1FE61396   PMGPI     0003:00001396
0012F188  1FE6E8E0   PMGPI     0003:0000E8E0
0012F18C  1FE6E81C   PMGPI     0003:0000E81C
0012F190  1FE6E940   PMGPI     0003:0000E940
0012F194  1FE6E904   PMGPI     0003:0000E904
0012F1A4  1FE8459E   PMGPI     0003:0002459E
0012F1B8  1FE849C0   PMGPI     0003:000249C0
0012F1C0  1FE90053   PMGPI     0003:00030053
0012F1CC  1FFC176E   DOSCALL1  0002:0000176E
0012F1FC  1FFC12AA   DOSCALL1  0002:000012AA
0012F210  1F031181   FT2LIB    0001:00011181
...

Firefox 4.0.1 should be retested.
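Read as a whole, the failing instructions in this 2011 report form a small list-walking loop: the head pointer is loaded from [EBP-0x10], and for each node the code copies the node pointer into EAX, reads the next pointer through EDI, and calls the routine at 0x1e86bd30. The Cause line reports a read from 05620562, which is exactly the value in EDI, so the loop was handed a garbage link rather than a NULL one. A rough C equivalent under those assumptions; the `node` type, the link being the first field, and the `visit` callback are hypothetical, and what the PMMERGE callee really does is unknown.

```c
#include <stddef.h>

/* Hypothetical node: the link is the first field, matching a load
   through the node pointer at offset 0. */
typedef struct node {
    struct node *next;
    int payload;
} node;

/* Rough C shape of the loop at 1E8A2E4D..1E8A2E5F:
     EDI = head;
     while (EDI) { EAX = EDI; EDI = *EDI; call 0x1e86bd30; }
   The next pointer is read BEFORE the call, so the callee may free the
   node. If some node's link holds garbage such as 0x05620562, the next
   iteration's read through that garbage value is what faults. */
static size_t walk_list(node *head, void (*visit)(node *))
{
    size_t visited = 0;
    node *cur = head;
    while (cur != NULL) {
        node *n = cur;
        cur = n->next;      /* corresponds to the faulting load */
        if (visit != NULL)
            visit(n);       /* stands in for the unknown PMMERGE callee */
        visited++;
    }
    return visited;
}
```

Because the link is read before the callback runs, a callback that frees each node is safe; a corrupted link stored in a still-valid node is not, which would point to corruption that happened earlier rather than in this loop itself.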

dryeo commented 9 years ago

I can't load GitHub to edit my last message, so to follow up: there are a few similar FF6 printing crashes, with both the OMNI and LASERJET drivers. One user suggested enabling Printer-specific format under the queue options (HP LaserJet 4L) as a workaround. Firefox 4.0.1 was distributed with eCS 2.1 and is on the ISO. If you test with 4.0.1, places.sqlite will be marked corrupt; after testing, just rename places.sqlite.corrupt back to places.sqlite to fix it. Probably unrelated, but printing to PDF and PS also broke between FF4 and FF6, with the output becoming garbled as if a buffer had been overrun.

dryeo commented 9 years ago

OK, I recreated the HP LaserJet 6L and reproduced the crash with my build of Firefox 24.8.1 by printing http://os2news.warpstock.org/. Changing the setting described above didn't help. Testing with FF4.0.1 does not crash, so this is a regression between FF4 and FF6. We never had a build of FF5 because, right after FF4, Mozilla removed the code path for disabling IPC, and Walter patched FF6. Should I try to find the regression by bisecting the build? FF4 built in under an hour, but applying the patches...

dmik commented 9 years ago

Dave, you are doing an important thing here. Of course finding the exact change would help greatly.

dryeo commented 9 years ago

I guess the first thing is to review the changes between the patch sets. If anyone is interested, I've uploaded Rich's original v4 patches and Walter's v6 patches to netlabs incoming/mozilla. It may just be a simple rebasing error.
Files: ff40-patches.zip, ff6patches.zip

dryeo commented 9 years ago

Mozilla just closed https://bugzilla.mozilla.org/show_bug.cgi?id=682952 (FF 6 crashes for most of the pages I try to print). It has some TRP reports and rehashes this issue. Rich seems to have concluded that the problem was in the Presentation Manager. We were held back by not having a debug build back then. It's still strange that FF4 works. My build of FF4 has the "100% CPU with no connection" issue. This happened with libc064 (first release) and, IIRC, GCC 4.5.1 as well.

SilvanScherrer commented 9 years ago

We also think it might be a PM problem. David, did you try the printer-specific format option, as mentioned in the above Mozilla ticket?

dazarewicz commented 9 years ago

It's interesting that you say it might be a PM problem. If that were the case, I would expect to see problems in other applications but I do not. FF is the only app with problems. PMMail, PMView, Mesa, Describe, PMFax, Lucide, Acrobat, editors, and all other PM programs print just fine.

Setting Printer-specific format has no effect. It still crashes with this set.

dmik commented 9 years ago

David, do you know which of the applications you listed use GPI primitives to rasterize vector fonts? Perhaps none. I guess that even those that print text in vector format use GpiCharString and friends. I think Netscape/Mozilla used it as well back then (in the pre-4.x era), and that worked.

dazarewicz commented 9 years ago

I'm just playing devil's advocate. From a user's point of view, the user says, "I have all these dozens of apps that all print OK, and one app that doesn't. Something must be broken in that one app". The user doesn't care which function the app uses; they just see that the app is broken. This is especially true when it used to work, then a new version came out and it was broken.

From a developer point of view, if you truly believe that the primitive you are using is broken, why do you insist on using it when obviously there is another way that works? Alternatively, perhaps the primitive is not really broken, perhaps it is being used incorrectly (or not as intended). I don't really know. I'm just offering some alternate points of view. I haven't looked at the code and it has been years since I have worked on this type of code in a PM app and I don't remember which functions I used.

dmik commented 9 years ago

Playing devil's advocate, you are of course right :) But that's another story. We have a workaround for that scenario that works really well ("Use CUPS").

As for the dev POV: IIRC, the whole purpose of switching font rendering from GPI to cairo was to significantly improve output quality, since GPI is known to be weak functionally in that area (it doesn't have good support for TrueType and a dozen other things). This switch required using GpiMove/GpiLine directly to draw font glyphs. Somehow, when doing this, we push some GPI limit and it crashes. Yes, it is possible that we use GPI in a way it doesn't expect, but in my opinion it's still a failure of PM: a good API doesn't crash, regardless of the input. Moreover, there may be some workaround that avoids the crash and still gives acceptable results, but as I already said, it looks non-trivial to find at the moment. I will keep looking though.

ydario commented 9 years ago

It would be interesting to count how many primitives are called before crashing... maybe we hit some magic number like 32K or 64K or ...
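One way to test this idea is to count every GPI primitive the printing surface issues and log the running total at power-of-two milestones, so the last line printed before a trap shows whether the crash lines up with 32K, 64K, or some other boundary. A sketch, assuming a hypothetical `count_primitive()` hook placed just before each real GPI call; the hook, the counter type, and the simulation helper are all invented for illustration.

```c
#include <stdio.h>

/* Hypothetical instrumentation: route every GPI primitive the printing
   surface issues (move, line, path calls, ...) through count_primitive()
   just before the real call. */
typedef struct {
    unsigned long total;          /* primitives issued so far */
    unsigned long last_milestone; /* last power-of-two total reported */
} prim_counter;

/* Counts one primitive; logs and returns 1 whenever the running total
   reaches a power of two >= 1024 (1K, 2K, ... 32K, 64K, ...), else 0. */
static int count_primitive(prim_counter *c, const char *name)
{
    c->total++;
    if (c->total >= 1024 && (c->total & (c->total - 1)) == 0) {
        c->last_milestone = c->total;
        fprintf(stderr, "GPI primitive #%lu (%s)\n", c->total, name);
        return 1;
    }
    return 0;
}

/* Feeds n dummy primitives through the counter and returns how many
   milestones were logged along the way. */
static int simulate_primitives(prim_counter *c, unsigned long n)
{
    int milestones = 0;
    for (unsigned long i = 0; i < n; i++)
        milestones += count_primitive(c, "GpiLine");
    return milestones;
}
```

Logging goes to stderr, which C does not fully buffer, so the last milestone has a better chance of reaching the console before the process dies. Since the crash was observed after 40-60 glyphs, logging a per-glyph primitive count alongside the total could also help.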

dryeo commented 9 years ago

OK, I got FFv4 built and working (the problem was bsd_select in nsprpub) with the FFv4 patches. Printing seemed to work fine. I then rebased the printing patches from FFv6 and rebuilt, and printing still worked fine. So it seems to be a regression in the Mozilla code rather than in Rich's patches, which did evolve quite a bit. I'll see if I can get FFv5 built next, though that is going to mean rebasing the IPC patches :(

dmik commented 9 years ago

@dryeo how do you come to the conclusion it's Mozilla? Do you want to say that you have a build of Firefox where printing works with the current set of cairo patches from Rich? Which build is that then? (exactly, in terms of mercurial commits).

wztest commented 9 years ago

@dryeo You were also testing the OMNI driver?

dryeo commented 9 years ago

On 03/06/15 03:00 AM, Dmitriy Kuminov wrote:

@dryeo how do you come to the conclusion it's Mozilla? Do you want to say that you have a build of Firefox where printing works with the current set of cairo patches from Rich? Which build is that then? (exactly, in terms of mercurial commits).

What I did was rebuild FFv4.0.1 with the original FFv4 patches plus rm_tcpip40hdrs and KOMH's fix_no_connection, and with bsd-select disabled (needed to actually connect and to fix the 100% CPU issue). Printing worked. I then replaced the cairo and printing patches with the ones from FFv6 (about 8 revisions ahead, give or take, according to Rich's naming), removed the PORTRAIT ? LANDSCAPE part since it wasn't implemented yet, and printing still worked. I then built FFv6 Aurora (revision cf0a29826586) with the FFv6 patches, and printing crashed in PMMERGE; see https://gist.github.com/dryeo/9d949391db1bac128355 for the TRP and log.

Note: I created both the OMNI and LASERJET printer drivers referred to in this issue and used a new profile. Something was still weird: the FFv4 test only allowed me to pick the LASERJET driver, while FFv6 only allowed the OMNI driver and, according to the log, actually used the null driver. I'm currently rebuilding FFv4 and will retest with only one driver installed at a time. I also want to do some intermediate builds but have been short on time. The only real change I made to adapt the FFv6 patches to FFv4 was this quick hack, so perhaps the addition of landscape vs. portrait had something to do with the crash:

diff --git a/widget/src/os2/nsDeviceContextSpecOS2.cpp b/widget/src/os2/nsDeviceContextSpecOS2.cpp
--- a/widget/src/os2/nsDeviceContextSpecOS2.cpp
+++ b/widget/src/os2/nsDeviceContextSpecOS2.cpp
@@ -553,19 +553,20 @@
 mPrintSettings->GetPaperWidth(&width);
 mPrintSettings->GetPaperHeight(&height);
 width *= POINTS_PER_INCH_FLOAT;
 height *= POINTS_PER_INCH_FLOAT;

 PRInt32 orientation;
 mPrintSettings->GetOrientation(&orientation);

dryeo commented 9 years ago

OK, I retested with both the OMNI (Epson) and LASERJET drivers referred to in this issue. I couldn't remember or find the revision I used last time for FFv4, so I used GECKO_2_1_BASE, which gave me FFv4B13pre. Same results, with FFv6 crashing (I have the TRPs and logs). These are built with the same environment as FFv10.0.12 (GCC 4.4.6), except that they now use libc066.

wztest commented 9 years ago

This is getting really complicated. A 'new' build of Firefox6 works with old patches applied but fails when using new Firefox 6+ patches. The good thing is that no Firefox5 is required when Version 6 was working ok. Why is Firefox using a different driver than the one that is set in the printer object?

dryeo commented 9 years ago

Not sure why FF used a different driver than the one set in the printer object, except that I had a few set up. Eventually I used prndrv.exe to clean up. I also tested removing the patch from bug #624699 (landscape-printing-fixes), which didn't help. The Cairo versions are the same (1.6.4), so I assume the problem is in the patches.

dmik commented 9 years ago

With FF31, printing still crashes here. Here's the meaningful part of the call stack I see in the TRP:

   EBP     Address    Module     Obj:Offset    Nearest Public Symbol
 --------  ---------  --------  -------------  -----------------------
 Trap  ->  1E94460C   PMMERGE   0004:0011460C  between CalculateBBox + 18 and GetCodePageObject - 74

 0012F2B0  1E94BD10   PMMERGE   0004:0011BD10  between FinishSubpath + 9C and FillPath32 - E0

 0012F2DC  1E8A4E52   PMMERGE   0004:00074E52  between CloseFigure32 + 186 and SavePath32 - EE

 0012F31C  1FE41484   PMGPI     0003:00021484  between FullCloseFigure + 5C and FullCharStringPos - 2C

with this failing instruction:

_____________________________________________________________________

Exception C0000005 - Access Violation
_____________________________________________________________________

Process:  D:\CODING\MOZILLA\MASTER-BUILD\DIST\BIN\FIREFOX.EXE
PID:      E4EB (58603)
TID:      01 (1)
Priority: 200

Filename: C:\OS2\DLL\PMMERGE.DLL 04/10/2007 18:26:01 1,270,275
Address:  005B:1E94460C (0004:0011460C)
Cause:    Attempted to read from 00000028
          (not a valid address)

_____________________________________________________________________

Failing Instruction
_____________________________________________________________________

1E9445FE  OR  EDX, EDX         (0bd2)
1E944600  JZ  0x1e94467d       (0f84 77000000)
1E944606  MOV ECX, [EAX+0x1c]  (8b48 1c)
1E944609  LEA EBX, [ECX+0x28]  (8d59 28)
1E94460C >MOV EAX, [EBX]       (8b03)
1E94460E  MOV [EBP-0xc], EAX   (8945 f4)
1E944611  MOV EAX, [EBX+0x4]   (8b43 04)
1E944614  MOV [EBP-0x8], EAX   (8945 f8)
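The failing sequence loads a pointer from [EAX+0x1c] into ECX, computes EBX = ECX + 0x28, and then reads two dwords at [EBX] and [EBX+0x4]. Because the faulting address is exactly 00000028, ECX must have been 0: the pointer stored at offset 0x1c of the structure in EAX was NULL. The sketch below only reproduces that address arithmetic; the two structs are invented 32-bit layouts chosen to match the offsets and are not PMMERGE's real data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented 32-bit layouts that only reproduce the offsets seen in the
   failing instructions; they are NOT PMMERGE's real structures. */
typedef struct {
    uint32_t header[10];  /* 10 x 4 bytes = 0x28 bytes of other fields */
    int32_t  x;           /* read by MOV EAX, [EBX]      (offset 0x28) */
    int32_t  y;           /* read by MOV EAX, [EBX+0x4]  (offset 0x2c) */
} inner_t;

typedef struct {
    uint32_t fields[7];   /* 7 x 4 bytes = 0x1c bytes of other fields */
    uint32_t inner_addr;  /* 32-bit pointer loaded by MOV ECX, [EAX+0x1c] */
} outer_t;

/* Compile-time checks that the invented layouts match the disassembly. */
_Static_assert(offsetof(outer_t, inner_addr) == 0x1c, "outer offset");
_Static_assert(offsetof(inner_t, x) == 0x28, "inner x offset");
_Static_assert(offsetof(inner_t, y) == 0x2c, "inner y offset");

/* Address touched by the first read, given the base loaded into ECX:
   LEA EBX, [ECX+0x28] followed by MOV EAX, [EBX]. */
static uint32_t faulting_address(uint32_t ecx)
{
    return ecx + (uint32_t)offsetof(inner_t, x);
}
```

With a NULL base the first read lands at 0x28, matching the Cause line. This is a plain NULL dereference, unlike the garbage pointer 05620562 in the 2011 report, which may mean the two traps have different proximate causes even though both are in PMMERGE.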
dmik commented 9 years ago

And here is the console output BTW:

DIVE is disabled - Panorama's shadow-buffer is enabled
1436445330606   addons.update-checker   WARN    Update manifest for {972ce4c6-7e08-4474-a285-3208198ce6fd} did not contain an updates property
gfxOS2Surface for print  - DC= 10000dc PS= 3415a50 w= 4676 h= 6784 preview= 0
gfxOS2Surface::GetPS - mSurfType= 3  mPS= 3415a50
gfxOS2Surface::EndPage - mSurfType= 3
gfxOS2Surface::EndPage - mSurfType= 3
gfxOS2Surface::EndPage - mSurfType= 3
gfxOS2Surface::EndPage - mSurfType= 3
~gfxOS2Surface for print - DC= 10000dc PS= 3415a50 w= 4676 h= 6784
gfxOS2Surface for print  - DC= 10000dc PS= 3265a50 w= 2337 h= 3387 preview= 0
gfxOS2Surface::GetPS - mSurfType= 3  mPS= 3265a50
gfxOS2Surface::EndPage - mSurfType= 3
Creating E4EB_01.TRP
RWSCLI08: RwsExitListProc - termination code= 0