classilla / tenfourfox

Mozilla for Power Macintosh.
http://www.tenfourfox.com/

More native calls to block in tracejit for 4.0.2 #62

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Crash with stack corruption when the tracejit is enabled (with it disabled, the 
problem goes away). However, I can only confirm this bug on the 7400. The 7450 
does not exhibit it (even running the 7400 code), and the G5 does not exhibit 
it (even running the 7400, 7450 or G5 code).

The stack is corrupted, implicating issue 37, but this may be a different 
manifestation because it only affects this class of machines. Per chtrusch it 
was also observed on a G3. I can confirm it on our in-house 7400, but I don't 
have a G3 running Tiger to test on. The crash occurs reliably when signing into 
Amazon; crash dump attached.

There is only one place this call appears, so the wallpaper fix is obvious:

```cpp
LIns*
TraceRecorder::d2u(LIns* d)
{
    if (d->isImmD())
        return w.immi(js_DoubleToECMAUint32(d->immD()));
    ...
```

We simply abort the trace here. However, this could have a serious performance 
impact and I do not want to deploy it on unaffected machines. Definitely not 
shippable for 4.0.2, but this does demand a 4.0.3 to address it. Marking this 
High rather than Critical due to its apparently limited impact. If a generic 
fix can't be found, we might build 4.0.3 with source adjustments for the G3 
and 7400 only.

Original issue reported on code.google.com by classi...@floodgap.com on 16 May 2011 at 12:58

Attachments:

GoogleCodeExporter commented 9 years ago
Changing summary to attract other reports.

Original comment by classi...@floodgap.com on 16 May 2011 at 1:04

GoogleCodeExporter commented 9 years ago
The more I look at this, the more I think the stack corruption is not a 
manifestation of issue 37. The call is for an immediate quantity only, and the 
crash is at the tracer level, not in the nanojit, so it is occurring even 
before trace execution. The terminal crash is in fmod(). I wonder if this is a 
MacOS bug, but if so, why here, and why only on these machines?

I suppose the other option is to write an assembly version of d2u and inline 
it into jstracer.cpp, but that seems rather drastic. We're pretty sure the 
nanojit version works.

Original comment by classi...@floodgap.com on 16 May 2011 at 1:29

GoogleCodeExporter commented 9 years ago
Crash report Amazon Sign-in TFF4.0.2pre G3 10.4.11

Original comment by chtru...@web.de on 16 May 2011 at 2:08

Attachments:

GoogleCodeExporter commented 9 years ago
Crash report Amazon Sign-in TFF4.0.2pre G4 7450 10.5.8

Original comment by chtru...@web.de on 16 May 2011 at 2:09

Attachments:

GoogleCodeExporter commented 9 years ago
If the Sign-In page has already loaded successfully once, e.g. by loading it 
with JS disabled or the nanojit disabled, the crash doesn't *always* occur the 
next time you try (because it is in the cache?). The crash always occurs with 
a fresh TFF user profile (trash ~/Library/Application Support/Firefox). 
Sometimes the crash occurs directly after clicking "Sign in", sometimes the 
browser stalls for 10-20 seconds before crashing, and sometimes you see 
(parts of) the Sign-In page (the name & password form), sometimes you don't.

Original comment by chtru...@web.de on 16 May 2011 at 2:25

GoogleCodeExporter commented 9 years ago
That's a totally different crash signature in each of the three reports. This 
is a real mess. This might be issue 37 after all.

I don't think I can do anything other than block the trace at this point, but 
I really don't want to, because it's not immediately obvious where the actual 
failure lies. I don't think the cache has anything to do with this, but you 
could try Shift+Reload to see (that bypasses the cache). It's not cookies, 
because I deleted them all just to see if it was coming from a cookie-setting 
routine, and it's not HTML Local Storage. When I get back from the office, 
I'll start in the Profile Manager and see if I can trigger it on the G5.

Original comment by classi...@floodgap.com on 16 May 2011 at 3:23

GoogleCodeExporter commented 9 years ago
Looking through the stack traces, these are all worst-case native calls. I 
could block at those points too as a stopgap. Are those crash signatures 
stable, i.e., when you crash, do you *always* crash with those signatures on 
those machines? We could RETURN_STOP and maybe save the trace generated so far.

Original comment by classi...@floodgap.com on 16 May 2011 at 3:33

GoogleCodeExporter commented 9 years ago
I get different signatures, but there seems to be a limited set of 3-4 for 
each processor. Results from my crash tests are attached. Once more I could 
confirm that TFF crashes every time with a clean user account.

Original comment by chtru...@web.de on 16 May 2011 at 4:23

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks, that's very helpful, and yes, while the crash point varies, it all 
still falls within the same set of basic functions.

I still suspect issue 37 but I really don't want to make a massive low-level 
fix for 4.0. So I'm going to block the trace in those functions and run off a 
sample build. If performance is not too bad, then I'll work it into 4.0.2 final 
since this should be a very safe fix.

Original comment by classi...@floodgap.com on 16 May 2011 at 4:37

GoogleCodeExporter commented 9 years ago
Changing summary to remove architecture dependence.

Original comment by classi...@floodgap.com on 16 May 2011 at 4:38

GoogleCodeExporter commented 9 years ago
I'VE GOT IT! Follow up in issue 37!

Original comment by classi...@floodgap.com on 17 May 2011 at 4:38