classilla / tenfourfox

Mozilla for Power Macintosh.
http://www.tenfourfox.com/

Semaphore waits are killers on some sites [meta-threading performance problems] #193

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Spun off from issue 191. After patching this out, overhead drops a bit, but 
we're still spending ~40% of our time in semaphore waits on the affected sites. 
The Shark profile looks like this:

    0.0%    95.2%   libSystem.B.dylib   _pthread_body   
    0.0%    90.5%   libnspr4.dylib   PR_Select  
    0.0%    61.9%   XUL   XRE_AddStaticComponent    
    0.0%    57.1%   XUL    0x3eef784 [176B] 
    0.0%    57.1%   XUL     XRE_AddStaticComponent  
    0.0%    47.6%   XUL      XRE_AddStaticComponent 
    0.0%    42.9%   libnspr4.dylib        PR_Wait   
    0.0%    42.9%   libnspr4.dylib         PR_WaitCondVar   
    0.0%    42.9%   libSystem.B.dylib           pthread_cond_wait   
    42.9%   42.9%   libSystem.B.dylib            semaphore_wait_signal_trap 
    0.0%    4.8%    libnspr4.dylib        PR_WaitCondVar    
    0.0%    4.8%    XUL      NS_CycleCollectorSuspect_P 
    0.0%    4.8%    XUL      0x299ec58 [788B]   
    0.0%    4.8%    libnspr4.dylib     PR_WaitCondVar   
    0.0%    9.5%    XUL   0x29b2afc [884B]  
    0.0%    4.8%    XUL   XRE_StartupTimelineRecord 
    0.0%    4.8%    XUL   js_GetScriptLineExtent(JSScript*) 
    0.0%    4.8%    XUL   js::IterateChunks(JSRuntime*, void*, void (*)(JSRuntime*, void*, js::gc::Chunk*)) 
    0.0%    4.8%    XUL   js::IndirectWrapper::toWrapper()  
    0.0%    4.8%    XUL  std::deque<FilePath, std::allocator<FilePath> >::_M_push_back_aux(FilePath const&) 
    0.0%    4.8%    firefox start   

Original issue reported on code.google.com by classi...@floodgap.com on 30 Nov 2012 at 6:56

GoogleCodeExporter commented 9 years ago
I don't have a multi-CPU Mac, so I can't tell whether to leave that enabled, 
although I would leave it enabled.
My testing at least showed that although 10.5 performs much better than 10.4 in 
this situation, the underlying problem persists.

Apart from the iOS port of WebKit, the other approaches (Chromium and WebKit2) 
use multiple processes instead of multiple threads in order to perform better 
on multicore CPUs, and the iOS WebKit developers already regret their solution 
(see the WebKit-Dev mailing list).

I did not notice any substantial performance difference between AuroraFox 20 
and TFF 22 in this particular test case; nevertheless, jemalloc will improve 
general performance.

Original comment by Tobias.N...@gmail.com on 16 Jul 2013 at 10:26

GoogleCodeExporter commented 9 years ago
OK. On a related issue, what debugger are you using? I'm noticing it takes gdb 
a really long time to walk stack frames, even if I compile and use the later 
gdb-768 or whatever it was from Xcode 3.

Original comment by classi...@floodgap.com on 17 Jul 2013 at 1:00

GoogleCodeExporter commented 9 years ago
I never found an alternative debugger for OS X, and at least for Objective-C, 
Apple's version of gdb is most probably the only choice. LLDB might be an 
alternative.

I wonder whether this problem is related to threaded speculative HTML parsing.
https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/HTML5_Parser

Although Google implemented partially threaded parsing in WebKit earlier this 
year, no port other than Chromium has switched to it.

Original comment by Tobias.N...@gmail.com on 17 Jul 2013 at 9:11

GoogleCodeExporter commented 9 years ago
OK.

I'm running a jemalloc build of 22 (with issue 231, so it is minimally 
threaded). Much of the speed has returned on most of the bad sites. However, 
www.butterflylabs.com, which was bad before, became hideously bad on the G5. I 
interrupted it in the debugger while it was pegging the CPU, and the stack 
looks like this:

    (gdb) bt 15
    #0  0xffff9080 in ___memset_pattern () at /System/Library/Frameworks/System.framework/PrivateHeaders/ppc/cpu_capabilities.h:193
    #1  0x901296d0 in memset ()
    #2  0x0002da34 in ozone_free (zone=0x3e100, ptr=0x609e00) at /Volumes/BruceDeuce/src/mozilla-22.1/memory/mozjemalloc/jemalloc.c:6612
    #3  0x08da88bc in _cairo_image_surface_finish (abstract_surface=0x31a1fbc0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-image-surface.c:734
    #4  0x08dca7c0 in _moz_cairo_surface_finish (surface=0x31a1fbc0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:728
    #5  0x08dca910 in _moz_cairo_surface_destroy (surface=0x31a1fbc0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:649
    #6  0x08ddb73c in _cairo_quartz_surface_finish (abstract_surface=0x338e83a0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-quartz-surface.c:2053
    #7  0x08dca7c0 in _moz_cairo_surface_finish (surface=0x338e83a0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:728
    #8  0x08dca910 in _moz_cairo_surface_destroy (surface=0x338e83a0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:649
    #9  0x08dbc2bc in _moz_cairo_pattern_destroy (pattern=0x321b5f40) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-pattern.c:346
    #10 0x08da3a68 in _cairo_gstate_fini (gstate=0x1ccf996c) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-gstate.c:229
    #11 0x08da3cec in _cairo_gstate_restore (gstate=<value temporarily unavailable, due to optimizations>, freelist=0x1ccf9ab8) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-gstate.c:290
    #12 0x08d8fe94 in _moz_cairo_restore (cr=0x1ccf9800) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo.c:599
    #13 0x06994e60 in mozilla::FrameLayerBuilder::DrawThebesLayer (aLayer=0x31160400, aContext=0x321b9700, aRegionToDraw=<value temporarily unavailable, due to optimizations>, aRegionToInvalidate=@0xefffae6c, aCallbackData=0xefffc928) at /Volumes/BruceDeuce/src/mozilla-22.1/layout/base/FrameLayerBuilder.cpp:3334
    #14 0x08c6ff30 in mozilla::layers::BasicShadowableThebesLayer::PaintBuffer (this=0x31160400, aContext=<value temporarily unavailable, due to optimizations>, aRegionToDraw=@0xefffae3c, aExtendedRegionToDraw=@0xefffadd8, aRegionToInvalidate=@0xefffae6c, aDidSelfCopy=false, aCallback=<value temporarily unavailable, due to optimizations>, aCallbackData=0xefffc928) at basic/BasicThebesLayer.h:95
    (More stack frames follow...)

This is being called through the overlay zone allocator (ozone_free), and I 
suspect the memset at line 5358 in jemalloc.c. Maybe we should just disable 
MALLOC_FILL, or enable it only for DEBUG builds, since it's only useful for 
tracing memory.

However, on the other sites the DEBUG version is already faster than the opt 
G5 version, so I'm not sure how to explain your results on 10.5, other than 
that the 10.5 allocator is already pretty well optimized. I'm going to do a 
rebuild without MALLOC_FILL and with issue 231 turned off to see how that comes 
out. Note that the cut-and-paste and drag-to-the-browser issues have resolved 
themselves, but 10.4 is vulnerable to the crash in M702250 (see issue 218 for 
more).

Original comment by classi...@floodgap.com on 18 Jul 2013 at 12:39

GoogleCodeExporter commented 9 years ago
Disabling MALLOC_FILL fixes the specific problem above. However, with issue 231 
disabled and regular CPU detection enabled, the browser once again dramatically 
slows on certain sites even in a jemalloc build, so issue 218 and issue 231 
appear to be separate issues that contribute to this meta-issue.

The question then becomes whether 10.5 needs issue 231 also.

Original comment by classi...@floodgap.com on 18 Jul 2013 at 2:31

GoogleCodeExporter commented 9 years ago
The crashes from M702250 used to be reproducible only temporarily (at least on 
10.5): it could crash reproducibly at one moment but no longer crash after 
rebooting.

Original comment by Tobias.N...@gmail.com on 18 Jul 2013 at 11:00

GoogleCodeExporter commented 9 years ago
The crash is consistent if I drag the image to the desktop and intermittent if 
I drag it to a non-image destination, such as a terminal window. But it's 
unacceptable at this level.

So one option is to not let the user drag images at all, leaving Copy to 
Clipboard (which appears to work) or saving to disk (which works). I have to 
test those on big images, but I guess that's one way to solve the problem. 
Mozilla concluded the bug was Apple's, not theirs, after all.

Original comment by classi...@floodgap.com on 18 Jul 2013 at 2:14

GoogleCodeExporter commented 9 years ago
Disabling image dragging at a high level fixes the problem, and still allows 
copying to clipboard or saving to disk, even for the large images in M702250. I 
consider this an acceptable compromise, so I'm going to build an opt of 22 with 
issue 218 and issue 231 tonight. If that looks good on the G5 and the iMac G4, 
we'll ship this as a beta.

Original comment by classi...@floodgap.com on 18 Jul 2013 at 10:55

GoogleCodeExporter commented 9 years ago
Test version is available.

Original comment by classi...@floodgap.com on 22 Jul 2013 at 10:09

GoogleCodeExporter commented 9 years ago
This is much less impactful as of 31. The lower graphics overhead probably 
reduces contention quite a bit.

Original comment by classi...@floodgap.com on 26 Jul 2014 at 6:39

GoogleCodeExporter commented 9 years ago
Wontfixing, especially since there's nothing really to carry forward.

Original comment by classi...@floodgap.com on 20 Mar 2015 at 2:51