Enable jemalloc on 10.4, fix problems

GoogleCodeExporter commented 9 years ago

Although I don't know if this is a contributor to issue 193, it sure seems like 
a prerequisite.

Original issue reported on code.google.com by classi...@floodgap.com on 13 Apr 2013 at 12:15

Blocking: #193

GoogleCodeExporter commented 9 years ago

Comparing 
http://www.opensource.apple.com/source/Libc/Libc-391.2.10/include/malloc/malloc.
h and 
http://www.opensource.apple.com/source/Libc/Libc-763.12/include/malloc/malloc.h 
, we should be able to use the _leopard_malloc_zone for 10.4 as well.

Original comment by classi...@floodgap.com on 13 Apr 2013 at 12:15

GoogleCodeExporter commented 9 years ago

Original comment by classi...@floodgap.com on 13 Apr 2013 at 12:16

Now blocking: #193

GoogleCodeExporter commented 9 years ago

Tobias, how are you enabling mozalloc? The changesets show your change to 
jemalloc.c, but that doesn't even get compiled here, and I can't see where 
you're twiddling MOZ_MEMORY (unless you're changing it through some other 
means).

Original comment by classi...@floodgap.com on 15 Apr 2013 at 2:44

GoogleCodeExporter commented 9 years ago

(and what about https://bugzilla.mozilla.org/show_bug.cgi?id=702250 ?)

Original comment by classi...@floodgap.com on 15 Apr 2013 at 2:46

GoogleCodeExporter commented 9 years ago

Fixed MOZ_MEMORY and short-circuited memalign which doesn't exist on 10.4 
(instead it uses the built-in ipalloc). Confirmed osx_use_jemalloc is true by 
writing in if (osx_use_jemalloc) __asm__("trap\n") and verifying JS traps there 
(it does). V8 passes with jemalloc. Doesn't make a lot of difference there, but 
we'll build the full browser and then see if I can trigger M702250.

Original comment by classi...@floodgap.com on 15 Apr 2013 at 3:58

GoogleCodeExporter commented 9 years ago

Also had to make a tweak in storage/ because 10.4's zones don't have 
malloc_good_zone in them (changed it to simply use 16-byte-aligned stores).

Incredibly the browser not only works, but is (especially for DEBUG) very 
quick. Many of the sites in issue 193 are better, implying that resource 
starvation may be part of the reason for all those threads sitting around in 
wait.

However, drag and drop (thus can't test M702250), cut and paste, and printing 
are all messed up. Printing may be revenge of issue 82. I'm trying to figure 
out what's wrong with the other two. Running a non-jemalloc build tonight to 
try to exonerate it as the cause. I suspected M647216 but trying to work around 
it just made things worse.

Original comment by classi...@floodgap.com on 16 Apr 2013 at 5:19

GoogleCodeExporter commented 9 years ago

Printing appears to be issue 82, but cut and paste and drag and drop all work 
in a non-jemalloc build. Even if osx_use_jemalloc is set to false (using the 
environment variable NO_MAC_JEMALLOC), it does not work. I'm suspicious 
alignment is to blame, but IonMonkey has higher priority right now. Dropping 
milestone, though we do want this.

Original comment by classi...@floodgap.com on 16 Apr 2013 at 1:39

Changed title: Enable jemalloc on 10.4, fix widget problems
Added labels: Priority-High, Type-Enhancement
Removed labels: Milestone-Unstable, Priority-Medium, Type-Defect

GoogleCodeExporter commented 9 years ago

https://bugzilla.mozilla.org/show_bug.cgi?id=691003 makes me think our minimum 
size should be 16 bytes, always, even for tiny allocations.

Original comment by classi...@floodgap.com on 16 Apr 2013 at 10:59

GoogleCodeExporter commented 9 years ago

... i.e. change #define TINY_MIN_2POW           (sizeof(void*) == 8 ? 3 : 2) to 
4 (forcing 2^4 minimum allocations).

Original comment by classi...@floodgap.com on 16 Apr 2013 at 11:05

GoogleCodeExporter commented 9 years ago

That didn't fix it.

Original comment by classi...@floodgap.com on 18 Apr 2013 at 1:23

GoogleCodeExporter commented 9 years ago

I was able to proove that AuroraFox uses jemalloc by setting breakpoints in 
some of the functions of jemalloc. The patches in the Changesets are exactly 
what is needed to get it working (actually it wasn't correctly enabled for some 
versions but that was fixed) - BUT at least at one moment I could reproducibly 
crash the browser using drag and drop, however it occurs very rarely.

Original comment by Tobias.N...@gmail.com on 19 Apr 2013 at 1:51

GoogleCodeExporter commented 9 years ago

But how did you set MOZ_MEMORY to 1? It's not in your Aurora changesets.

Original comment by classi...@floodgap.com on 19 Apr 2013 at 4:28

GoogleCodeExporter commented 9 years ago

Adding "--enable-jemalloc" to the configure command line should override the 
default.

Original comment by Tobias.N...@gmail.com on 21 Apr 2013 at 12:39

GoogleCodeExporter commented 9 years ago

See http://hg.mozilla.org/mozilla-central/file/4420f27742c7/configure.in#l7112

Original comment by Tobias.N...@gmail.com on 21 Apr 2013 at 12:39

GoogleCodeExporter commented 9 years ago

OK.

Looking through jemalloc.c, it appears that the original szone is still able to 
be called for certain things. Maybe that's what we need to add.

Original comment by classi...@floodgap.com on 22 Apr 2013 at 4:50

GoogleCodeExporter commented 9 years ago

I'm thinking we have to fake up a memalign for the original zone, which 10.4 
does not have. Something like this might work:

void *aligned_malloc( size_t size, int align )
{
    void *mem = (szone->malloc)( size + (align-1) + sizeof(void*) );

    char *amem = ((char*)mem) + sizeof(void*);
    amem += align - ((uintptr)amem & (align - 1));

    /* ((void**)amem)[-1] = mem; // for an aligned free */
    return amem;
}

The trick is making sure we don't leak for alignments != 16. I'm throwing a 
trap into jemalloc and we'll see if it bombs at memalign. If so, I bet that's 
our problem.

Original comment by classi...@floodgap.com on 15 Jul 2013 at 5:26

GoogleCodeExporter commented 9 years ago

Status update:

- Typing this in a jemalloc debug build of "22.1." Browser speed did increase 
in the way expected, as it did before, but note I am doing this with issue 231 
also applied. I need to see if I can get rid of that.
- I tinkered with the jemalloc patch for 10.4 and simply made memalign a no-op 
with a trap if osx_use_jemalloc id false, but the trap isn't getting triggered, 
so it may never have been called in the first place.
- I'm not sure what changed in widget between my last cut and 22, but cut and 
paste and drag *to* the browser work now. However, dragging things *out* of the 
browser causes M702250. That's bad. OTOH, it appears that it may be possible to 
get around it with something along the lines of M703087 
(https://bug703087.bugzilla.mozilla.org/attachment.cgi?id=575304) which may not 
be good, but may make a hopefully infrequent operation possible. Or, we just 
disable drag and drop of images *from* the browser (copying or 
right-click-and-save do work).

Original comment by classi...@floodgap.com on 18 Jul 2013 at 12:20

GoogleCodeExporter commented 9 years ago

Large allocations are also very slow. On www.butterflylabs.com,

(gdb) bt 15
#0  0xffff9080 in ___memset_pattern () at 
/System/Library/Frameworks/System.framework/PrivateHeaders/ppc/cpu_capabilities.
h:193
#1  0x901296d0 in memset ()
#2  0x0002da34 in ozone_free (zone=0x3e100, ptr=0x609e00) at 
/Volumes/BruceDeuce/src/mozilla-22.1/memory/mozjemalloc/jemalloc.c:6612
#3  0x08da88bc in _cairo_image_surface_finish (abstract_surface=0x31a1fbc0) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-image-surface.c:7
34
#4  0x08dca7c0 in _moz_cairo_surface_finish (surface=0x31a1fbc0) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:728
#5  0x08dca910 in _moz_cairo_surface_destroy (surface=0x31a1fbc0) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:649
#6  0x08ddb73c in _cairo_quartz_surface_finish (abstract_surface=0x338e83a0) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-quartz-surface.c:
2053
#7  0x08dca7c0 in _moz_cairo_surface_finish (surface=0x338e83a0) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:728
#8  0x08dca910 in _moz_cairo_surface_destroy (surface=0x338e83a0) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:649
#9  0x08dbc2bc in _moz_cairo_pattern_destroy (pattern=0x321b5f40) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-pattern.c:346
#10 0x08da3a68 in _cairo_gstate_fini (gstate=0x1ccf996c) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-gstate.c:229
#11 0x08da3cec in _cairo_gstate_restore (gstate=<value temporarily unavailable, 
due to optimizations>, freelist=0x1ccf9ab8) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-gstate.c:290
#12 0x08d8fe94 in _moz_cairo_restore (cr=0x1ccf9800) at 
/Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo.c:599
#13 0x06994e60 in mozilla::FrameLayerBuilder::DrawThebesLayer 
(aLayer=0x31160400, aContext=0x321b9700, aRegionToDraw=<value temporarily 
unavailable, due to optimizations>, aRegionToInvalidate=@0xefffae6c, 
aCallbackData=0xefffc928) at 
/Volumes/BruceDeuce/src/mozilla-22.1/layout/base/FrameLayerBuilder.cpp:3334
#14 0x08c6ff30 in mozilla::layers::BasicShadowableThebesLayer::PaintBuffer 
(this=0x31160400, aContext=<value temporarily unavailable, due to 
optimizations>, aRegionToDraw=@0xefffae3c, aExtendedRegionToDraw=@0xefffadd8, 
aRegionToInvalidate=@0xefffae6c, aDidSelfCopy=false, aCallback=<value 
temporarily unavailable, due to optimizations>, aCallbackData=0xefffc928) at 
basic/BasicThebesLayer.h:95
(More stack frames follow...)

I suspect we should turn MALLOC_FILL off, or only enable it for debugging.

Original comment by classi...@floodgap.com on 18 Jul 2013 at 12:41

GoogleCodeExporter commented 9 years ago

Disabling MALLOC_FILL fixes that problem.

Original comment by classi...@floodgap.com on 18 Jul 2013 at 2:28

GoogleCodeExporter commented 9 years ago

Dragging out of the browser does work for text, just not images.

Original comment by classi...@floodgap.com on 18 Jul 2013 at 2:39

GoogleCodeExporter commented 9 years ago

After playing around in jemalloc for awhile it appears that there is no good 
way to work around the image drag bug, so it will have to be wallpapered.

In content/base/src/nsContentAreaDragDrop.cpp, disabling creating the native 
image draggable still crashed.

Disabling creating the native image draggable *and* disabling creating the file 
promise also still crashed.

So, I stopped those from even being made by just returning NS_OK when a drag 
gesture is detected over an image. Now it no longer crashes, but other drags 
like URLs and so on still work.

Nicely still, on all the images in M702250 copying them to the clipboard still 
worked, as well as saving them to disk, just not dragging them. I consider this 
a shippable compromise.

Original comment by classi...@floodgap.com on 18 Jul 2013 at 10:54

Changed state: Started

GoogleCodeExporter commented 9 years ago

New crash in jemalloc: the Applications tab in Preferences crashes the browser 
with a null pointer. I need to build a DEBUG for a better stack trace, but 
here's the first part of it in an opt build. malloc* in the backtrace is scary.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x000254e4 in receive_samples ()
(gdb) bt
#0  0x000254e4 in receive_samples ()
#1  0x90050bbc in malloc_zone_from_ptr ()
#2  0x90050bbc in malloc_zone_from_ptr ()
#3  0x937f4310 in -[NSBitmapImageRep _freeImage] ()
#4  0x937f3f50 in -[NSBitmapImageRep dealloc] ()
#5  0x937f3c40 in -[NSImage _freeRepresentation:] ()
#6  0x937f3a68 in -[NSImage dealloc] ()
#7  0x92bd2c68 in NSPopAutoreleasePool ()

Original comment by classi...@floodgap.com on 20 Jul 2013 at 1:38

GoogleCodeExporter commented 9 years ago

Seeing that it involves discarding images, maybe it has to do with the icons. 
We can dispense with those if we have to, but we can't disable the Applications 
tab functionality entirely.

Original comment by classi...@floodgap.com on 20 Jul 2013 at 1:39

GoogleCodeExporter commented 9 years ago

It's actually part of a general problem with icons -- if you pull up Downloads 
and start scrolling, it will crash as icons get freed. It's the same underlying 
bug as M702250 again, but it's just really bad and consistent on 10.4.

The current wallpaper is to make nsIconChannelCocoa.mm return NS_ERROR_FAILURE 
when queried for an icon; icons appear too frequently to force them to leak, or 
memory usage gets bad. This is okay for a beta, it's not okay for release. 
However, the way around that is to make a new icon service that hands prefab 
icons over from chrome (by just redirecting through ioService). That's pretty 
easy to write, just a little tedious, and we'd have to add a decent library of 
icons at 16x16 and 32x32 sizes.

Original comment by classi...@floodgap.com on 20 Jul 2013 at 3:53

GoogleCodeExporter commented 9 years ago

(by icons I mean icons from OS X -- icons internal to TenFourFox, such as 
favicons and chrome, work fine)

Original comment by classi...@floodgap.com on 20 Jul 2013 at 3:55

GoogleCodeExporter commented 9 years ago

New showstopper: uploads work inconsistently. I'm able to upload to ADN, but 
not to Google Code. Google Code works fine in 22.0.

I'm getting concerned there are all kinds of bugs in here like that.

Original comment by classi...@floodgap.com on 21 Jul 2013 at 3:12

GoogleCodeExporter commented 9 years ago

I can't reproduce this reliably anymore. I give up. Let's see what the punters 
think.

Original comment by classi...@floodgap.com on 21 Jul 2013 at 6:12

GoogleCodeExporter commented 9 years ago

Viewing isolated images is slow to close. I bet the background texture is the 
issue. Never liked it anyway, so we could jettison that.

Original comment by classi...@floodgap.com on 21 Jul 2013 at 8:58

Changed title: Enable jemalloc on 10.4, fix problems

GoogleCodeExporter commented 9 years ago

We've got a serious leak somewhere. It manifests in the base browser too after 
a couple days of uptime, but is nowhere near as severe, and the browser grinds 
to a halt in GC. The base non-jemalloc build does this slightly, but it still 
runs fine. This unfortunately *is* a show stopper.

Original comment by classi...@floodgap.com on 26 Jul 2013 at 4:05

GoogleCodeExporter commented 9 years ago

Testing 24 shows this no longer pays off, and the browser's memory use still 
skyrockets. Dropping priority.

Original comment by classi...@floodgap.com on 29 Oct 2013 at 4:57

Added labels: Priority-Low
Removed labels: Priority-High

internetzel commented 8 years ago

Just found out something quite important about the virtual memory management function madvise that is used in memory allocators like jemalloc to indicate the system whether memory can safely be paged out or not. Since 10.3 there has been the option MADV_FREE defined in the mman.h system header which supposedly indicates the kernel that a virtual memory region is not in use any longer and hence can be paged out immediately - but it wasn't implemented in the kernel until 10.6 (see http://fxr.watson.org/fxr/trackident?i=MADV_FREE in order to verify the use of MADV_FREE in different xnu versions). Until 10.5 this option does simply nothing but return an error plus the waste of time of performing a syscall - as a consequence the kernel will have to trust it's own statistics about what memory to page out first, and building up meaningful statistics takes time - and can definitley cause a long running browser instance to crawl (see https://bugzilla.mozilla.org/show_bug.cgi?id=736074). In order to help the kernel to quickly know what to page out first there's really only MADV_DONTNEED implemented up to including 10.5 . I found this because in WebKit 601 TCMalloc was replaced by bmalloc (needing a C++11 compiler and runtime) and the browser started to hang in madvise calls quite soon after launch - I don't know whether using MADV_DONTNEED instead of MADV_FREE is the solution but at least the syscall does succeed now (feels much better that way). Since the JavaScriptCore garbage collector now bases on bmalloc as well that thing has to work really well or whole WebKit becomes pretty much unusable.

classilla commented 8 years ago

Interesting. Something to consider for the future if I resurrect this again.

internetzel commented 8 years ago

While the unimplemented MADV_FREE is interesting I found that the cause for bmalloc spending much time in madvise() was in fact a non working custom mutex built around std::atomic_flag from libstdc++. I'm pretty sure that rather simple mutex class is properly implemented, so maybe std::atomic_flag is not working correctly (the mutex is needed because freeing memory respectively marking memory as MADV_DONTNEED, for example for JavaScript garbage collection, happens in a separate thread). By disabling bmalloc and using system malloc instead, the hangs are now gone. And as WebKit uses memalign() I implemented a wrapper around malloc() that returns aligned adresses (wasting space of course). Since WebKit uses a seperate function for freeing aligned memory there's no problem in storing the original adress returned by malloc() in the extra space allocated for being able to do the alignment, so it shouldn't be leak memory. But that non-working mutex seems alarming to me.

internetzel commented 8 years ago

Actually neither MADV_DONTNEED nor MADV_WILLNEED are implemented; they do the same thing as MADV_NORMAL which is the default. So calling madvise is in fact just just a waste of time on OS X version earlier than 10.6 .

classilla commented 8 years ago

For 45 performance is sufficiently satisfactory that any gains from this would be outweighed by the stuff I know would break, so wontfix.

classilla / tenfourfox

Enable jemalloc on 10.4, fix problems #218