blueboxd / chromium-legacy

Latest Chromium (≒Chrome Canary/Stable) for Mac OS X 10.7+
BSD 3-Clause "New" or "Revised" License
302 stars 17 forks

Crashing on launch on OS X 10.9.5 #2

Closed Wowfunhappy closed 3 years ago

Wowfunhappy commented 4 years ago

First of all: Thank you so much for maintaining this!

Unfortunately, I cannot seem to get this to launch on OS X 10.9.5. It opens, bounces in the Dock for a few seconds, then crashes due to EXC_BAD_ACCESS. A full crash report is attached.

Note that, as you may notice in the log, I am running inside of a VM. While my host machine is also running OS X 10.9.5, as a matter of course I always test out new software in a VM before moving it to my main machine. VMware Fusion is very good, and it has never caused problems in the past (outside of highly 3D-intensive software like games), but if you think this is the source of the crash, please let me know and I'll try on the real thing.

Thanks again!

Chromium_2020-09-16-085032_Jonathans-Mac.crash.zip

aeiouaeiouaeiouaeiouaeiouaeiou commented 4 years ago

Keeps crashing on 809500 build.

blueboxd commented 4 years ago

Unfortunately, these Chromium builds do not work on 10.9 for now. The crash seems to be caused by the Skia library and 10.9's system font framework, but the details of the cause are unknown. I'll keep investigating, but I'm not sure if it can be resolved. Sorry.

Wowfunhappy commented 3 years ago

I don't really know how any of this code works, so this could be completely wrong. But I was looking at the set of changes you made to Skia for Lion and Mountain Lion.

It looks like you added some code which will specifically only run on Lion and Mountain Lion machines. https://github.com/blueboxd/skia/blob/6fdecbea70f1be7800a70c06734bdf54d41fec4b/src/ports/SkScalerContext_mac_ct.cpp#L423

(N.B. The isLion() / isMountainLion() functions are defined up here: https://github.com/blueboxd/skia/blob/6fdecbea70f1be7800a70c06734bdf54d41fec4b/src/ports/SkScalerContext_mac_ct.cpp#L112)

According to comments, this was done to work around a bug in CTFontGetBoundingRectsForGlyphs, which on 10.7/8 returns a bad value for fonts "whose hhea::numberOfHMetrics is less than its maxp::numGlyphs."

If the bug is also present on Mavericks, perhaps this workaround just needs to be enabled there too! (Or, if this comes from old Google code, perhaps there's a similar Mavericks code path.) Notably, these seem to more-or-less align with the crash log, which is crashing on GetNumberOfGlyphs.

I would experiment myself if I could get Chromium to compile...

blueboxd commented 3 years ago

Thank you very much for helping with the analysis.

Yes, I ported the patch for Lion/Mountain Lion from the old Skia library, but the problem that patch resolves is not a crash. (Chromium still launches on 10.7/10.8 even without the patch...) And 10.9 doesn't have that bug.

I looked into the Skia sources of the latest version for 10.9 (m65), but the code path to the crashing function and its parameters seem almost the same. I'm also tracing a debug build of the latest Chromium on 10.9, but no hint has appeared so far. Skia seems to call CoreText (CTFontDescriptorCopyAttribute or so) as normal, probably the same way as in m65. So, is the problem in the caller of the Skia library function... or a wrong initialization?

I'll continue to investigate, so please wait for a while. Sorry for the inconvenience.

Compiling patched Chromium for unsupported OSes is a bit tricky (it needs extra compilation patches that are not committed to this repository), and compiling Chromium is super CPU-intensive (~40 min with a 14-core Xeon and two Ryzen 3950X machines), so I don't recommend trying...

Wowfunhappy commented 3 years ago

Oh don't apologize, I'm just delighted you're maintaining this! Sorry that investigation wasn't helpful.

Wowfunhappy commented 3 years ago

Ha! It works!

Screen Shot 2021-01-24 at 10 41 16 PM

Since I couldn't make Chromium compile, I decided to screw around with some DYLD_INTERPOSE magic:

#import <Cocoa/Cocoa.h>

#define DYLD_INTERPOSE(_replacement,_replacee) \
__attribute__((used)) static struct{ const void* replacement; const void* replacee; } _interpose_##_replacee \
__attribute__ ((section ("__DATA,__interpose"))) = { (const void*)(unsigned long)&_replacement, (const void*)(unsigned long)&_replacee };

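void myCTFontDrawGlyphs (CTFontRef font, const CGGlyph glyphs[], const CGPoint positions[], size_t count, CGContextRef context) {
    // Swallow the call when the second glyph is 0; forward everything else.
    // Note: glyphs[1] is read without checking that count is at least 2.
    if (glyphs[1] != 0) {
        // Inside this library the name still refers to the original function.
        CTFontDrawGlyphs(font, glyphs, positions, count, context);
    }
}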

DYLD_INTERPOSE(myCTFontDrawGlyphs, CTFontDrawGlyphs);

If you compile this into a library called ChromiumFixer.dylib, and launch Chrome via:

DYLD_INSERT_LIBRARIES=/path/to/ChromiumFixer.dylib /path/to/Chromium.app/Contents/MacOS/Chromium

It will start up on Mavericks... although trying to actually load a site will cause the tab to crash after a few seconds. To fix that, I had to disable remote fonts:

DYLD_INSERT_LIBRARIES=/path/to/ChromiumFixer.dylib /path/to/Chromium.app/Contents/MacOS/Chromium --disable-remote-fonts

And this time you'll be able to actually browse the web, albeit without web fonts, which kind of sucks.
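If typing that command every time gets old, the same launch can be wrapped in a tiny C launcher. This is only a rough sketch: the two paths are the placeholders from the commands above, `build_chromium_argv` and `launch_chromium` are hypothetical names, and it assumes the environment variable is honoured for the Chromium binary, as it is in the commands above.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Placeholder paths copied from the commands above; adjust before use. */
#define FIXER_DYLIB  "/path/to/ChromiumFixer.dylib"
#define CHROMIUM_BIN "/path/to/Chromium.app/Contents/MacOS/Chromium"

/* Fills argv_out with: binary, --disable-remote-fonts, extra args, NULL.
 * Returns the number of entries before the NULL terminator,
 * or -1 if argv_out is too small. */
int build_chromium_argv(char *argv_out[], size_t capacity,
                        int extra_argc, char *extra_argv[])
{
    assert(argv_out != NULL);
    size_t extras = extra_argc > 0 ? (size_t)extra_argc : 0;
    if (capacity < 2 + extras + 1) {
        return -1;
    }
    size_t n = 0;
    argv_out[n++] = (char *)CHROMIUM_BIN;
    argv_out[n++] = (char *)"--disable-remote-fonts";
    for (size_t i = 0; i < extras; i++) {
        argv_out[n++] = extra_argv[i];
    }
    argv_out[n] = NULL;
    return (int)n;
}

/* Sets DYLD_INSERT_LIBRARIES and replaces this process with Chromium.
 * Only returns (with -1) if something went wrong. */
int launch_chromium(int extra_argc, char *extra_argv[])
{
    char *argv[64];
    if (build_chromium_argv(argv, 64, extra_argc, extra_argv) < 0) {
        return -1;
    }
    if (setenv("DYLD_INSERT_LIBRARIES", FIXER_DYLIB, 1) != 0) {
        return -1;
    }
    execv(CHROMIUM_BIN, argv);
    return -1;  /* execv only returns on error */
}
```

A `main` that simply returns `launch_chromium(argc - 1, argv + 1)` would pass any additional flags or URLs straight through.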

Anyway—all my code does is prevent CTFontDrawGlyphs from being called if glyphs[1] equals zero. I don't know why this works, I just noticed it was the circumstance under which Chromium crashed, and that the function otherwise worked fine.
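One caveat with the check as written: it reads glyphs[1] even when count is below 2. A minimal sketch of a guarded variant follows, assuming that forwarding such short calls unchanged is acceptable (nothing in this thread confirms that). `should_forward_draw` is a hypothetical helper name, and the part under `#ifdef __APPLE__` just repeats the interposer from the comment in plain C; it is not something that was tested against Chromium.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decide whether a draw call should be forwarded.
 * Applies the same glyphs[1] != 0 test, but only reads glyphs[1] when
 * the caller actually passed at least two glyphs. */
bool should_forward_draw(const uint16_t *glyphs, size_t count)
{
    if (glyphs == NULL || count < 2) {
        return true;  /* assumption: nothing to inspect, leave the call alone */
    }
    return glyphs[1] != 0;
}

#ifdef __APPLE__
#include <ApplicationServices/ApplicationServices.h>

static_assert(sizeof(CGGlyph) == sizeof(uint16_t),
              "CGGlyph is expected to be a 16-bit index");

#define DYLD_INTERPOSE(_replacement,_replacee) \
__attribute__((used)) static struct{ const void* replacement; const void* replacee; } _interpose_##_replacee \
__attribute__ ((section ("__DATA,__interpose"))) = { (const void*)(unsigned long)&_replacement, (const void*)(unsigned long)&_replacee };

void myCTFontDrawGlyphs(CTFontRef font, const CGGlyph glyphs[],
                        const CGPoint positions[], size_t count,
                        CGContextRef context)
{
    if (should_forward_draw(glyphs, count)) {
        /* Within the interposing library this reaches the original. */
        CTFontDrawGlyphs(font, glyphs, positions, count, context);
    }
}

DYLD_INTERPOSE(myCTFontDrawGlyphs, CTFontDrawGlyphs)
#endif
```

On a Mac this should build as a dynamic library with something like `clang -dynamiclib fixer.c -framework ApplicationServices -o ChromiumFixer.dylib` (the source file name is made up).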

Coincidentally, this only ever happens the first two times that function is called. So, the below version of the interposed function also works:

int i = 0;
void myCTFontDrawGlyphs (CTFontRef font, const CGGlyph glyphs[], const CGPoint positions[], size_t count, CGContextRef context) {
    i++;
    if (i > 2) {
        CTFontDrawGlyphs(font, glyphs, positions, count, context);
    }
}

As far as I can tell, remote fonts never call CTFontDrawGlyphs (and I couldn't find a different function that was easy to intercept), which is why I had to disable them.

Anyway, I hope I actually discovered something useful this time! I think this is as far as I can reasonably go without modifying Chromium's code, but hopefully you can look into what's actually calling CTFontDrawGlyphs and fix the issue a tad further up the chain.

blueboxd commented 3 years ago

Thank you so much for your testing! And yes, that code skips the CTFontDrawGlyphs calls made in SkCTFontGetSmoothBehavior. Chromium then starts, but on most pages the renderer process crashes in other font-related API calls. I need to investigate further to resolve this.

Wowfunhappy commented 3 years ago

Hey, uh, so, I got it to work fully on Mavericks, remote fonts and all! Just don't get too excited until you see the code. 😉

#include <Foundation/Foundation.h>

#define DYLD_INTERPOSE(_replacement,_replacee) \
__attribute__((used)) static struct{ const void* replacement; const void* replacee; } _interpose_##_replacee \
__attribute__ ((section ("__DATA,__interpose"))) = { (const void*)(unsigned long)&_replacement, (const void*)(unsigned long)&_replacee };

void myCFRelease (CFTypeRef cf) {}
DYLD_INTERPOSE(myCFRelease, CFRelease);

Obviously, this is going to create a memory leak! But it indicates that the problem is some sort of use-after-free error. It might even be happening in Lion too, unbeknownst to us, if Apple did something in Mavericks to increase memory safety.
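To make the reasoning concrete, here is a toy model of why turning every release into a no-op can hide this kind of bug. It is not CoreFoundation and it does not claim to be what actually happens inside Chromium; the `Toy*` names and the "one release too many" scenario are assumptions picked purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reference-counted object; NOT CoreFoundation. */
typedef struct {
    int  refcount;
    bool freed;
} ToyObject;

void toy_init(ToyObject *o)
{
    o->refcount = 1;
    o->freed = false;
}

void toy_retain(ToyObject *o)
{
    assert(!o->freed);
    o->refcount++;
}

void toy_release(ToyObject *o)
{
    assert(!o->freed);
    if (--o->refcount == 0) {
        o->freed = true;  /* a real runtime would free the memory here */
    }
}

/* Two owners share one object, but one release too many is issued.
 * Returns true if the object is gone while the second owner still
 * holds a pointer to it, i.e. a use-after-free waiting to happen. */
bool simulate_over_release(bool suppress_releases)
{
    ToyObject font;
    toy_init(&font);          /* owner A creates it: refcount 1 */
    toy_retain(&font);        /* owner B retains it: refcount 2 */
    if (!suppress_releases) {
        toy_release(&font);   /* owner A is done: refcount 1 */
        toy_release(&font);   /* stray extra release: refcount 0 */
    }
    return font.freed;        /* owner B has not released yet */
}
```

With releases suppressed the object is never destroyed, so the dangling pointer never appears, at the cost of leaking the object.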

Maybe open that debug build you have in Xcode Instruments and look for Zombies?

(Also, I have to say... memory usage in practice seems to be totally normal, even stress testing with memory-hungry web apps like Slack. I can't say I understand how that's possible...)

blueboxd commented 3 years ago

Wow! again! Thank you very much for the great hint!

OK, this means fonts were released before use, but why only on 10.9...? I'll trace around CFRetain/CFRelease in the Skia library. (I also uploaded a debug version of the latest build in case you want to trace with debug symbols.)

Memory consumption seems not so bad for light use, but rendering is very flickering on my environment...Is this build useful on 10.9?

Wowfunhappy commented 3 years ago

You're very welcome. I spent all weekend interposing random crap until I finally found that, this issue was just bothering me so much!

rendering is very flickering on my environment...Is this build useful on 10.9?

I'm guessing you're in a VM, right? Launch Chromium with --disable-gpu-compositing (or disable hardware acceleration in Chrome's Settings) and the flickering will go away! Mavericks appears to be new enough that Chromium tries to turn on hardware acceleration—check out chrome://gpu!

On a real 10.9 Mac, Chromium definitely seems to be more sluggish than Firefox 78 on the same machine, but yes, it's totally usable. (And I'm expecting it to become the only way to safely view a lot of websites when Mozilla fully drops 10.9 support later this year.)

blueboxd commented 3 years ago

Your weekend is definitely invaluable...thanks a lot!

Mavericks appears to be new enough that Chromium turns on hardware acceleration

Ah, gotcha! Sorry, I had been too accustomed to the old-fashioned environment...

yes, it's totally usable.

I got it, thank you. I'll try to make Chromium usable on 10.9 anyway!

Wowfunhappy commented 3 years ago

Just one more possible hint, this time courtesy of Xcode Instruments. As noted, the two traces come from two different builds. I happened to have the older one on my hard drive, and I thought it was possibly notable that the Zombie was from a different method. Xcode Instruments did not display this when I tried to trace the debug build, for whatever reason.

Chromium Build 848870: From 848870

Chromium Build 846582: From 846582

Wowfunhappy commented 3 years ago

Hey there! A quick update, after a week of using Chromium Legacy on Mavericks as my primary browser.

  1. A small number of websites were still consistently causing Chromium to crash—https://eclecticlight.co is one example. Because CTFontCreatePathForGlyph was in the backtrace, and because Apple's Docs said NULL was a valid return value for CTFontCreatePathForGlyph, I just interposed the function to always return NULL. Boom, crashes gone, with no noticeable side effects. 😁

  2. A bunch of garbage was appearing when resizing the window in Mavericks. Calling [self setWantsLayer:true] in NSView made it go away.

  3. I modified my CFRelease hack to be significantly less terrible. Now, whenever CTFontManagerCreateFontDescriptorFromData is called, a counter is set that blocks the next 150 CFRelease calls. Once the counter reaches zero, calls to CFRelease are allowed through again.

This is obviously still a messy hack, but it made a variety of bizarre glitches disappear, and it made me feel less dirty 🙂. Pages do still crash on rare occasions (presumably when the fatal call falls outside my 150-call window), but reloading a couple of times always seems to clear it up, and the problem is infrequent enough that I haven't found it annoying.

  4. I recommend hardcoding the --disable-gpu-compositing flag. Although GPU acceleration technically works on Mavericks, visual glitches keep popping up, and it actually seems to make browsing slightly slower. A nice thing about this flag is that it leaves acceleration on for e.g. WebGL.

Here is the code I'm currently injecting, which contains the first three fixes described above. I hope it's helpful, and thanks again—I'm really enjoying this browser!

https://gist.github.com/Wowfunhappy/0d5286e0d87f0bf79ceff0930dba5661
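For anyone who wants the gist of fix 3 without opening the link, the counter idea can be sketched roughly like this. This is not the code from the gist: `arm_release_window` and `release_allowed` are hypothetical names, the atomic counter is an added assumption (CFRelease can be called from several threads), and arming the window after the original call returns is a guess about ordering. The macOS-only glue sits under `#ifdef __APPLE__`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SUPPRESSED_RELEASES 150  /* window size quoted in fix 3 above */
static_assert(SUPPRESSED_RELEASES > 0, "window must be positive");

static atomic_int releases_to_skip;  /* zero-initialised: nothing is skipped */

/* Call right after a font descriptor was created from in-memory data. */
void arm_release_window(void)
{
    atomic_store(&releases_to_skip, SUPPRESSED_RELEASES);
}

/* Returns true when a release may go through, false when it should be
 * swallowed. Each swallowed call uses up one slot of the window. */
bool release_allowed(void)
{
    int remaining = atomic_load(&releases_to_skip);
    while (remaining > 0) {
        /* On failure, remaining is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&releases_to_skip,
                                         &remaining, remaining - 1)) {
            return false;
        }
    }
    return true;
}

#ifdef __APPLE__
#include <ApplicationServices/ApplicationServices.h>

#define DYLD_INTERPOSE(_replacement,_replacee) \
__attribute__((used)) static struct{ const void* replacement; const void* replacee; } _interpose_##_replacee \
__attribute__ ((section ("__DATA,__interpose"))) = { (const void*)(unsigned long)&_replacement, (const void*)(unsigned long)&_replacee };

CTFontDescriptorRef myCreateFontDescriptorFromData(CFDataRef data)
{
    CTFontDescriptorRef descriptor = CTFontManagerCreateFontDescriptorFromData(data);
    arm_release_window();
    return descriptor;
}

void myCFRelease(CFTypeRef cf)
{
    if (release_allowed()) {
        CFRelease(cf);  /* reaches the original inside this library */
    }
}

DYLD_INTERPOSE(myCreateFontDescriptorFromData, CTFontManagerCreateFontDescriptorFromData)
DYLD_INTERPOSE(myCFRelease, CFRelease)
#endif
```

Every release swallowed inside the window is still a leak, so this only narrows the original hack rather than fixing the underlying bug.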

blueboxd commented 3 years ago

Thank you again and again...! And sorry for the slow progress of my investigation.

First, sorry, I forgot to upload the dSYMs for the debug build. The current release has dSYMs for displaying symbols and source locations (in a separate archive because of their enormous size!). You may need a recent lldb to trace with them (confirmed with the lldb included in LLVM 7.0, not the one included with Xcode 6). Additionally, if you want to trace with source, map /src/ to /path/to/cloned/src/ with the lldb command settings set -- target.source-map /src/ /path/to/cloned/src/.

For now, CFRelease seems to be problematic after creating a font from in-memory data. So the crash happens with remote fonts, not with local fonts. CFRetain-ing the font before using it does not resolve the issue... I'm now tracing around SkUniqueCFRef. (SkCTFontGetSmoothBehavior and SkCTFontGetDataFontWeightMapping are the very first functions that create fonts that way.)

A small number of websites were still consistently causing Chromium to crash—https://eclecticlight.co is one example.

Not crashing on my environment (VM and MacBookPro10,1), are there any other examples? (or need particular operation?)

Please give me a while to investigate and to implement your suggestions.

Wowfunhappy commented 3 years ago

Not crashing on my environment (VM and MacBookPro10,1), are there any other examples?

That's interesting. Another notable one for me was chase.com, albeit only at desktop widths. There were others too but I wasn't keeping track, perhaps I'll need to de-activate that fix and collect more...

blueboxd commented 3 years ago

Another notable one for me was chase.com

Thank you, confirmed. I'll investigate this issue too.

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x000000018313df4a

Thread 0 Crashed:: CrRendererMain  Dispatch queue: com.apple.main-thread
0   com.apple.CoreText              0x00007fff8de33a52 TFont::FindColourBitmapForGlyph(double, unsigned short, __CFData const*, unsigned long&, unsigned long&, unsigned char&, double&) const + 314
1   com.apple.CoreText              0x00007fff8de33761 TFont::CreatePathForGlyph(unsigned short, CGAffineTransform const*) const + 105
2   com.apple.CoreText              0x00007fff8de2f59d CTFontCreatePathForGlyph + 107
3   org.chromium.Chromium.framework 0x00000001128da853 SkScalerContext_Mac::generateMetrics(SkGlyph*) + 691
4   org.chromium.Chromium.framework 0x000000010eb959b3 SkScalerContext::internalMakeGlyph(SkPackedGlyphID, SkMask::Format) + 99

Wowfunhappy commented 3 years ago

I found something very suspicious!

From Skia's changelog for milestone 82:

Remove CGFontRef parameter from SkCreateTypefaceFromCTFont. Use CTFontManagerCreateFontDescriptorFromData instead of CGFontCreateWithDataProvider to create CTFonts to avoid memory use issues.

We are, in fact, having memory use issues when Skia creates a typeface!

I believe this is the specific commit being referenced: https://github.com/google/skia/commit/c9b06ca70890ba23ab3f54fb69e4db7fbac79393

Since the Blink side CTFontRefs are no longer created from CGFontRefs after [1], we do not need to pass CGFontRefs to Skia any longer for keeping them alive.

I may not know exactly what this means, but I'm feeling pretty good about this being the real source of the problem!

Edit: Some history behind the change: https://bugs.chromium.org/p/skia/issues/detail?id=4043.

blueboxd commented 3 years ago

Wow! That's almost an answer! Old builds that I hadn't tested on 10.9 turned out to work! I'll diff the working Skia (before #719594) against the non-working one (after #720194).

blueboxd commented 3 years ago

Gotcha! This should be the culprit 👿 (but why does only 10.9 crash in this way...?)

I'll forward-port this diff to the current Skia. Please wait for interrogation 🧐

blueboxd commented 3 years ago

Hallelujah! The latest build worked without crashing!! So it seems CTFontManagerCreateFontDescriptorFromData cannot be used on 10.9...? CTFontCreatePathForGlyph is still crashing on VM, I'll keep investigating.

Also, --disable-gpu-compositing is now hardcoded.

Wowfunhappy commented 3 years ago

Oh my god, at last! Thank you so much!

CTFontCreatePathForGlyph is still crashing on VM, I'll keep investigating.

I was looking into this a bit yesterday, take a look at this: https://github.com/blueboxd/skia/blob/c64e8a9f0406f35b642dd8ab8f395506e60c6c51/src/ports/SkScalerContext_mac_ct.cpp#L452

Theory time: What if the pages that crash are also the ones which contain this "zero-advance space" glyph? When Chromium sees that glyph, it runs this extra code block, to work around a bug which for all we know isn't even present on old versions of macOS. That would explain why interposing CTFontCreatePathForGlyph to return NULL makes the problem go away—the function is programmed to exit early if (!path).
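The early-exit part of that theory can be sketched in a few lines. The portable half uses made-up stand-ins (`ToyPath`, `PathFactory`, `refine_bounds_from_path`) to model a caller that gives up when it gets no path; it is not Skia's code. The half under `#ifdef __APPLE__` is the kind of interposer described earlier in this thread, written in plain C on the assumption that returning NULL is always acceptable to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins so the control flow can be exercised anywhere. */
typedef struct ToyPath { int placeholder; } ToyPath;
typedef const ToyPath *(*PathFactory)(unsigned short glyph);

/* Models a caller that treats a NULL path as "nothing to measure" and
 * exits early. Returns true only if the extra path work would run. */
bool refine_bounds_from_path(PathFactory make_path, unsigned short glyph)
{
    assert(make_path != NULL);
    const ToyPath *path = make_path(glyph);
    if (!path) {
        return false;  /* early exit: the extra work is skipped */
    }
    /* ... a real caller would inspect the path here ... */
    return true;
}

/* Factory that always reports "no path", like the interposed function. */
const ToyPath *always_null_path(unsigned short glyph)
{
    (void)glyph;
    return NULL;
}

/* Factory that always hands back some path, for comparison. */
const ToyPath *fixed_path(unsigned short glyph)
{
    static const ToyPath some_path = { 0 };
    (void)glyph;
    return &some_path;
}

#ifdef __APPLE__
#include <ApplicationServices/ApplicationServices.h>

#define DYLD_INTERPOSE(_replacement,_replacee) \
__attribute__((used)) static struct{ const void* replacement; const void* replacee; } _interpose_##_replacee \
__attribute__ ((section ("__DATA,__interpose"))) = { (const void*)(unsigned long)&_replacement, (const void*)(unsigned long)&_replacee };

CGPathRef myCTFontCreatePathForGlyph(CTFontRef font, CGGlyph glyph,
                                     const CGAffineTransform *matrix)
{
    (void)font;
    (void)glyph;
    (void)matrix;
    return NULL;  /* never build a path, so callers take the early exit */
}

DYLD_INTERPOSE(myCTFontCreatePathForGlyph, CTFontCreatePathForGlyph)
#endif
```

If the theory holds, the crashing pages never reach the code after the early exit once the interposer is loaded, which would match the crashes going away.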

blueboxd commented 3 years ago

No, no, everything is thanks to you!

I was looking into this a bit yesterday, take a look at this:

Thank you for the pointer. The latest build experimentally avoids calling CTFontCreatePathForGlyph from SkScalerContext_Mac::generateMetrics on 10.9. chase.com no longer seems to crash in my environment. If it still crashes for you, please open a new issue.

Wowfunhappy commented 3 years ago

Sure thing, I'll also go ahead and open a separate issue for the window resize garbage!