dbsoft / UXP

Unified XUL Platform for White Star
Other
0 stars 1 forks source link

Fix instability of White Star when built with optimizations using newer versions of clang #1

Closed dbsoft closed 2 years ago

dbsoft commented 2 years ago

I have been looking into this issue on the path to supporting updated builds for Apple Silicon and just a more modern build system for supporting newer versions of MacOS more completely.

The crash is happening in some RootingAPI.h code that was removed in the following bugzilla issue and commit:

https://bugzilla.mozilla.org/show_bug.cgi?id=1325050 https://hg.mozilla.org/mozilla-central/rev/d2758f635f72f779f712bf9c6e838868ed53c9f7

This is just a massive commit and hits large portions of the code base, but I think it is important to the long term viability of the browser and builds. I will be committing this change in the newclang branch that I have already been working on.

https://github.com/dbsoft/UXP/tree/newclang

If this fixes the issue, I will look at merging this into the master branch.

dbsoft commented 2 years ago

Using this comment as a list of ways my patch diverges from the Mozilla one:

js/src/builtin/DataViewObject.cpp: File does not exist in White Star/Pale Moon, patched js/src/vm/TypedArrayObject.cpp Since the following patch was not integrated into Pale Moon. https://hg.mozilla.org/integration/mozilla-inbound/rev/d2ce4dbbc26d Move DataViewObject to its own file.

js/src/gc/Allocator.cpp: Removed an unexpected "AutoKeepAtoms keepAtoms(cx->perThreadData);" Will need to verify this was not needed by some other change.

js/src/gc/AtomMarking.cpp: File does not exist in White Star/Pale Moon. Since the following patch was not integrated into Pale Moon. https://hg.mozilla.org/mozilla-central/rev/7311c06a7271 https://bugzilla.mozilla.org/show_bug.cgi?id=1324002

js/src/gc/GCInternals.h: AutoStopVerifyingBarriers class not in our version. Haven't been able to find where this code was introduced. Just leaving it out for now.

js/src/gc/GCRuntime.h: JS_GC_ZEAL changes haven't been merged, so minor differences.

js/src/gc/Nursery.cpp: Minor changes due ot missing JS_GC_ZEAL changes. printProfileHeader() differences.

js/src/gc/Statistics.cpp: printProfileHeader() differences.

js/src/gc/Verifier.cpp: None of the code requiring changes is present, not sure where it was added. Probably the zeal mode that we don't have or keepAtoms.

js/src/jit/BaselineCacheIR.[cpp|h]: Was renamed BaselineCacheIRCompiler.[cpp|h] https://hg.mozilla.org/mozilla-central/rev/308412053019c9379eb899378ef575e903d64cbe Most required code changes missing.

js/src/jit/CacheIR.cpp: Code to change is missing.

js/src/jit/SharedIC.[cpp|h]: BaselineEmitPostWriteBarrierSlot missing.

js/src/jsapi.cpp: JS::GetModuleResolveHook and JS::SetModuleResolveHook different.

js/src/jsgc.cpp: Missing zeal and atom marking.

js/src/jsgc.h: GCParallelTask differences.

js/src/jscript.cpp: Most of the changes not related to ExclusiveContext removal, already present.

js/src/jswatchpoint.cpp: Missing?

js/src/vm/Debugger.cpp: Code to "Iterate through all wasm instances to find ones that need to be updated." missing. JS_DefineDebuggerObject() code to change missing.

js/src/vm/GeckoProfiler.cpp: Is SPSProfiler.cpp in White Star/Pale Moon.

js/src/vm/HelperThreads.[cpp|h]: EnqueuePendingParseTasksAfterGC() code different. Also some code de-duplicated in the mozilla version. ScriptDecodeTask missing.

js/src/vm/Stack.cpp: savedPrevJitTop_ code different.

js/src/vm/Xdr.[cpp|h]: Most code changes already present or code to change is missing.

Needed to make extreme changes to the ModuleResolveHook code.

dbsoft commented 2 years ago

Will need to look into warnings in dom/ips/StructuredCloneData.h in part 2 or a later patch.

I tried to implement fixes, but it caused white star to fail to link.

5:05.96 In file included from /Users/x/objdir-whitestar-current/dist/include/ipc/IPCMessageUtils.h:15: 5:05.96 Warning: -Wdefaulted-function-deleted in /Users/nuke/objdir-whitestar-current/dist/include/mozilla/dom/ipc/StructuredCloneData.h: explicitly defaulted move assignment operator is implicitly deleted 5:05.96 /Users/x/objdir-whitestar-current/dist/include/mozilla/dom/ipc/StructuredCloneData.h:83:3: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted] 5:05.96 operator=(StructuredCloneData&& aOther) = default; 5:05.96 ^ 5:05.96 /Users/x/objdir-whitestar-current/dist/include/mozilla/dom/ipc/StructuredCloneData.h:61:29: note: move assignment operator of 'StructuredCloneData' is implicitly deleted because base class 'mozilla::dom::StructuredCloneHolder' has a deleted move assignment operator 5:05.96 class StructuredCloneData : public StructuredCloneHolder 5:05.96 ^ 5:05.96 /Users/x/objdir-whitestar-current/dist/include/mozilla/dom/StructuredCloneHolder.h:159:3: note: copy assignment operator is implicitly deleted because 'StructuredCloneHolder' has a user-declared move constructor 5:05.96 StructuredCloneHolder(StructuredCloneHolder&& aOther) = default; 5:05.96 ^ 5:06.72 2 warnings generated.

dbsoft commented 2 years ago

I've now switched to trying to figure out which optimization is causing the issue by enabling the optimizations directly.

According to https://developer.amd.com/wordpress/media/2017/04/AOCC-1.1-Clang%20-%20the%20C%20C++%20Compiler.pdf

The following optimizations are added on at the -O2 level. -vectorize-loops -vectorize-slp -itodcalls -itodcallsbyclone -inline -mldst-motion -gvn -elim-avail-extern -slpinstcombine -globaldce -constmerge -loop-sink The following optimizations are dropped at the -O2 level. -always-inline

https://gist.github.com/lolo32/fd8ce29b218ac2d93a9e

Shows that -fvectorize and -fslp-vectorize options get added at -O2 level where the crash starts occuring.

So I attempted a build with this in the .mozconfig:

ac_add_options --enable-optimize="-O1 -fvectorize -fslp-vectorize"

Which should build at -O1 level with the vectorization options from -O2 enabled. This was stable.

So I am assuming the crash is being caused by one of the remaining optimizations:

-itodcalls -itodcallsbyclone -inline -mldst-motion -gvn -elim-avail-extern -slpinstcombine -globaldce -constmerge -loop-sink

However the rest do not have options to enable individually when calling clang. They only have options for calling LLVM's opt.

https://llvm.org/docs/CommandGuide/opt.html https://llvm.org/docs/Passes.html

According to the earlier referenced PDF from the AMD site:

-mllvm Need to provide -mllvm so that the option can pass through the compiler front end and get applied on the optimizer where this optimization is implemented. For example: -mllvm -enable-strided-vectorization

So I attempted the following to enable additional optimizations from the list:

ac_add_options --enable-optimize="-O1 -fvectorize -fslp-vectorize -mllvm -gvn"

But I get the following error:

0:22.16 DEBUG: configure: error: These compiler flags for C are invalid: -O1 -fvectorize -fslp-vectorize -mllvm -gvn

Is the -mllvm option invalid for Apple's clang? If so how do I enable the optimizations individually using Apple's clang?

dbsoft commented 2 years ago

I've verified this is absolutely a clang/llvm issue, and not a Mac specific issue. I've just done a clang build on Ubuntu Linux and get the same exact crash.

Will try to use clang on Linux to determine the optimization that is causing the problem, and put a fix in for all platforms. Either by disabling the problematic optimization for clang or work around it in the source.

dbsoft commented 2 years ago

Fix has been discovered for the instability.

https://repo.palemoon.org/MoonchildProductions/UXP/issues/1891

dbsoft commented 2 years ago

This fix has been merged into the newly unified Pale Moon UXP.