HaxeFoundation / hxcpp

Runtime files for c++ backend for haxe

gc fails in emscripten webassembly #987

Open sibest opened 2 years ago

sibest commented 2 years ago

Hi @hughsando

The GC fails when compiled with -O1 or -O2 on the emscripten platform with WebAssembly. It works reliably at -O0.

Any hints on clang flags I could add to disable whichever optimization is causing it to fail?

Thank you very much

sibest commented 2 years ago

The option that causes the GC to fail is -mem2reg on the llvm opt backend.

hughsando commented 2 years ago

Yes, I have seen the code store stack variables in named JS variables, like an infinite number of registers. Hxcpp will not be able to "see" these. I seem to recall there was some way to flush these to the stack when making an external function call. Can you disable the mem2reg flag only? Personally, I moved away from haxe->hxcpp->js and prefer haxe->js for its simplicity and compile time. There were some issues when mixing with native c++, but I managed to work around this in nme.

sibest commented 2 years ago

Actually, I'm using the hxcpp->emscripten->wasm pipeline. I had to disable some features in the GC that were designed for asm.js and not WebAssembly (for example HXCPP_STACK_UP). I've tried haxe->js, but it's at least 70% slower than the WebAssembly build.

Yes, I've managed to disable mem2reg by changing the backend of emscripten (in emcc.py); I also had to disable the -sroa optimization. Other than that, it works very well.

sibest commented 2 years ago

I managed to find all the optimization passes which can cause the GC to fail:
-mem2reg (instant fail)
-sroa (instant fail)
-gvn (instant fail)
-dse (delayed fail)

A safe list of optimizations arguments is: '-O0','-targetlibinfo','-tti','-tbaa','-scoped-noalias-aa','-assumption-cache-tracker','-profile-summary-info','-forceattrs','-inferattrs','-domtree','-callsite-splitting','-lower-expect','-ipsccp','-called-value-propagation','-globalopt','-domtree','-deadargelim','-domtree','-basic-aa','-aa','-loops','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instcombine','-simplifycfg','-basiccg','-globals-aa','-prune-eh','-inline','-openmpopt','-lower-expect','-function-attrs','-argpromotion','-domtree','-basic-aa','-aa','-simplifycfg','-verify','-memoryssa','-early-cse-memssa','-speculative-execution','-aa','-lazy-value-info','-jump-threading','-correlated-propagation','-simplifycfg','-domtree','-aggressive-instcombine','-basic-aa','-aa','-loops','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instcombine','-libcalls-shrinkwrap','-loops','-postdomtree','-branch-prob','-block-freq','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-pgo-memop-opt','-basic-aa','-aa','-loops','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-tailcallelim','-simplifycfg','-reassociate','-domtree','-loops','-loop-simplify','-lcssa-verification','-lcssa','-basic-aa','-aa','-scalar-evolution','-loop-rotate','-memoryssa','-licm','-loop-unswitch','-simplifycfg','-domtree','-basic-aa','-aa','-loops','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instcombine','-loop-simplify','-lcssa-verification','-lcssa','-scalar-evolution','-indvars','-loop-idiom','-loop-deletion','-loop-unroll','-mldst-motion','-phi-values','-aa','-memdep','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-phi-values','-basic-aa','-aa','-memdep','-memcpyopt','-sccp','-demanded-bits','-bdce','-aa','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instcombine','-lazy-value-info','-jump-threading','-correlated-propagation','-basic-aa','-aa','-phi-values','-memdep','-aa','-memoryssa','-loops','-loop-simplify','-lcssa-verification','-lcssa','-scalar-evolution','-licm','-postdomtree','-adce','-simplifycfg','-domtree','-basic-aa','-aa','-loops','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instcombine','-barrier','-elim-avail-extern','-basiccg','-rpo-function-attrs','-globalopt','-globaldce','-basiccg','-globals-aa','-domtree','-float2int','-lower-constant-intrinsics','-domtree','-loops','-loop-simplify','-lcssa-verification','-lcssa','-basic-aa','-aa','-scalar-evolution','-loop-rotate','-loop-accesses','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-loop-distribute','-postdomtree','-branch-prob','-block-freq','-scalar-evolution','-basic-aa','-aa','-loop-accesses','-demanded-bits','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-inject-tli-mappings','-loop-vectorize','-loop-simplify','-scalar-evolution','-aa','-loop-accesses','-lazy-branch-prob','-lazy-block-freq','-loop-load-elim','-basic-aa','-aa','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instcombine','-simplifycfg','-domtree','-loops','-scalar-evolution','-basic-aa','-aa','-demanded-bits','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-inject-tli-mappings','-slp-vectorizer','-vector-combine','-opt-remark-emitter','-instcombine','-loop-simplify','-lcssa-verification','-lcssa','-scalar-evolution','-loop-unroll','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instcombine','-memoryssa','-loop-simplify','-lcssa-verification','-lcssa','-scalar-evolution','-licm','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-transform-warning','-alignment-from-assumptions','-strip-dead-prototypes','-globaldce','-constmerge','-cg-profile','-domtree','-loops','-postdomtree','-branch-prob','-block-freq','-loop-simplify','-lcssa-verification','-lcssa','-basic-aa','-aa','-scalar-evolution','-block-freq','-loop-sink','-lazy-branch-prob','-lazy-block-freq','-opt-remark-emitter','-instsimplify','-div-rem-pairs','-simplifycfg','-verify'

sibest commented 2 years ago

-licm also causes the GC to fail
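
The five passes named so far (-mem2reg, -sroa, -gvn, -dse, -licm) could be stripped from any pass list mechanically rather than by hand, which also helps because the list posted earlier still contains -licm entries. What follows is only a sketch under assumptions: `filter_passes` is a hypothetical helper, not part of emscripten or hxcpp, and it expects unquoted pass names given one per argument.

```shell
# Passes reported in this thread as breaking the hxcpp GC under WebAssembly.
unsafe='mem2reg sroa gvn dse licm'

# Hypothetical helper: prints every argument that is not one of the unsafe passes.
filter_passes() {
  for pass in "$@"; do
    name=${pass#-}                      # strip the leading dash
    case " $unsafe " in
      *" $name "*) ;;                   # reported unsafe: drop it
      *) printf '%s\n' "$pass" ;;       # anything else: keep it
    esac
  done
}

# Usage: prints -instcombine and -simplifycfg, one per line.
filter_passes -instcombine -mem2reg -simplifycfg -licm
```

How the filtered list is then fed back into the emscripten backend depends on the emscripten version, so that step is left out here.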

jgranick commented 1 year ago

Hi everyone,

I am visiting WebAssembly support now in Lime.

I too was running into issues with the HXCPP garbage collection cycle causing either a freeze or a printed error in WebAssembly. I am unsure if the above guidance is still valid today, though I've left things at -O0 for now and things seem relatively stable (though I get an error in Bunnymark regarding a stack overflow, I think when the GC process is run).

So far, the tooling for compiling to WebAssembly, the process of debugging and the runtime performance are much better than I expected. Working with asm.js was filled with promise, though in practice it was good for zlib and poor for a full application compared to plain JS. WebAssembly might actually be close to production ready for a larger project.

I've renamed the "emscripten" target to "webassembly" in the development version of Lime, and there's a chance there could be a renewed interest in the platform soon.

Dev versions of Lime require the following:

lime config EMSCRIPTEN_SDK path/to/emsdk/upstream/emscripten
lime rebuild tools
lime rebuild wasm
# then
lime test wasm

(wasm, webassembly and emscripten are all configured as synonyms for the target name)

SuperDisk commented 2 months ago

I'm running into this exact problem although it's with ECL rather than Haxe. Curious if there's a better solution than just using -O0 which is terrible for performance.

SuperDisk commented 2 months ago

Ah, looks like bdwgc actually has a few tips (added 3 weeks ago): https://github.com/ivmai/bdwgc/blob/c3c44d85081044707eb8fa4f0db39e1fe6834853/docs/platforms/README.emscripten#L9

-sBINARYEN_EXTRA_PASSES='--spill-pointers' worked for me, although it does mention it degrades performance.
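
For anyone wanting to try it, a minimal sketch of a full command line might look like the following. Only the `-sBINARYEN_EXTRA_PASSES` setting comes from the bdwgc notes; `main.cpp`, `main.js`, the `-O2` level and the `RUN` switch are placeholders assumed for illustration. By default the script only prints the command, so it can be inspected before running.

```shell
# Assumed placeholders: main.cpp / main.js, and -O2 as the optimization level.
# The quotes used in the comment above are consumed by the shell, so this form is equivalent.
FLAGS="-O2 -sBINARYEN_EXTRA_PASSES=--spill-pointers"
CMD="emcc $FLAGS main.cpp -o main.js"

if [ "${RUN:-0}" = "1" ]; then
  $CMD            # actually invoke emcc (requires an activated emsdk)
else
  echo "$CMD"     # dry run: just show the command
fi
```

With a build system such as hxcpp or Lime the flag would need to reach the final link step, and how to pass it there is not covered by this sketch.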

hughsando commented 2 months ago

The spill-pointers looks good, and -O2 seems to work. A very nice addition.