github / cmark-gfm

GitHub's fork of cmark, a CommonMark parsing and rendering library and program in C

Emscripten asm.js build of cmark-gfm overflows the JavaScript stack and can't be loaded in a web browser #365

[Open] ptc-shunt opened this issue 2 months ago

ptc-shunt commented 2 months ago

When building cmark-gfm to asm.js with Emscripten, I ran into an issue: the cmark library suffers a JavaScript stack overflow while loading. It happens in both debug and release builds.

To reproduce:

Attached files: CMakeLists.txt, build_cmarkgfm_js.bat.txt

The error appears to be raised while the cmark-gfm library code is being loaded, i.e., before any of it has executed.

I tracked it down to the very large switch statement in case_fold_switch.inc. If you open api_test.js (debug build, for readability) in an editor, search for cmark_utf8proc_case_fold, and scroll down, you will see that Emscripten has generated a very deeply nested set of {} blocks, over 1400 levels deep. This evidently exceeds the browser's JavaScript stack capacity:

```js
function cmark_utf8proc_case_fold($0, $1, $2) {
  $0 = $0 | 0;
  $1 = $1 | 0;
  $2 = $2 | 0;
  var $5 = 0, $22 = 0, wasm2js_i32$0 = 0, wasm2js_i32$1 = 0;
  $5 = __stack_pointer - 32 | 0;
  __stack_pointer = $5;
  HEAP32[($5 + 28 | 0) >> 2] = $0;
  HEAP32[($5 + 24 | 0) >> 2] = $1;
  HEAP32[($5 + 20 | 0) >> 2] = $2;
  label$1 : {
   label$2 : while (1) {
    if (!((HEAP32[($5 + 20 | 0) >> 2] | 0 | 0) > (0 | 0) & 1 | 0)) {
     break label$1
    }
    (wasm2js_i32$0 = $5, wasm2js_i32$1 = cmark_utf8proc_iterate(HEAP32[($5 + 24 | 0) >> 2] | 0 | 0, HEAP32[($5 + 20 | 0) >> 2] | 0 | 0, $5 + 16 | 0 | 0) | 0), HEAP32[(wasm2js_i32$0 + 12 | 0) >> 2] = wasm2js_i32$1;
    label$3 : {
     label$4 : {
      if (!((HEAP32[($5 + 12 | 0) >> 2] | 0 | 0) >= (0 | 0) & 1 | 0)) {
       break label$4
      }
      $22 = HEAP32[($5 + 16 | 0) >> 2] | 0;
      label$5 : {
       label$6 : {
        label$7 : {
         label$8 : {
          label$9 : {
           label$10 : {
            label$11 : {
             label$12 : {
              label$13 : {
               label$14 : {
                label$15 : {
                 label$16 : {
                  label$17 : {
                   label$18 : {
                    label$19 : {
                     // ... nesting continues for many, many more levels, up to label$1407
```
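To measure that nesting without scrolling through the generated file, a small brace counter is enough. The sketch below is a hypothetical helper, not part of cmark-gfm or Emscripten. It only tracks `{` and `}` and does not skip string literals, regex literals, or comments, so its result is an approximation for real JavaScript.

```c
#include <stdio.h>

/* Hypothetical diagnostic helper (not part of cmark-gfm): updates the current
 * and maximum '{'/'}' nesting depth for one character. Strings, regexes and
 * comments are NOT skipped, so the result only approximates real JS nesting. */
void brace_step(int ch, int *depth, int *max_depth) {
    if (ch == '{') {
        if (++*depth > *max_depth)
            *max_depth = *depth;
    } else if (ch == '}' && *depth > 0) {
        --*depth;
    }
}

/* Maximum brace nesting depth of a NUL-terminated string. */
int max_brace_depth(const char *src) {
    int depth = 0, max_depth = 0;
    for (const char *p = src; *p != '\0'; ++p)
        brace_step((unsigned char)*p, &depth, &max_depth);
    return max_depth;
}

/* Same measurement, streamed from an already opened file such as api_test.js. */
int max_brace_depth_file(FILE *f) {
    int depth = 0, max_depth = 0, ch;
    while ((ch = fgetc(f)) != EOF)
        brace_step(ch, &depth, &max_depth);
    return max_depth;
}
```

Pointing `max_brace_depth_file` at the debug api_test.js should report a depth at least as large as the label nesting described above, which makes it easy to compare builds after a workaround.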

This appears to be an Emscripten code-generation issue provoked by the huge switch statement. I tried changing the compiler optimization setting (including -Os), but it didn't help.
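A likely explanation for the nesting: WebAssembly has no arbitrary `goto`, so a dense C `switch` is typically lowered to a `br_table` that jumps out of a stack of nested `block`s, one per case target, and wasm2js (note the `wasm2js_` names in the excerpt) turns each of those blocks into a labeled JavaScript block. The miniature below is a hypothetical, four-case stand-in for the shape of the case-folding switch, mapping one code point to its folded form; the real switch has on the order of 1,400 cases, judging by the `label$1407` depth reported above. The function name and output convention are assumptions for illustration, not cmark-gfm's actual code.

```c
#include <stdint.h>

/* Hypothetical miniature of a switch-based case fold: writes the folded form
 * of code point c (1 to 3 code points) into out and returns how many it wrote.
 * A real table would contain one case per code point that changes. */
int fold_switch(int32_t c, int32_t out[3]) {
    switch (c) {
    case 0x0041: out[0] = 0x0061; return 1;                   /* 'A' -> 'a' */
    case 0x0042: out[0] = 0x0062; return 1;                   /* 'B' -> 'b' */
    case 0x00B5: out[0] = 0x03BC; return 1;                   /* MICRO SIGN -> GREEK SMALL MU */
    case 0x00DF: out[0] = 0x0073; out[1] = 0x0073; return 2;  /* 'ß' -> "ss" (full folding) */
    default:     out[0] = c;      return 1;                   /* not affected by folding */
    }
}
```

With four cases the generated nesting is harmless; it is the sheer number of cases that pushes the wasm2js output past what the browser will load.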

I was able to work around it by rearranging the code in utf8.c and case_fold_switch.inc to use if statements instead of a switch. The code generated from that is much less deeply nested and loads and runs correctly (the console reports that all tests passed).
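The rearranged code itself isn't shown here, so the following is only a hypothetical sketch of the if-based idea, not the actual workaround. Contiguous runs that share a fixed offset (such as 'A'..'Z') can collapse into a single range test, and the remaining singletons become a flat sequence of independent `if`s. Whether a given compiler keeps this as sequential comparisons or re-forms a switch is up to its optimizer; the report above is that the if-based version produced much shallower nesting.

```c
#include <stdint.h>

/* Hypothetical if-based case fold (illustration only): same contract as a
 * switch version, writes 1 to 3 folded code points into out, returns the count. */
int fold_if(int32_t c, int32_t out[3]) {
    if (c >= 0x0041 && c <= 0x005A) {   /* 'A'..'Z' -> 'a'..'z' as one range test */
        out[0] = c + 0x20;
        return 1;
    }
    if (c == 0x00B5) {                  /* MICRO SIGN -> GREEK SMALL MU */
        out[0] = 0x03BC;
        return 1;
    }
    if (c == 0x00DF) {                  /* 'ß' -> "ss" (full folding) */
        out[0] = 0x0073;
        out[1] = 0x0073;
        return 2;
    }
    out[0] = c;                         /* not affected by folding */
    return 1;
}
```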

Another alternative might be a lookup table.
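A lookup table moves the mapping out of control flow and into data, so there is nothing for the compiler to nest. The sketch below assumes a hypothetical sorted array of `fold_entry` rows searched with the standard `bsearch`; the struct layout and function names are illustrative, and only a handful of rows (taken from Unicode's CaseFolding.txt, full folding) are shown.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row layout: one code point that changes under folding,
 * its folded form (up to 3 code points) and how many of those are used. */
typedef struct {
    int32_t from;
    int32_t to[3];
    int len;
} fold_entry;

/* Must stay sorted by `from` so that bsearch works. */
static const fold_entry fold_table[] = {
    {0x0041, {0x0061}, 1},          /* 'A' -> 'a' */
    {0x0042, {0x0062}, 1},          /* 'B' -> 'b' */
    {0x00B5, {0x03BC}, 1},          /* MICRO SIGN -> GREEK SMALL MU */
    {0x00DF, {0x0073, 0x0073}, 2},  /* 'ß' -> "ss" */
    {0x0130, {0x0069, 0x0307}, 2},  /* CAPITAL I WITH DOT ABOVE -> 'i' + COMBINING DOT ABOVE */
};

/* bsearch comparator: key is an int32_t code point, elem is a fold_entry. */
static int cmp_entry(const void *key, const void *elem) {
    int32_t c = *(const int32_t *)key;
    int32_t from = ((const fold_entry *)elem)->from;
    return (c > from) - (c < from);
}

/* Writes the folded form of c into out and returns the number of code points. */
int fold_lookup(int32_t c, int32_t out[3]) {
    const fold_entry *e = bsearch(&c, fold_table,
                                  sizeof fold_table / sizeof fold_table[0],
                                  sizeof fold_table[0], cmp_entry);
    if (e == NULL) {                /* not in the table: unchanged */
        out[0] = c;
        return 1;
    }
    for (int i = 0; i < e->len; ++i)
        out[i] = e->to[i];
    return e->len;
}
```

With roughly 1,400 rows, a binary search needs at most about 11 comparisons per code point, and the table could be generated from CaseFolding.txt the same way a switch would be.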