emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.84k stars 3.31k forks source link

minify breaks code #502

Closed bootstraponline closed 12 years ago

bootstraponline commented 12 years ago

Works. emcc -O2 --minify 0 -s ASSERTIONS=1 sundown.bc -o sundown_o2_m0_a1.js

Fails. emcc -O2 -s ASSERTIONS=1 sundown.bc -o sundown_o2_a1.js

This test case reported on Gollum #415 causes a crash when run against a build with emcc -O2 sundown.bc -o sundown_o2.js.

Is there a way to debug why minify is creating invalid code?


Uncaught abort() at Error
    at Error (unknown source)
    at $ (http://bootstraponline.github.com/livepreview_o2/public/js/sundown/sundown_o2.js:1:77135)
    at http://bootstraponline.github.com/livepreview_o2/public/js/livepreview/livepreview.js:356:3
    at Object.execCb (http://bootstraponline.github.com/livepreview_o2/public/js/requirejs/require.js:1620:33)
    at Object.check (http://bootstraponline.github.com/livepreview_o2/public/js/requirejs/require.js:980:51)
    at Object.<anonymous> (http://bootstraponline.github.com/livepreview_o2/public/js/requirejs/require.js:857:30)
    at http://bootstraponline.github.com/livepreview_o2/public/js/requirejs/require.js:119:23
    at http://bootstraponline.github.com/livepreview_o2/public/js/requirejs/require.js:1187:21
    at each (http://bootstraponline.github.com/livepreview_o2/public/js/requirejs/require.js:56:21)
    at Object.emit (http://bootstraponline.github.com/livepreview_o2/public/js/requirejs/require.js:1186:17) sundown_o2.js:1
bootstraponline commented 12 years ago

If I remove whitespace from sundown_o2_m0.js with closure then there's no error. The errors are only when minify is not set to 0.

java -jar compiler.jar --compilation_level WHITESPACE_ONLY --js sundown_o2_m0.js --js_output_file sundown_o2_wo.js

https://github.com/bootstraponline/livepreview/commit/3090fc6fe8f6a01782c476a8b0e3e087b32ca7ac

kripken commented 12 years ago

All --minify does is tell closure to minify whitespace or not. The likely explanation for this problem is a bug in closure, it might minify whitespace in a way that generates invalid code, and the next processing step then fails.

Run with EMCC_DEBUG=1 and look at the closure step saved to /tmp/emscripten_temp. See if a JS engine is willing to parse that as valid.

Are you using the closure compiler bundled with emscripten, or another?

bootstraponline commented 12 years ago

I downloaded the latest closure compiler and that's what I'm invoking to run whitespace_only. Could the bundled emcc have an out of date closure with a bug?

Is there a way to tell emcc to use my closure instead of the bundled version? It doesn't produce any errors when running with the latest jar.

kripken commented 12 years ago

You can define CLOSURE_COMPILER in ~/.emscripten to have the path to a different closure.

bootstraponline commented 12 years ago

It seems emscripten is bundling an outdated closure jar. Is there any reason not to upgrade to the latest?

kripken commented 12 years ago

Nothing except for making sure that the new one passes all our test suite and does not regress benchmarks.

bootstraponline commented 12 years ago

I updated emscripten and now even emcc -O2 --minify 0 produces broken builds using the bundled closure. If I revert back to an old emscripten version, it works on most inputs and crashes on others. It seems there are a bunch of hard to debug errors in the translated code.

kripken commented 12 years ago

How are the builds now broken after you updated? Are they broken with --closure 0 too?

bootstraponline commented 12 years ago

There are fatal errors either on startup or once the text starts being entered depending on the options. Even builds that I thought were good break with certain input. The builds were always broken, I just didn't notice until I found better sample data.

Edit: Yes they are broken with --closure 0 also.

kripken commented 12 years ago

If you get unexpected behavior, best is to make a testcase that can be built natively and to js, and then debug the difference,

https://github.com/kripken/emscripten/wiki/Debugging

I can help debugging such a testcase too.

bootstraponline commented 12 years ago

The gcc version works properly on the same input, only the emscripten version fails. It was faster for me to rewrite the entire system in JavaScript instead of figuring out what the bugs are in emscripten.

kripken commented 12 years ago

I would still like to fix the bugs we have. Any chance you can point me to the relevant code and build instructions, and I'll debug the difference between gcc and emscripten on that code?

bootstraponline commented 12 years ago

The code is in the master branch of bootstraponline/sundown

The easy way to test the JavaScript version is to replace sundown_o2_c0.js in livepreview_o2 and then load index.html. I have passenger installed so passenger start works, however any server will do.

Let me know if there's anything I can do to help.

kripken commented 12 years ago

Thanks, I'll take a look now.

Btw, since you rewrote the whole thing by hand in JS, what performance are you seeing when you compare the two?

kripken commented 12 years ago

Ok, I forked your repo and added a commit to generate a test.js that is an exact parallel to the native test it generates, for easy comparison,

https://github.com/kripken/sundown/commit/354c62d7b4c7d503693c6247b365faecd8804118

Running those on data.md (which looks like the same as the testcase you mention from before as being broken in js - are there other inputs I should test too?), I get identical results in js and native, so I can't reproduce the problem at that level.

Can you perhaps try to run ./js and ./native in my fork and see if you get the same results? If not, there might be a configuration problem with emscripten. Or if the js output is still broken for you, then I can next look and see if there is a problem in the wrapper code (which test.js does not use).

bootstraponline commented 12 years ago

Btw, since you rewrote the whole thing by hand in JS, what performance are you seeing when you compare the two?

I rewrote the markdown parser integration which is just a few lines of code. The underlying engine was switched out for a modified markdowndeep. Based on the published benchmarks I expect the speed to be about the same. markdowndeep is almost half the size (62KB vs 103 KB) and doesn't require typed arrays. Even pagedown, one of the slowest parsers, is still fast enough to update in real time as text is entered.

The only problem with the emscripten version of Sundown is that users would report certain input crashes, I'd reproduce the crash, and then be unable to debug the issue. With a JS based parser it's easy for me to fix problems. The other parsers do not implement all of GitHub Flavored Markdown though which is why I initially tried to use emscripten.

I've pushed a collection of md files that lead to crashes on github.

I will try your fork and will let you know. Thanks for your help.

bootstraponline commented 12 years ago

Using node I get the same result which is encouraging. _str_to_html doesn't seem to be exported when included in LDFLAGS.

LDFLAGS=-g -Wall -Werror -s EXPORTED_FUNCTIONS="['_main', '_malloc', '_free', '_str_to_html', '_realloc']"

If I edit settings.js then it's detected.

var EXPORTED_FUNCTIONS = ['_main', '_malloc', '_free', '_str_to_html', '_realloc'];

The input is correct on the hosted page and I can't reproduce the crash. I only made small changes to the code so I'm not sure why it works now.

bootstraponline commented 12 years ago

I found one error: μ†ℱ ╋ℯ╳╋ turns into <p>μ†ℱ ╋ℯ╳╋</p> using ./native.sh however ./js.sh produces <p>ᅫᄐ¬タᅠ¬トᄆ ¬ユヒ¬トᆵ¬ユᄈ¬ユヒ</p>

kripken commented 12 years ago

LDFLAGS is not relevent for exporting. All -s commands, including -s EXPORTED_FUNCTIONS, affect compilation to js. So they matter when you convert the bitcode to js in the last step, not the previous step of linking.

About that non-ascii error, to be honest I am not sure anyone has tried non-ascii in emscripten ;) We might lack some unicode support, or perhaps node lacks it.' I'll investigate.

bootstraponline commented 12 years ago

Is there a way to pass EXPORTED_FUNCTIONS on the command line or in the Makefile instead of editing settings.js?

emcc: warning: treating -s as linker option and not as -s OPT=VALUE for js compilation

kripken commented 12 years ago

Yes, just add -s KEY=VALUE when calling emcc in the command that generates a .js file. If you get that warning in that command, show me the entire command you are running.

Regarding the non-ascii issue, I don't know enough UTF to fix it. But the raw characters in "μ†ℱ" from your example start with 206 188 226 128 in binary, but I don't know how to get from that to a JS string that will print out as the same. For example, "μ".charCodeAt(0) = 956 and I don't see how to relate 956 to the binary numbers from before. Maybe one is UTF-8 and the other UTF-16 or something like that.

bootstraponline commented 12 years ago

Thanks. That worked.

Markdowndeep handles UTF8 properly. I think the emscripten code is mangling encoding on the characters. Everything should be UTF8.

'╋'.charCodeAt(0)
9547

Turns into: ¬ユヒ<
'¬ユヒ'.charCodeAt(0)
65506
'¬ユヒ'.charCodeAt(1)
65429
'¬ユヒ'.charCodeAt(2)
65419
kripken commented 12 years ago

So, '╋\n' in a C string turns into 226 149 139 10. Is that UTF 8? Anyhow, this is what the JS will see. I have no immediate idea how to turn the first three of those numbers into 9547 which is what we need for JS to print ╋...

bootstraponline commented 12 years ago

Java reports the same value as JavaScript.

System.out.println((int) "╋".charAt(0)); // 9547

226 149 139 10 is UTF-8.

byte[] b = new byte[]{ (byte) 226, (byte) 149, (byte) 139, 10 };
System.out.println(new String(b)); // ╋

Here's the solution.

// javascript
Utf8.decodeArray([226,149,139,10]) //╋
kripken commented 12 years ago

Thanks!

Ok, with guidance from that I implemented this in the incoming branch and added a test.

bootstraponline commented 12 years ago

I checked out incoming and node test.js data.md > output.js.html now produces the correct result. In the webpage though it's still incorrect. The markdowndeep version works.

μ†ℱ ╋ℯ╳╋ becomes ¼ 1 K/sK

kripken commented 12 years ago

Did you override the standard output stream? If not, can you make a -O2 --closure 0 build (so I can debug it)?

bootstraponline commented 12 years ago

I didn't override standard out. I'm running ./js.sh in sundown2.

sundown.js built with -O2 --closure 0

--closure 0 doesn't seem to have disabled minification. Edit: never mind, it was loading from cache even though I have cache set to disabled.

kripken commented 12 years ago

To debug this I need a local copy I can modify. I ran ./js.sh in that repo, but I guess it doesn't have any of the other necessary files to recreate the web version that fails for you? Can you perhaps make a zip file with a complete build?

kripken commented 12 years ago

Or, maybe first link me to the glue code you use to convert the compiled code's output into a native JS string, I think I might know what we need to fix.

bootstraponline commented 12 years ago

./js.sh just produces sundown.js which is then inserted into livepreview_o2.

I think Pointer_stringify is messing it up. Here's a zip that can run locally.

This is the glue code from livepreview.js:

// text is: μ†ℱ ╋ℯ╳╋
writeStringToMemory( text, pointer );
text = Pointer_stringify( _str_to_html( pointer ) );
// text is now: <p>¼ 1 K/sK</p>
kripken commented 12 years ago

Thanks, yeah, that was what I just realized must be it. We need Pointer_stringify to be utf8-aware, not just stdout.

This should now be fixed on incoming.

bootstraponline commented 12 years ago

That's better, although not quite fixed.

젱 K/sK

kripken commented 12 years ago

The problem must be in the input - we give writeStringToMemory a JS UTF string, and need to convert it into a C UTF string - the opposite of what we do in Pointer_stringify. Now we just need the reverse formula for the previous one...

bootstraponline commented 12 years ago

Here's the solution.

var b = encode_utf8('╋');
b.charCodeAt(0);
226
b.charCodeAt(1);
149
b.charCodeAt(2);
139

I added the UTF-8 code and now the webpage version works correctly.

kripken commented 12 years ago

Thanks!

I also implemented this in incoming, so just using writeStringToMemory should work now too.

bootstraponline commented 12 years ago

Thanks! I'll test it out and will let you know how it works.

kripken commented 12 years ago

Cool.

And like I said before, I would much appreciate any benchmark numbers you come up with, very interested in comparing compiled code vs. handwritten code that does the same thing.

bootstraponline commented 12 years ago

I already have benchmarkjs setup to benchmark livepreview. Once I rebuild the new sundown.js, I'll compare it against markdown deep.

bootstraponline commented 12 years ago
emscript x 2,872 ops/sec ±1.23% (7 runs sampled)
pagedown x 544 ops/sec ±0.21% (7 runs sampled)
markdown deep x 3,583 ops/sec ±0.18% (7 runs sampled)
Fastest is markdown deep 

Markdown deep does not implement GitHub Flavored Markdown so the emscripten version is better for that use case.

kripken commented 12 years ago

Very interesting, thanks for the data!

Nice to see emscripten is generally comparable to a good handwritten version.

bootstraponline commented 12 years ago

I updated the test.

emscript x 2,846 ops/sec ±4.68% (8 runs sampled)
pagedown x 540 ops/sec ±0.37% (9 runs sampled)
markdown deep x 3,426 ops/sec ±0.23% (9 runs sampled)
Fastest is markdown deep
Emscripten is 527% faster vs pagedown
Emscripten is 120% slower vs markdown deep

The code works now so this issue is resolved. Thanks for your help.

bootstraponline commented 12 years ago

The benchmark was giving inconsistent results so I increased the amount of input data. Emscripten sundown is now consistently ~2.6x slower than markdown deep and much faster than pagedown.


emscript x 1.67 ops/sec ±5.52% (13 runs sampled)
pagedown x 0.30 ops/sec ±8.01% (6 runs sampled)
markdown deep x 4.44 ops/sec ±4.07% (27 runs sampled)
Fastest is markdown deep
Emscripten is 554% faster vs pagedown
Emscripten is 267% slower vs markdown deep
kripken commented 12 years ago

Do you perhaps see a performance warning in the error console? Large data might lead to enlarging memory arrays for example (which would be warned about).

Btw, what's the difference between pagedown and markdown deep? There's a huge performance difference between them. Do they parse the same stuff?

bootstraponline commented 12 years ago

There are no performance warnings.

They both parse the same markdown.

kripken commented 12 years ago

I see, thanks.