Closed bootstraponline closed 12 years ago
If I remove whitespace from sundown_o2_m0.js
with closure then there's no error. The errors are only when minify
is not set to 0.
java -jar compiler.jar --compilation_level WHITESPACE_ONLY --js sundown_o2_m0.js --js_output_file sundown_o2_wo.js
https://github.com/bootstraponline/livepreview/commit/3090fc6fe8f6a01782c476a8b0e3e087b32ca7ac
All --minify
does is tell closure to minify whitespace or not. The likely explanation for this problem is a bug in closure, it might minify whitespace in a way that generates invalid code, and the next processing step then fails.
Run with EMCC_DEBUG=1
and look at the closure step saved to /tmp/emscripten_temp
. See if a JS engine is willing to parse that as valid.
Are you using the closure compiler bundled with emscripten, or another?
I downloaded the latest closure compiler and that's what I'm invoking to run whitespace_only. Could the bundled emcc have an out of date closure with a bug?
Is there a way to tell emcc to use my closure instead of the bundled version? It doesn't produce any errors when running with the latest jar.
You can define CLOSURE_COMPILER in ~/.emscripten to have the path to a different closure.
It seems emscripten is bundling an outdated closure jar. Is there any reason not to upgrade to the latest?
Nothing except for making sure that the new one passes all our test suite and does not regress benchmarks.
I updated emscripten and now even emcc -O2 --minify 0
produces broken builds using the bundled closure. If I revert back to an old emscripten version, it works on most inputs and crashes on others. It seems there are a bunch of hard to debug errors in the translated code.
How are the builds now broken after you updated? Are they broken with --closure 0
too?
There are fatal errors either on startup or once the text starts being entered depending on the options. Even builds that I thought were good break with certain input. The builds were always broken, I just didn't notice until I found better sample data.
Edit: Yes they are broken with --closure 0 also.
emcc -O2 --closure 0 sundown.bc -o
sundown_o2_c0.jsIf you get unexpected behavior, best is to make a testcase that can be built natively and to js, and then debug the difference,
https://github.com/kripken/emscripten/wiki/Debugging
I can help debugging such a testcase too.
The gcc version works properly on the same input, only the emscripten version fails. It was faster for me to rewrite the entire system in JavaScript instead of figuring out what the bugs are in emscripten.
I would still like to fix the bugs we have. Any chance you can point me to the relevant code and build instructions, and I'll debug the difference between gcc and emscripten on that code?
The code is in the master branch of bootstraponline/sundown
./js.sh
builds the emscripten library./native.sh
builds the native library and a test executableThe easy way to test the JavaScript version is to replace sundown_o2_c0.js
in livepreview_o2 and then load index.html
. I have passenger installed so passenger start
works, however any server will do.
native.sh
will build the native test exe, invoke it with the sample data, and save the results to output.html
Let me know if there's anything I can do to help.
Thanks, I'll take a look now.
Btw, since you rewrote the whole thing by hand in JS, what performance are you seeing when you compare the two?
Ok, I forked your repo and added a commit to generate a test.js that is an exact parallel to the native test it generates, for easy comparison,
https://github.com/kripken/sundown/commit/354c62d7b4c7d503693c6247b365faecd8804118
Running those on data.md (which looks like the same as the testcase you mention from before as being broken in js - are there other inputs I should test too?), I get identical results in js and native, so I can't reproduce the problem at that level.
Can you perhaps try to run ./js and ./native in my fork and see if you get the same results? If not, there might be a configuration problem with emscripten. Or if the js output is still broken for you, then I can next look and see if there is a problem in the wrapper code (which test.js does not use).
Btw, since you rewrote the whole thing by hand in JS, what performance are you seeing when you compare the two?
I rewrote the markdown parser integration which is just a few lines of code. The underlying engine was switched out for a modified markdowndeep. Based on the published benchmarks I expect the speed to be about the same. markdowndeep is almost half the size (62KB vs 103 KB) and doesn't require typed arrays. Even pagedown, one of the slowest parsers, is still fast enough to update in real time as text is entered.
The only problem with the emscripten version of Sundown is that users would report certain input crashes, I'd reproduce the crash, and then be unable to debug the issue. With a JS based parser it's easy for me to fix problems. The other parsers do not implement all of GitHub Flavored Markdown though which is why I initially tried to use emscripten.
I've pushed a collection of md files that lead to crashes on github.
I will try your fork and will let you know. Thanks for your help.
Using node I get the same result which is encouraging. _str_to_html doesn't seem to be exported when included in LDFLAGS.
LDFLAGS=-g -Wall -Werror -s EXPORTED_FUNCTIONS="['_main', '_malloc', '_free', '_str_to_html', '_realloc']"
If I edit settings.js
then it's detected.
var EXPORTED_FUNCTIONS = ['_main', '_malloc', '_free', '_str_to_html', '_realloc'];
The input is correct on the hosted page and I can't reproduce the crash. I only made small changes to the code so I'm not sure why it works now.
I found one error:
μ†ℱ ╋ℯ╳╋
turns into <p>μ†ℱ ╋ℯ╳╋</p>
using ./native.sh
however ./js.sh
produces <p>ᅫᄐ¬タᅠ¬トᄆ ¬ユヒ¬トᆵ¬ユᄈ¬ユヒ</p>
LDFLAGS is not relevent for exporting. All -s
commands, including -s EXPORTED_FUNCTIONS
, affect compilation to js. So they matter when you convert the bitcode to js in the last step, not the previous step of linking.
About that non-ascii error, to be honest I am not sure anyone has tried non-ascii in emscripten ;) We might lack some unicode support, or perhaps node lacks it.' I'll investigate.
Is there a way to pass EXPORTED_FUNCTIONS on the command line or in the Makefile instead of editing settings.js
?
emcc: warning: treating -s as linker option and not as -s OPT=VALUE for js compilation
Yes, just add -s KEY=VALUE when calling emcc in the command that generates a .js file. If you get that warning in that command, show me the entire command you are running.
Regarding the non-ascii issue, I don't know enough UTF to fix it. But the raw characters in "μ†ℱ" from your example start with 206 188 226 128 in binary, but I don't know how to get from that to a JS string that will print out as the same. For example, "μ".charCodeAt(0) = 956
and I don't see how to relate 956 to the binary numbers from before. Maybe one is UTF-8 and the other UTF-16 or something like that.
Thanks. That worked.
Markdowndeep handles UTF8 properly. I think the emscripten code is mangling encoding on the characters. Everything should be UTF8.
'╋'.charCodeAt(0)
9547
Turns into: ¬ユヒ<
'¬ユヒ'.charCodeAt(0)
65506
'¬ユヒ'.charCodeAt(1)
65429
'¬ユヒ'.charCodeAt(2)
65419
So, '╋\n' in a C string turns into 226 149 139 10
. Is that UTF 8? Anyhow, this is what the JS will see. I have no immediate idea how to turn the first three of those numbers into 9547
which is what we need for JS to print ╋...
Java reports the same value as JavaScript.
System.out.println((int) "╋".charAt(0)); // 9547
226 149 139 10
is UTF-8.
byte[] b = new byte[]{ (byte) 226, (byte) 149, (byte) 139, 10 };
System.out.println(new String(b)); // ╋
Here's the solution.
// javascript
Utf8.decodeArray([226,149,139,10]) //╋
Thanks!
Ok, with guidance from that I implemented this in the incoming branch and added a test.
I checked out incoming and node test.js data.md > output.js.html
now produces the correct result. In the webpage though it's still incorrect. The markdowndeep version works.
μ†ℱ ╋ℯ╳╋
becomes ¼ 1 K/sK
Did you override the standard output stream? If not, can you make a -O2 --closure 0
build (so I can debug it)?
I didn't override standard out. I'm running ./js.sh
in sundown2.
sundown.js built with -O2 --closure 0
--closure 0 doesn't seem to have disabled minification. Edit: never mind, it was loading from cache even though I have cache set to disabled.
To debug this I need a local copy I can modify. I ran ./js.sh
in that repo, but I guess it doesn't have any of the other necessary files to recreate the web version that fails for you? Can you perhaps make a zip file with a complete build?
Or, maybe first link me to the glue code you use to convert the compiled code's output into a native JS string, I think I might know what we need to fix.
./js.sh
just produces sundown.js which is then inserted into livepreview_o2.
I think Pointer_stringify
is messing it up. Here's a zip that can run locally.
This is the glue code from livepreview.js:
// text is: μ†ℱ ╋ℯ╳╋
writeStringToMemory( text, pointer );
text = Pointer_stringify( _str_to_html( pointer ) );
// text is now: <p>¼ 1 K/sK</p>
Thanks, yeah, that was what I just realized must be it. We need Pointer_stringify to be utf8-aware, not just stdout.
This should now be fixed on incoming.
That's better, although not quite fixed.
젱 K/sK
The problem must be in the input - we give writeStringToMemory a JS UTF string, and need to convert it into a C UTF string - the opposite of what we do in Pointer_stringify. Now we just need the reverse formula for the previous one...
Here's the solution.
var b = encode_utf8('╋');
b.charCodeAt(0);
226
b.charCodeAt(1);
149
b.charCodeAt(2);
139
I added the UTF-8 code and now the webpage version works correctly.
Thanks!
I also implemented this in incoming, so just using writeStringToMemory
should work now too.
Thanks! I'll test it out and will let you know how it works.
Cool.
And like I said before, I would much appreciate any benchmark numbers you come up with, very interested in comparing compiled code vs. handwritten code that does the same thing.
I already have benchmarkjs setup to benchmark livepreview. Once I rebuild the new sundown.js, I'll compare it against markdown deep.
emscript x 2,872 ops/sec ±1.23% (7 runs sampled)
pagedown x 544 ops/sec ±0.21% (7 runs sampled)
markdown deep x 3,583 ops/sec ±0.18% (7 runs sampled)
Fastest is markdown deep
Markdown deep does not implement GitHub Flavored Markdown so the emscripten version is better for that use case.
Very interesting, thanks for the data!
Nice to see emscripten is generally comparable to a good handwritten version.
I updated the test.
emscript x 2,846 ops/sec ±4.68% (8 runs sampled)
pagedown x 540 ops/sec ±0.37% (9 runs sampled)
markdown deep x 3,426 ops/sec ±0.23% (9 runs sampled)
Fastest is markdown deep
Emscripten is 527% faster vs pagedown
Emscripten is 120% slower vs markdown deep
The code works now so this issue is resolved. Thanks for your help.
The benchmark was giving inconsistent results so I increased the amount of input data. Emscripten sundown is now consistently ~2.6x slower than markdown deep and much faster than pagedown.
emscript x 1.67 ops/sec ±5.52% (13 runs sampled)
pagedown x 0.30 ops/sec ±8.01% (6 runs sampled)
markdown deep x 4.44 ops/sec ±4.07% (27 runs sampled)
Fastest is markdown deep
Emscripten is 554% faster vs pagedown
Emscripten is 267% slower vs markdown deep
Do you perhaps see a performance warning in the error console? Large data might lead to enlarging memory arrays for example (which would be warned about).
Btw, what's the difference between pagedown and markdown deep? There's a huge performance difference between them. Do they parse the same stuff?
There are no performance warnings.
text = text.replace(/rgx/)
which is nice because it's easy to maintain and works everywhere. The downside is it's ridiculously slow.They both parse the same markdown.
I see, thanks.
Works.
emcc -O2 --minify 0 -s ASSERTIONS=1 sundown.bc -o sundown_o2_m0_a1.js
Fails.
emcc -O2 -s ASSERTIONS=1 sundown.bc -o sundown_o2_a1.js
This test case reported on Gollum #415 causes a crash when run against a build with
emcc -O2 sundown.bc -o sundown_o2.js
.Is there a way to debug why minify is creating invalid code?