emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

Build MESS/MAME #131

Closed ziz closed 12 years ago

ziz commented 12 years ago

dynamic_cast can get into an infinite loop (encountered when compiling the MAME / MESS source). Per discussion, "the current implementation is missing some stuff like multiple inheritance etc., so maybe more basic stuff needs to be done first."

The dynamic_cast call arguments can be found in llvm's tools/clang/lib/CodeGen/CGExprCXX.cpp around line 1535 (using SVN rev 139974 of clang 3.0).

(Apologies for the pretty minimal issue content; I don't have anything resembling a minimal reproduction case to offer yet.)

kripken commented 12 years ago

If you can provide the relevant C++ source code that is compiled into the problem, that would also be very helpful. Without that, it's hard to know what to focus on here.

ziz commented 12 years ago

My attempt at a minimal case was not successful in reproducing the bug: http://batcave.textfiles.com/ziz/staging/casttest.zip

As requested, here's the .js and .bc files:

http://batcave.textfiles.com/ziz/staging/messtiny-10.js.gz http://batcave.textfiles.com/ziz/staging/messtiny-10.bc.gz

and the source code we're compiling is found in https://github.com/ziz/jsmess/tree/no_cothreads

So far as I can tell, the infinite loop happens during execution of a dynamic_cast call which runs as part of src/emu/clifront.c cli_frontend::listmedia, line 698.

Compiling fully from source requires a second, clean copy of the MESS source to build the external tools the build process depends on:

cd mess-orig; cp -rp src/osd/osdmini src/mess/osd; make TARGET=mess SUBTARGET=tiny

then build the MESS project:

make clean; find . \( -name \*.a.bc -o -name \*.o \) -delete; make clean; make TARGET=mess SUBTARGET=tiny
# Previous command fails because it can't run the external tools, copy them in
cp ../mess-orig/obj/osdmini/messtiny64/build/* obj/osdmini/messtiny/build/
# and finish the build
make TARGET=mess SUBTARGET=tiny

Then link with emscripten (since the build process tries to link .a instead of .a.bc, replace all .a with .a.bc in the link command):

/home/ziz/Dev/llvm-3.0-release/Release/bin/llvm-ld -disable-opt obj/osdmini/messtiny/version.o obj/osdmini/messtiny/drivlist.o obj/osdmini/messtiny/emu/drivers/emudummy.o obj/osdmini/messtiny/mess/drivers/coleco.o obj/osdmini/messtiny/mess/machine/coleco.o obj/osdmini/messtiny/libosd.a.bc obj/osdmini/messtiny/libcpu.a.bc obj/osdmini/messtiny/libemu.a.bc obj/osdmini/messtiny/libdasm.a.bc obj/osdmini/messtiny/libsound.a.bc obj/osdmini/messtiny/libutil.a.bc obj/osdmini/messtiny/libexpat.a.bc obj/osdmini/messtiny/libsoftfloat.a.bc obj/osdmini/messtiny/libformats.a.bc obj/osdmini/messtiny/libz.a.bc obj/osdmini/messtiny/libocore.a.bc -o=messtiny

and then emscripten:

../emscripten/emscripten.py messtiny.bc -o messtiny.js

Run with the argument '-listmedia'.
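The `.a` to `.a.bc` substitution described for the link step can be sketched in a few lines of Python. This is only an illustration: `use_bitcode_archives` is a hypothetical helper, not part of the jsmess makefile or emscripten, and it assumes the link command has no quoted arguments containing spaces.

```python
def use_bitcode_archives(link_command: str) -> str:
    """Rewrite every argument ending in '.a' to end in '.a.bc'.

    Hypothetical helper for illustration only. Arguments that already
    end in '.a.bc', object files, and flags are left untouched.
    """
    rewritten = []
    for arg in link_command.split():
        if arg.endswith(".a"):
            arg += ".bc"
        rewritten.append(arg)
    return " ".join(rewritten)


example = "llvm-ld -disable-opt obj/version.o obj/libosd.a obj/libz.a.bc -o=messtiny"
fixed = use_bitcode_archives(example)
```

Printing `fixed` gives a command that can be pasted back into the shell; the original command is not modified.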

kripken commented 12 years ago

I'll look at this very soon, I just need to finish a few more compiler optimizations to speed up building large files (which will help with compiling and debugging code like this).

kripken commented 12 years ago

The compressed js file linked here doesn't seem to extract to a js file properly. I built my own though.

Just trying to run it, I ran into a few problems:

  1. spidermonkey and node fail on OOM.
  2. v8 shell gets farther, but fails on lack of bsearch. bsearch was implemented in pull request 114, but the submitter there was not willing to license the code in a way that we can use. So I guess we would need to reimplement that function, and the other ones there, from scratch.

We should really try to reduce the size of the compiled code - hopefully we don't need everything currently built. The compiled JS is 50MB, which is too much for node and spidermonkey. d8 can run it, but d8 has other limitations like poor support for typed arrays which will greatly limit our ability to test (we will likely need typed arrays for speed).

But the real blocker now is to re-implement the necessary functions mentioned before in a way that can actually be used by us.
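For reference, the algorithm behind bsearch is small enough to write from scratch. The sketch below is an independent Python illustration of the C contract (a key, a sorted array, and a comparator returning negative, zero, or positive); it is not the code from pull request 114 and not what emscripten's library ships. It returns an index rather than a pointer, which is an assumption made to keep the example idiomatic.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def bsearch(key: T, base: Sequence[T], compar: Callable[[T, T], int]) -> Optional[int]:
    """Binary search over a sorted sequence.

    compar(key, element) must return <0, 0, or >0, as in C.
    Returns the index of a matching element, or None if there is none.
    """
    lo, hi = 0, len(base)          # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        order = compar(key, base[mid])
        if order == 0:
            return mid
        if order < 0:
            hi = mid               # key sorts before base[mid]
        else:
            lo = mid + 1           # key sorts after base[mid]
    return None


def compare_ints(a: int, b: int) -> int:
    """Three-way comparison in the style of a C comparator."""
    return (a > b) - (a < b)
```

For example, `bsearch(7, [1, 3, 5, 7, 9], compare_ints)` finds the element at index 3, and a key that is absent yields `None`.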

ziz commented 12 years ago

I have updated the jsmess makefile to use emcc and friends, and compiled with the latest emscripten. The JS is only 30M now, and compiles in less than half the time it used to - grats on the improvements!

There appears to be a minor bug in parseTools.js, preventing out-of-the-box compilation here: the 64-bit basic integer ops switch (around line 1660) is missing srem (compared to the 32-bit basic integer ops).

I may have recompiled spidermonkey with an increased functions limit to get around the OOM before. Pending a real fix, this is certainly an option.

Here's a new .bc and .js: http://batcave.textfiles.com/ziz/staging/messtiny-11.tgz

kripken commented 12 years ago

How did you work around the lack of srem?

The problem is that 64-bit math isn't really possible in JS. We can add srem, but it might fail if values too big to fit in a double are used. Perhaps there is a way to make jsmess use 32-bit values over 64-bit ints?
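The precision concern can be illustrated outside of JS. A Python `float` is an IEEE-754 double, the same representation as a JS number, so routing the operands through `float` mimics what a naive srem would see. This is a sketch under that assumption, with hypothetical function names; it is not how parseTools.js implements the operation.

```python
import math


def srem_exact(a: int, b: int) -> int:
    """Signed remainder on exact integers; the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def srem_via_double(a: int, b: int) -> int:
    """The same operation after both operands are rounded to a double."""
    return int(math.fmod(float(a), float(b)))


# Small values survive the round trip through a double.
small_ok = srem_via_double(-7, 3) == srem_exact(-7, 3)

# 2**53 + 1 is not representable as a double, so the remainder changes.
big = 2**53 + 1
big_exact = srem_exact(big, 10)
big_double = srem_via_double(big, 10)
```

Here `big_exact` and `big_double` differ, which is the failure mode to expect once 64-bit values exceed what a double can hold exactly.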

ziz commented 12 years ago

I applied the wrong-but-obvious solution: I just added srem to that case statement, which caused it to compile.

There may be an obvious solution that I'm missing to force 32-bit compilation for jsmess on OS X or other 64-bit platforms (and, so far as I can tell, the srem workaround won't cause any problems if we do manage to force 32-bit compilation).

kripken commented 12 years ago

I'm having trouble building this .ll file, not sure why. What version of LLVM did you use to build it?

ziz commented 12 years ago

LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), which were the revisions I believe you reported you were using in a previous discussion on IRC. If you're using a different version, I'm happy to switch.

kripken commented 12 years ago

3.0 should be fine.

Another question: I get warnings about different targets (OS X vs. Linux). Can you perhaps run the automatic test suite, to see if there are any problems on OS X? (Emscripten has not been heavily tested there, I am afraid.) The command to run them is python tests/runner.py; it should take several hours.

ziz commented 12 years ago

Sure, I'll run the test suite. Is this the first time the different targets problem has shown up? I don't recall it being an issue before.

kripken commented 12 years ago

We do have people using OS X, and someone said things were working on Windows. But only Linux gets full test coverage all the time (because I run Linux, basically, no special reason).

Specifically for here, I worry that clang will implement dynamic cast differently on different platforms, and that that might be what is tripping us up here. A problem like that might have gone unnoticed even though we have some people using OS X for some projects.

ziz commented 12 years ago

Test output: https://gist.github.com/87ad7200ca6f708e6a73

Failures specifically related to closure (Invalid or corrupt jarfile /usr/local/bin/closure) are because I had the config file pointed to a closure compiler wrapper .sh rather than the closure .jar. I can rerun those specific tests if needed.

kripken commented 12 years ago

A lot of those errors will, I suspect, be fixed by pull 154. And I am hoping some will be fixed by a headers correctness fix we landed yesterday. But if not, then those errors look very dangerous, and could possibly explain the jsmess infinite loop. Let's focus on the shortest one, pystruct. Can you please run

EMCC_DEBUG=1 python tests/runner.py test_pystruct

and gist the results? There will be an .ll and .js file in /tmp/emscripten_tmp (assuming /tmp is what TEMP_DIR is set to in ~/.emscripten).

edit: and please make sure you pull the latest code, since the headers fix was just yesterday

ziz commented 12 years ago

Pulled to f6e838357b58559ac376c7975f64b87da5a52b0a and ran test_pystruct.

Output:

https://gist.github.com/e81dba0c90b5589e0f79

Resulting temp files:

https://gist.github.com/93358b0ae60cb27fb293

kripken commented 12 years ago

Thanks!

Ok, it looks like we need to be more assertive in telling clang to generate platform-independent code. Long-term we will want to have a formal "emscripten" llvm target, but until then, let's try to use 32-bit linux as the uniform target. I pushed this to the incoming branch as 4cab9f5a5ac72e9eaf904ace0ebe29d5e30ba9b6. Can you test it and see if it fixes pystruct?

ziz commented 12 years ago

Success! The incoming branch passes pystruct:

https://gist.github.com/4e286d64ccfd24e5af82

kripken commented 12 years ago

Great! :)

There's a chance it will fix the other tests too (although we might still need the bitcode fix in pull #154). Can you run the rest as well?

ziz commented 12 years ago

Some of the other tests are fixed. test_emcc is apparently not available in the incoming branch, but of the others, these now pass:

test_cubescript test_files test_libcxx test_pystruct

and these continue to fail:

test_freetype test_lua test_openjpeg test_poppler test_python test_thebullet test_zlib

tests/runner.py output here:

https://gist.github.com/eaad86c149e5c81ca26e

kripken commented 12 years ago

That pull has just been merged to incoming, so hopefully all tests will now pass if you pull the latest incoming. (Note that it isn't in master yet, waiting on automatic tests.)

kripken commented 12 years ago

test_emcc is in "other", so to run it separately you need python tests/runner.py other.test_emcc. But if you run the whole suite, it should be run.

kripken commented 12 years ago

freetype, poppler, openjpeg and bullet should hopefully be fixed with that pull. The others do not look like they will be fixed by it.

Can you gist the output from EMCC_DEBUG=1 python tests/runner.py test_zlib?

ziz commented 12 years ago

I'm just running the tests that failed at the moment, to avoid the 2.5-hour test run. I'll rerun the whole suite when we've finished poking at individual tests, of course.

Looks like we're still failing on the collection of tests listed in my last message.
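Rerunning only the failing tests, each with EMCC_DEBUG=1, can be scripted. The sketch below assumes it is run from the emscripten checkout so that `tests/runner.py` resolves; `runner_command` and `rerun` are hypothetical helper names, and the test list is simply the one reported earlier in this thread.

```python
import os
import subprocess
import sys

# Tests reported as still failing earlier in this thread.
FAILING = [
    "test_freetype", "test_lua", "test_openjpeg", "test_poppler",
    "test_python", "test_thebullet", "test_zlib", "other.test_emcc",
]


def runner_command(test_name):
    """Build the argv for running one test through tests/runner.py."""
    return [sys.executable, "tests/runner.py", test_name]


def rerun(test_names, debug=True):
    """Run each test separately and return a {name: exit_code} mapping."""
    env = dict(os.environ)
    if debug:
        env["EMCC_DEBUG"] = "1"   # keep the intermediate .ll and .js files
    results = {}
    for name in test_names:
        results[name] = subprocess.call(runner_command(name), env=env)
    return results
```

Calling `rerun(FAILING)` runs the tests one at a time, so a single hang or crash does not hide the results of the others.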

other.test_emcc is also failing; here's the EMCC_DEBUG=1 output.

https://gist.github.com/22eda2059ac6d3af2b7a

kripken commented 12 years ago

On my machine it takes more than twice that, heh, I usually leave it to run overnight ;)

Based on the stack trace, on that part of the emcc test it is trying to run lli (the LLVM interpreter) on a bitcode file. Can you put up the generated bitcode files in that directory? (suffix .o and .bc)

ziz commented 12 years ago

Sure thing, here's the output from EMCC_DEBUG=1 python tests/runner.py test_zlib: http://batcave.textfiles.com/ziz/staging/test_zlib.tar.gz (it's too large for gist)

ziz commented 12 years ago

http://batcave.textfiles.com/ziz/staging/test_emcc-bitcode.tgz is the generated bitcode from EM_SAVE_DIR=1 python tests/runner.py other.test_emcc

kripken commented 12 years ago

The generated bitcode files work fine here. Do they work for you when you run them manually? lli hello_world.o

I am baffled by the zlib failure. The output is quite different, despite our using the same LLVM and Clang (3.0), and the same target.

ziz commented 12 years ago

Nope, the .o fails when I run it manually with lli. I see 'Illegal instruction: 4' in the terminal, and the following crash report comes up: https://gist.github.com/e1d98bb52adae5779449

kripken commented 12 years ago

Do both the normal and the "cleaned" .o files crash in that way? (the cleaned version removes debug info, which used to crash lli in the past)

Let's try to use exactly the same LLVM version, because I think we'll need to file an LLVM bug or post to their mailing list soon, not sure what else to do. I'm rebuilding LLVM 3.0 release from source now.

ziz commented 12 years ago

Yep, they both crash in the same way (including extremely similar crash reports - I haven't diffed them yet, though).

Just let me know what you need me to do. You're changing to make sure you're at LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), the versions I'm currently using - is that correct? Or should I switch LLVM / clang revs?

kripken commented 12 years ago

I think we should both use LLVM 3.0 release, from http://llvm.org/releases/3.0/llvm-3.0.tar.gz , since it looks like we will need to file a bug in LLVM if it still crashes for you.

ziz commented 12 years ago

Okay. I have been building LLVM with configure --enable-optimized --disable-assertions - should we continue with that or change the options?

kripken commented 12 years ago

I just build it with cmake .., no flags. I don't think it would make a difference, but probably best to use exactly the same build command.

Btw, I just pushed another headers fix to incoming - hopefully the last. It might help with some of the errors, but I am not sure.

ziz commented 12 years ago

I have now built LLVM with cmake ../llvm-3.0; make; this appears to produce an unoptimized build. other.test_emcc still fails (apparently in the same way); I'll upload the various files in an hour or so.

ziz commented 12 years ago

(whoops, sorry, forgot to upload them) http://batcave.textfiles.com/ziz/staging/test_emcc.tgz

kripken commented 12 years ago

Ok, it looks like there is an LLVM problem here on OS X. I will try to find time soon to boot into OS X and debug this in depth, because I don't have any other obvious ideas right now.

Meanwhile, to keep jsmess moving forward, is there any chance you can run Linux, maybe in a VM, as a workaround for this?

ziz commented 12 years ago

Yep, spinning it up on a linux box (2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:45:36 UTC 2010 x86_64 GNU/Linux) now. Hopefully the tests will pass without issue there!

ziz commented 12 years ago

Test output on the linux machine:

http://batcave.textfiles.com/ziz/staging/testoutput-20120113.txt

kripken commented 12 years ago

It looks like the version of spidermonkey used there is too old, causing most of those errors. Best to either install the latest, or remove it from ~/.emscripten and just use node.

ziz commented 12 years ago

Ok, removing spidermonkey in favor of node: http://batcave.textfiles.com/ziz/staging/testoutput-20120114.txt (still a couple of failures, it looks like)

kripken commented 12 years ago

I don't understand those linking errors. That stuff should work with LLVM 3.0.

Can you run EM_SAVE_DIR=1 python tests/runner.py test_bullet? That will save the files in /tmp/emscripten_tmp. You can then try to run llvm-link manually and see if it works when run that way (you might add a printout in tools/shared in link(), right before it fails, to print out the params it passes to llvm-link).
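A printout of that kind is easiest to reuse if it produces a line that can be pasted straight into a shell. The sketch below shows one way to do that; `format_command` and `run_logged` are hypothetical helpers, not the actual contents of tools/shared, and the argument list is whatever the caller would otherwise hand to `subprocess`.

```python
import shlex
import subprocess


def format_command(args):
    """Render an argv list as a single shell-safe, copy-pasteable line."""
    return " ".join(shlex.quote(str(arg)) for arg in args)


def run_logged(args):
    """Print the exact command, then run it and return its exit code."""
    print("running:", format_command(args))
    return subprocess.call(args)
```

Quoting each argument matters here because object paths with spaces or shell metacharacters would otherwise be split differently when the command is rerun by hand.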

kripken commented 12 years ago

I want to build jsmess myself to try to push things forward. Are there instructions somewhere? (I don't see any in https://github.com/ziz/jsmess )

ziz commented 12 years ago

Compilation instructions, such as they are, are in the wiki (https://github.com/ziz/jsmess/wiki).

kripken commented 12 years ago

Should I use the no_coroutines branch?

How do I get the "clean" build? Can I use the jsmess code, with some other flag so it builds natively? Or do I need to get the MESS code from somewhere?

ziz commented 12 years ago

no_coroutines is the right branch.

You can use the master branch (unmodified mess code) to make the clean build. Copy the checkout elsewhere, check out master, and then run make TARGET=mess SUBTARGET=tiny, which should build the native version. (If you're on a 32-bit platform you'll need to adjust the path you copy the native tools from, of course.)

The makefile process for jsmess should be modified to allow native building at some point, of course, but it's still a moving target, so...

kripken commented 12 years ago

Thanks, I'll try to build this later today.

kripken commented 12 years ago

I built natively, built for JS until the error, and then tried to copy the native object files. However, there are no files in ../jsmess-native/obj/osdmini/messtiny/build/ (the directory exists, but it is empty). Looks like there are some relevant files in the native build in /sdl/ instead of /osdmini/. I'm confused why the native build seems to use sdl and the js build osdmini (assuming that's what's happening here)?

ziz commented 12 years ago

Those native tools in the sdl directory should work just as well. Looks like I may have forced building with osdmini rather than 'whatever the native build wants to use' when I originally wrote up those instructions.

(osdmini is a minimal, mostly-nonfunctional output driver, I believe, to reduce complexity while we're trying to get jsmess working at all)

kripken commented 12 years ago

That gets me a bit further, but now it is looking for emu/layout/pinball.lh, which doesn't exist in the native build anywhere.

Maybe we should try to build the whole thing, it might be more feasible with the recent optimizations in emscripten? Or perhaps MAME instead of MESS? It's important to be building the same thing in native and JS apparently (and it's good for testing too, later on).

ziz commented 12 years ago

The emu/layout/pinball.lh should be generated from src/emu/layout/pinball.lay during the build process. My native checkout does not have src/emu/layout/pinball.lay and doesn't seem to be having any problems, though it looks like I'm using an svn checkout for the native tools:

Repository Root: svn://messdev.no-ip.org/mess
Revision: 13168

As far as testing goes, I wholeheartedly agree (and we need to be able to build native and JS side-by-side for there to be any hope of having this be maintainable, too). But all I've been trying to do so far is get something working at all; when we get far enough that it makes sense, we can look at the changes that needed to be made in an uncoordinated fashion and come up with a coordinated way to do it.