ziz closed this issue 12 years ago.
If you can provide the relevant C++ source code that is compiled into the problem, that would also be very helpful. Without that, it's hard to know what to focus on here.
My attempt at a minimal case was not successful in reproducing the bug: http://batcave.textfiles.com/ziz/staging/casttest.zip
As requested, here's the .js and .bc files:
http://batcave.textfiles.com/ziz/staging/messtiny-10.js.gz http://batcave.textfiles.com/ziz/staging/messtiny-10.bc.gz
and the source code we're compiling is found in https://github.com/ziz/jsmess/tree/no_cothreads
So far as I can tell, the infinite loop happens during execution of a dynamic_cast call which runs as part of src/emu/clifront.c cli_frontend::listmedia, line 698.
Compiling fully from source requires a second, clean copy of the MESS source to build the external tools the build process depends on:
cd mess-orig; cp -rp src/osd/osdmini src/mess/osd; make TARGET=mess SUBTARGET=tiny
then build the MESS project:
make clean; find . \( -name \*.a.bc -o -name \*.o \) -delete; make clean; make TARGET=mess SUBTARGET=tiny
# Previous command fails because it can't run the external tools, copy them in
cp ../mess-orig/obj/osdmini/messtiny64/build/* obj/osdmini/messtiny/build/
# and finish the build
make TARGET=mess SUBTARGET=tiny
Then link manually (since the build process tries to link .a instead of .a.bc, replace all .a with .a.bc in the link command):
/home/ziz/Dev/llvm-3.0-release/Release/bin/llvm-ld -disable-opt obj/osdmini/messtiny/version.o obj/osdmini/messtiny/drivlist.o obj/osdmini/messtiny/emu/drivers/emudummy.o obj/osdmini/messtiny/mess/drivers/coleco.o obj/osdmini/messtiny/mess/machine/coleco.o obj/osdmini/messtiny/libosd.a.bc obj/osdmini/messtiny/libcpu.a.bc obj/osdmini/messtiny/libemu.a.bc obj/osdmini/messtiny/libdasm.a.bc obj/osdmini/messtiny/libsound.a.bc obj/osdmini/messtiny/libutil.a.bc obj/osdmini/messtiny/libexpat.a.bc obj/osdmini/messtiny/libsoftfloat.a.bc obj/osdmini/messtiny/libformats.a.bc obj/osdmini/messtiny/libz.a.bc obj/osdmini/messtiny/libocore.a.bc -o=messtiny
and then emscripten:
../emscripten/emscripten.py messtiny.bc -o messtiny.js
Run with the argument '-listmedia'.
I'll look at this very soon, I just need to finish a few more compiler optimizations to speed up building large files (which will help with compiling and debugging code like this).
The compressed js file linked here doesn't seem to extract to a js file properly. I built my own though.
Just trying to run it, I ran into a few problems:
We should really try to reduce the size of the compiled code - hopefully we don't need everything currently built. The compiled JS is 50MB, which is too much for node and spidermonkey. d8 can run it, but d8 has other limitations like poor support for typed arrays which will greatly limit our ability to test (we will likely need typed arrays for speed).
But the real blocker now is to re-implement the necessary functions mentioned before in a way that can actually be used by us.
I have updated the jsmess makefile to use emcc and friends, and compiled with the latest emscripten. The JS is only 30M now, and compiles in less than half the time it used to - grats on the improvements!
There appears to be a minor bug in parseTools.js, preventing out-of-the-box compilation here: the 64-bit basic integer ops switch (around line 1660) is missing srem (compared to the 32-bit basic integer ops).
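For illustration, here is a tiny model of the kind of 64-bit op dispatch being discussed and the arm reported missing. The function name `i64Op` and its structure are hypothetical, not the actual parseTools.js code (which emits JS source strings rather than computing values directly):

```javascript
// Hypothetical model only — not the real parseTools.js. The report above
// is that the 64-bit integer op switch handled ops like add/sub/mul but
// fell through for 'srem', while the 32-bit switch did handle it.
function i64Op(op, a, b) {
  // Values here are plain JS numbers (doubles), exact only up to 2^53.
  switch (op) {
    case 'add':  return a + b;
    case 'sub':  return a - b;
    case 'mul':  return a * b;
    case 'sdiv': return Math.trunc(a / b); // C-style truncating division
    case 'srem': return a % b;             // the arm that was missing
    default: throw new Error('unhandled 64-bit op: ' + op);
  }
}

console.log(i64Op('srem', -10, 3)); // -1 (JS % matches C's srem sign rule)
```

Note that JS's `%` already follows C's sign convention for `srem` (the result takes the sign of the dividend), which is why simply adding the case can appear to work.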
I may have recompiled spidermonkey with an increased functions limit to get around the OOM before. Pending a real fix, this is certainly an option.
Here's a new .bc and .js: http://batcave.textfiles.com/ziz/staging/messtiny-11.tgz
How did you work around the lack of srem?
The problem is that 64-bit math isn't really possible in JS. We can add srem, but it might fail if values too big to fit in a double are used. Perhaps there is a way to make jsmess use 32-bit values over 64-bit ints?
I applied the wrong-but-obvious solution: I just added srem to that case statement, which caused it to compile.
There may be an obvious solution that I'm missing to force 32-bit compilation for jsmess on OS X and other 64-bit platforms (and, as far as I can tell, forcing 32-bit compilation shouldn't cause any problems).
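For what it's worth, a quick sketch of the hazard described above: JS numbers are IEEE doubles, exact only for integers up to 2^53, so remainder arithmetic near the top of the 64-bit range silently goes wrong.

```javascript
// Small values are fine — % on doubles is exact in the 32-bit range.
console.log(10 % 3); // 1

// 2^63 - 1 (INT64_MAX) is not representable as a double; the literal
// rounds up to exactly 2^63 before any arithmetic happens.
var max64 = 9223372036854775807;
console.log(max64 === 9223372036854775808); // true — it became 2^63

// The remainder is then computed on the wrong value: 2^63 mod 7 is 1,
// but the true value of (2^63 - 1) mod 7 is 0.
console.log(max64 % 7); // 1
```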
I'm having trouble building this .ll file, not sure why. What version of LLVM did you use to build it?
LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), which were the revisions I believe you reported you were using in a previous discussion on IRC. If you're using a different version, I'm happy to switch.
3.0 should be fine.
Another question: I get warnings about different targets (OS X vs Linux). Can you perhaps run the automatic test suite, to see if there are any problems on OS X? (Emscripten has not been heavily tested there, I'm afraid.) The command to run them is python tests/runner.py; it should take several hours.
Sure, I'll run the test suite. Is this the first time the different targets problem has shown up? I don't recall it being an issue before.
We do have people using OS X, and someone said things were working on Windows. But only Linux gets full test coverage all the time (because I run Linux, basically, no special reason).
Specifically for here, I worry that clang will implement dynamic_cast differently on different platforms, and that that might be what is tripping us up. A problem like that might have gone unnoticed even though we have some people using OS X for some projects.
Test output: https://gist.github.com/87ad7200ca6f708e6a73
Failures specifically related to closure (Invalid or corrupt jarfile /usr/local/bin/closure) are because I had the config file pointed to a closure compiler wrapper .sh rather than the closure .jar. I can rerun those specific tests if needed.
A lot of those errors will, I suspect, be fixed by pull 154. And I am hoping some will be fixed by a headers correctness fix we landed yesterday. But if not, then those errors look very dangerous, and could possibly explain the jsmess infinite loop. Let's focus on the shortest one, pystruct. Can you please run
EMCC_DEBUG=1 python tests/runner.py test_pystruct
and gist the results? There will be an .ll and .js file in /tmp/emscripten_tmp (assuming /tmp is what TEMP_DIR is set to in ~/.emscripten).
edit: and please make sure you pull the latest code, since the headers fix was just yesterday
Pulled to f6e838357b58559ac376c7975f64b87da5a52b0a and ran test_pystruct.
Output:
https://gist.github.com/e81dba0c90b5589e0f79
Resulting temp files:
Thanks!
Ok, it looks like we need to be more assertive in telling clang to generate platform-independent code. Long-term we will want to have a formal "emscripten" llvm target, but until then, let's try to use 32-bit linux as the uniform target. I pushed this to the incoming branch as 4cab9f5a5ac72e9eaf904ace0ebe29d5e30ba9b6. Can you test it and see if it fixes pystruct?
Success! The incoming branch passes pystruct:
Great! :)
There's a chance it will fix the other tests too (although we might still need the bitcode fix in pull #154). Can you run the rest as well?
Some of the other tests are fixed; test_emcc apparently isn't available in the incoming branch, but of the others, these now pass:
test_cubescript test_files test_libcxx test_pystruct
and these continue to fail:
test_freetype test_lua test_openjpeg test_poppler test_python test_thebullet test_zlib
tests/runner.py output here:
That pull has just been merged to incoming, so hopefully all tests will now pass if you pull the latest incoming. (Note that it isn't in master yet, waiting on automatic tests.)
test_emcc is in "other", so to run it separately you need python tests/runner.py other.test_emcc. But if you run the whole suite, it should be run.
freetype, poppler, openjpeg and bullet should hopefully be fixed with that pull. The others do not look like they will be fixed by it.
Can you gist the output from EMCC_DEBUG=1 python tests/runner.py test_zlib?
I'm just running the tests that failed at the moment, to avoid the 2.5-hour test run. I'll rerun the whole suite when we've finished poking at individual tests, of course.
Looks like we're still failing on the collection of tests listed in my last message.
other.test_emcc is also failing; here's the EMCC_DEBUG=1 output.
On my machine it takes more than twice that, heh, I usually leave it to run overnight ;)
Based on the stack trace, on that part of the emcc test it is trying to run lli (the LLVM interpreter) on a bitcode file. Can you put up the generated bitcode files in that directory? (suffix .o and .bc)
Sure thing, here's the output from EMCC_DEBUG=1 python tests/runner.py test_zlib: http://batcave.textfiles.com/ziz/staging/test_zlib.tar.gz (it's too large for gist)
http://batcave.textfiles.com/ziz/staging/test_emcc-bitcode.tgz is the generated bitcode from EM_SAVE_DIR=1 python tests/runner.py other.test_emcc
The generated bitcode files work fine here. Do they work for you when you run them manually? lli hello_world.o
I am baffled by the zlib failure. The output is quite different, despite our using the same LLVM and Clang (3.0), and the same target.
Nope, the .o fails when I run it manually with lli. I see 'Illegal instruction: 4' in the terminal, and the following crash report comes up: https://gist.github.com/e1d98bb52adae5779449
Do both the normal and the "cleaned" .o files crash in that way? (the cleaned version removes debug info, which used to crash lli in the past)
Let's try to use exactly the same LLVM version, because I think we'll need to file an LLVM bug or post to their mailing list soon, not sure what else to do. I'm rebuilding LLVM 3.0 release from source now.
Yep, they both crash in the same way (including extremely similar crash reports - I haven't diffed them yet, though).
Just let me know what you need me to do. You're changing to LLVM 3.0 (svn revision 139974) and clang 3.0 (svn revision 139974), the versions I'm currently using - is that correct? Or should I switch LLVM / clang revs?
I think we should both use LLVM 3.0 release, from http://llvm.org/releases/3.0/llvm-3.0.tar.gz , since it looks like we will need to file a bug in LLVM if it still crashes for you.
Okay. I have been building LLVM with configure --enable-optimized --disable-assertions - should we continue with that or change the options?
I just build it with cmake .., no flags. I don't think it would make a difference, but probably best to use exactly the same build command.
Btw, I just pushed another headers fix to incoming - hopefully the last. It might help with some of the errors, but I am not sure.
I have now built LLVM with cmake ../llvm-3.0; make; this appears to produce a no-optimizations build. other.test_emcc still fails (apparently in the same way); I'll upload the various files in an hour or so.
(whoops, sorry, forgot to upload them) http://batcave.textfiles.com/ziz/staging/test_emcc.tgz
Ok, it looks like there is a LLVM problem here on OS X. I will try to find time soon to boot into OS X and debug this in depth, because I don't have any other obvious ideas right now.
Meanwhile, to keep jsmess moving forward, is there any chance you can run Linux, maybe in a VM, as a workaround for this?
Yep, spinning it up on a linux box (2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:45:36 UTC 2010 x86_64 GNU/Linux) now. Hopefully the tests will pass without issue there!
Test output on the linux machine:
http://batcave.textfiles.com/ziz/staging/testoutput-20120113.txt
It looks like the version of spidermonkey used there is too old, causing most of those errors. Best to either install the latest, or remove it from ~/.emscripten and just use node.
Ok, removing spidermonkey in favor of node: http://batcave.textfiles.com/ziz/staging/testoutput-20120114.txt (still a couple of failures, it looks like)
I don't understand those linking errors. That stuff should work with LLVM 3.0.
Can you run EM_SAVE_DIR=1 python tests/runner.py test_bullet; that will save the files in /tmp/emscripten_tmp. You can then try to manually run llvm-link and see if it works when run that way (you might add a printout in tools/shared in link(), right before it fails, to print out the params it passes to llvm-link).
I want to build jsmess myself to try to push things forward. Are there instructions somewhere? (I don't see any in https://github.com/ziz/jsmess )
Compilation instructions, such as they are, are in the wiki (https://github.com/ziz/jsmess/wiki).
Should I use the no_coroutines branch?
How do I get the "clean" build? Can I use the jsmess code, with some other flag so it builds natively? Or do I need to get the MESS code from somewhere?
no_coroutines is the right branch.
You can use the master branch (unmodified MESS code) to make the clean build. Copy the checkout elsewhere, check out master, and then run make TARGET=mess SUBTARGET=tiny to build the native version. (If you're on a 32-bit platform you'll need to adjust the path you copy the native tools from, of course.)
The makefile process for jsmess should be modified to allow native building at some point, of course, but it's still a moving target, so...
Thanks, I'll try to build this later today.
I built natively, built for JS until the error, and then tried to copy the native object files. However, there are no files in ../jsmess-native/obj/osdmini/messtiny/build/ (the directory exists, but it is empty). It looks like there are some relevant files in the native build in /sdl/ instead of /osdmini/. I'm confused why the native build seems to use sdl and the JS build osdmini (assuming that's what's happening here)?
Those native tools in the sdl directory should work just as well. Looks like I may have forced building with osdmini rather than 'whatever the native build wants to use' when I originally wrote up those instructions.
(osdmini is a minimal, mostly-nonfunctional output driver, I believe, to reduce complexity while we're trying to get jsmess working at all)
That gets me a bit further, but now it is looking for emu/layout/pinball.lh, which doesn't exist in the native build anywhere.
Maybe we should try to build the whole thing, it might be more feasible with the recent optimizations in emscripten? Or perhaps MAME instead of MESS? It's important to be building the same thing in native and JS apparently (and it's good for testing too, later on).
The emu/layout/pinball.lh file should be generated from src/emu/layout/pinball.lay during the build process (my native checkout does not have src/emu/layout/pinball.lay, and doesn't seem to be having any problems). It looks like I'm using an svn checkout for the native tools, though:
Repository Root: svn://messdev.no-ip.org/mess Revision: 13168
As far as testing, I wholeheartedly agree (and we need to be able to build native and JS side-by-side for there to be any hope of having this be maintainable, too), but all I've been trying to do so far is get something at all working; when we get far enough that it makes sense, we can look at the changes that needed to be made in an uncoordinated fashion and come up with a coordinated way to do it.
dynamic_cast can get into an infinite loop (encountered when compiling the MAME / MESS source). Per discussion, "the current implementation is missing some stuff like multiple inheritance etc., so maybe more basic stuff needs to be done first."
The dynamic_cast call arguments can be found in llvm's tools/clang/lib/CodeGen/CGExprCXX.cpp around line 1535 (using SVN rev 139974 of clang 3.0).
(Apologies for the pretty minimal issue content; I don't have anything resembling a minimal reproduction case to offer yet.)
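To make the suspected failure mode concrete, here is a hypothetical sketch (not emscripten's actual implementation) of a runtime helper walking a type hierarchy the way a dynamic_cast support routine might. With multiple inheritance the hierarchy is a DAG, so a naive walk that never marks visited nodes can revisit a shared base indefinitely if the type graph is malformed or cyclic:

```javascript
// Hypothetical sketch of a dynamic_cast-style hierarchy walk.
// All names (dynamicCast, obj.type, bases) are illustrative.
function dynamicCast(obj, targetType) {
  var seen = {};                 // visited set: the safeguard at issue
  var stack = [obj.type];
  while (stack.length) {
    var t = stack.pop();
    if (seen[t.name]) continue;  // without this check, a shared (or cyclic)
    seen[t.name] = true;         // base class would be pushed forever
    if (t === targetType) return obj;
    for (var i = 0; i < t.bases.length; i++) stack.push(t.bases[i]);
  }
  return null;                   // cast fails
}

// Diamond hierarchy: D inherits B and C, which both inherit A.
var A = { name: 'A', bases: [] };
var B = { name: 'B', bases: [A] };
var C = { name: 'C', bases: [A] };
var D = { name: 'D', bases: [B, C] };

console.log(dynamicCast({ type: D }, A) !== null); // true — upcast succeeds
console.log(dynamicCast({ type: D }, { name: 'X', bases: [] })); // null
```

This is only meant to illustrate why "missing some stuff like multiple inheritance" can surface as non-termination rather than a wrong answer.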