emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.81k stars 3.31k forks source link

Build MESS/MAME #131

Closed ziz closed 12 years ago

ziz commented 12 years ago

dynamic_cast can get into an infinite loop (encountered when compiling the MAME / MESS source). Per discussion, "the current implementation is missing some stuff like multiple inheritance etc., so maybe more basic stuff needs to be done first."

The dynamic_cast call arguments can be found in llvm's tools/clang/lib/CodeGen/CGExprCXX.cpp around line 1535 (using SVN rev 139974 of clang 3.0).

(Apologies for the pretty minimal issue content; I don't have anything resembling a minimal reproduction case to offer yet.)

kripken commented 12 years ago

I'll try some more then.

What is the difference between MESS and MAME, btw? Isn't MAME smaller?

ziz commented 12 years ago

MESS and MAME use the same main engine, and have a different set of emulation targets. MAME is smaller, but we're only (currently) compiling one emulation target in MESS (the coleco hardware), so the size is mostly immaterial.

kripken commented 12 years ago

Whoops, hit the wrong button, ignore that.

DopefishJustin commented 12 years ago

The MESS source tree is larger than MAME's, but only because it's based on MAME and includes a copy-paste of the full MAME tree for convenience. In terms of the amount of code actually compiled MESS is significantly smaller. But as ziz said it makes no difference when compiling only a single emulation target.

kripken commented 12 years ago

What is the mechanism for specifying a single emulation target? Modifying the makefile? Or a parameter when running make?

DopefishJustin commented 12 years ago

The SUBTARGET=tiny parameter to make points it at tiny.mak which needs to be hand-edited to include only the files necessary for the desired target system.

kripken commented 12 years ago

Thanks for the help everyone.

Ok, I set my goal to build mess in native and js in parallel, building the exact same thing. So I checked out the no-cothreads branch, then returned AR, CC etc. to use the native tools. I then had to also remove -Werror because that made it fail. It then built a native messtiny executable.

However, that executable hangs on startup (just running ./messtiny), so something seems to be wrong. This didn't happen in the main branch. Perhaps one of the commits in no-cothreads is problematic?

I suggest that the next step is to get it to build natively in the no-cothreads branch. That would allow us to build native and js in parallel, which makes it much easier to figure out problems in js.

DopefishJustin commented 12 years ago

That may be expected if it is built with OSD=osdmini since mess with no parameters will try to display a graphical menu, and if the graphics output is stubbed that won't do anything. Does ./messtiny -help work?

ziz commented 12 years ago

Here's the changes I made to the makefile to get make TARGET=mess SUBTARGET=tiny to compile natively on a 64-bit linux box: ziz/jsmess@c0ac13b37b3406af23ebbfca578092521d9c32de

I think they're all pretty minor as far as these things go.

This in fact works as I expect; all of the following commands work.

  553  ./messtiny64 -help
  554  ./messtiny64 -listmedia
  555  ./messtiny64 -showusage
  556  ./messtiny64 -listslots
  557  ./messtiny64 -listdevices

and ./messtiny64 by itself appears to hang, for the reason described above.

kripken commented 12 years ago

Ok, cool.

I built for js, to try to compare. I built the no-cothreads branch, just changing the paths to emscripten on my system. When native executables were missing, this fixed them:

cp obj/osdmini/messtiny/emu/*.fh ../jsmess/obj/osdmini/messtiny/emu/
cp obj/osdmini/messtiny/emu/layout/*.lh ../jsmess/obj/osdmini/messtiny/emu/layout/
cp obj/osdmini/messtiny/emu/layout/*.lay ../jsmess/obj/osdmini/messtiny/emu/layout/

However, I seem to be hitting an LLVM problem - it aborts on the final link of bitcode files, with

Broken module found, compilation aborted!

I narrowed this down from the original big link command to the linking of just two files, which is enough to trigger the error,

llvm-link obj/osdmini/messtiny/mess/drivers/coleco.o obj/osdmini/messtiny/libemu.a -o=messtiny

Before I file a bug with LLVM, can someone please confirm that they see it too?

Note that to build, you should use the incoming branch of emscripten, I had to push a fix for emar just now. (That branch will merge to master in a few hours once automatic tests pass.)

ziz commented 12 years ago

I'm not sure if this is relevant in your case, but it looks like I might need to change .a to .a.bc in the linking command because the .a is an llvm wrapper of some sort. I haven't build the js recently, though - I'll have to try that later.

kripken commented 12 years ago

That wrapper is what I fixed in incoming just now. It won't create the wrapper, the .a file will be the proper bitcode.

kripken commented 12 years ago

Emscripten incoming just merged to master, so if anyone wants to try to reproduce my llvm error, you can use normal emscripten (updated to latest).

ziz commented 12 years ago

With latest incoming, I got all the way through the linking step:

Linking messtiny...
/home/ziz/Dev/emscripten/emld -Wl,--warn-common -s  obj/osdmini/messtiny/version.o obj/osdmini/messtiny/drivlist.o obj/osdmini/messtiny/emu/drivers/emudummy.o obj/osdmini/messtiny/mess/drivers/coleco.o obj/osdmini/messtiny/mess/machine/coleco.o obj/osdmini/messtiny/libosd.a obj/osdmini/messtiny/libcpu.a obj/osdmini/messtiny/libemu.a obj/osdmini/messtiny/libdasm.a obj/osdmini/messtiny/libsound.a obj/osdmini/messtiny/libutil.a obj/osdmini/messtiny/libexpat.a obj/osdmini/messtiny/libsoftfloat.a obj/osdmini/messtiny/libformats.a obj/osdmini/messtiny/libz.a obj/osdmini/messtiny/libocore.a  -o messtiny

succeeded without issue.

However, when I then run emscripten, I get:

$ ../emscripten/emscripten.py -o messtiny.js messtiny.bc

node.js:201
        throw e; // process.nextTick error, or 'error' event on first tick
              ^
TypeError: Cannot read property '1' of null
    at Object._lineSplitter [as processItem] (eval at globalEval (/home/ziz/Dev/emscripten/src/compiler.js:79:8))
    at Object.process (eval at globalEval (/home/ziz/Dev/emscripten/src/compiler.js:79:8))
    at Object.solve (eval at globalEval (/home/ziz/Dev/emscripten/src/compiler.js:79:8))
    at intertyper (eval at globalEval (/home/ziz/Dev/emscripten/src/compiler.js:79:8))
    at Object.<anonymous> (/home/ziz/Dev/emscripten/src/compiler.js:186:18)
    at Module._compile (module.js:432:26)
    at Object..js (module.js:450:10)
    at Module.load (module.js:351:31)
    at Function._load (module.js:310:12)
    at Array.0 (module.js:470:10)

Produced .bc is at http://batcave.textfiles.com/ziz/staging/messtiny.bc-20120201.gz

DopefishJustin commented 12 years ago

Finally got a proper environment for this set up and I get the same TypeError as ziz above.

To compile I did not need to do the

cp obj/osdmini/messtiny/emu/layout/*.lay ../jsmess/obj/osdmini/messtiny/emu/layout/

step, however I did need to do

cp obj/osdmini/messtiny64/drivlist.c ../jsmess/obj/osdmini/messtiny/
kripken commented 12 years ago

Hmm, so no one is seeing the LLVM error I run into? Odd. I won't file a bug then.

You should use emcc to compile, not emscripten.py. I'll add a warning to emscripten.py now.

With emcc, that bitcode file compiles over here, and it hangs on run - which is the expected problem I think?

DopefishJustin commented 12 years ago

Ah with emcc I get a functional messtiny.js. node messtiny.js hangs (expected) but node messtiny.js -help and node messtiny.js -listfull work. node messtiny.js -listmedia hangs after printing the headings which I think is the dynamic_cast issue that needed looking into.

ziz commented 12 years ago

emcc works to compile here as well, and produces a messtiny.js which works as expected (as DFJustin describes above).

kripken commented 12 years ago

Running emcc with -s LABEL_DEBUG=1, the generated code prints out each basic block of code as it enters it.

In node, it hangs on __ZN12memory_entry8allocateEjPvPKci:$for_cond. This looks like invalid behavior - only trivial operations happen in that very short basic block, and it immediately jumps to one of two other basic blocks, both of which print out something on entry, which isn't printed. This looks like a bug, if others can reproduce it we should file it.

In spidermonkey, we get past that hang, quite a lot further actually (confirming that the node hang is likely a bug). We do however eventually hang on __ZL34xml_get_attribute_float_with_substR15running_machineR14_xml_data_nodePKcf:$lor_lhs_false. This basic block contains a call to sscanf, so it is possible that our libc implementation of sscanf has a bug in it. Next step is to investigate the inputs to that function and debug why it hangs.

kripken commented 12 years ago

The latter hang is confirmed as a bug with sscanf on negative floats, will have a fix soon.

kripken commented 12 years ago

Fixed on emscripten's incoming branch. With that in place, I don't see an obvious hang near startup. It does run for a long time though (this is a debug build with LABEL_DEBUG which makes it even slower) so I can't be sure.

DopefishJustin commented 12 years ago

Running node messtiny.js -listmedia with the sscanf fix and -s LABEL_DEBUG=1, it seems to hang at __ZN8device_t9interfaceI21device_slot_interfaceEEbRPT_:$5

ziz commented 12 years ago

Confirmed (by way of a judicious couple of print statements) that node messtiny.js -listmedia is hung looping inside dynamic_cast (the second time it's called, not the first, apparently:

  function ___dynamic_cast(ptr, knownTI, attemptedTI, idunno) {
      var ptrTV = HEAP32[((ptr)>>2)];
      var count = HEAP32[((ptrTV)>>2)];
      ptrTV -= 4;
      var TI = HEAP32[((ptrTV)>>2)];
      do {
        print("" + ptr + " " + knownTI + " " + attemptedTI + " " + idunno + " " + ptrTV + " " + count + " " + TI + "\n")
        if (TI == attemptedTI) {print("gone 1"); return 1;}
        // Go to parent class
        var type_infoAddr = HEAP32[((TI)>>2)] - 8;
        var type_info = HEAP32[((type_infoAddr)>>2)];
        if (type_info == 1) {print("gone 0"); return 0;} // no parent class
        var TIAddr = TI + 8;
        var TI = HEAP32[((TIAddr)>>2)];
      } while (1);

      return 0;
    }

produces

$ ../node/bin/node messtiny.js -listmedia | head -40
 SYSTEM      MEDIA NAME (brief)   IMAGE FILE EXTENSIONS SUPPORTED     
----------  --------------------  ------------------------------------
5999976 5286012 5290132 -1 5247984 1722 5248108

5999976 5286012 5290132 -1 5247984 1722 5434080

5999976 5286012 5290132 -1 5247984 1722 5286012

5999976 5286012 5290132 -1 5247984 1722 5248584

gone 0
6000464 5286012 5290132 -1 5255296 2478 5255808

6000464 5286012 5290132 -1 5255296 2478 5280340

6000464 5286012 5290132 -1 5255296 2478 5280264

6000464 5286012 5290132 -1 5255296 2478 1

6000464 5286012 5290132 -1 5255296 2478 5269640

6000464 5286012 5290132 -1 5255296 2478 0

6000464 5286012 5290132 -1 5255296 2478 5269640

6000464 5286012 5290132 -1 5255296 2478 0

(forever)
kripken commented 12 years ago

It would be faster to debug this if the generated code ran faster. So I tried to optimize it a little, the eliminator optimization step hangs on __Z20construct_core_typesR11simple_listI16input_type_entryE however. This is a 75,000 line function(!) whose code seems to repeat many many times. Perhaps tons of expanded macros? Or has something gone wrong with the compilation?

ziz commented 12 years ago

Lots and lots of expanded macros - see src/emu/inpttype.h line 52-ish. Starts with expanding INPUT_PORT_DIGITAL_TYPE a bunch, which doesn't look (terribly) complicated, but seems to be generating calls that end up pretty antagonistic to emscripten for some reason.

kripken commented 12 years ago

Those calls are antagonistic to everything I think ;)

So, today I worked on using the libcxxabi code to handle dynamic cast, so we can be sure all cases are handled right. Was harder to do than I thought, but I think I have it working on the incoming branch in emscripten. On that branch, building gives a js file that fails in node due to "too many variables" (!). In SpiderMonkey it starts to run, and does not hang on dynamic cast, so that part seems fixed. What I get now is

$js -m -n messtiny.js -listmedia
 SYSTEM      MEDIA NAME (brief)   IMAGE FILE EXTENSIONS SUPPORTED     
----------  --------------------  ------------------------------------
Device z80 specifies a 0 context size!

That is when built with -s EXCEPTION_DEBUG=0, otherwise there is a lot of output about the exception being thrown and then caught. Looks like that error message is a thrown exception, we need to look in the MESS code to find out what is going wrong there. Perhaps it is trying to read a file for that information? (If so we need to set up the emulated filesystem for it.)

DopefishJustin commented 12 years ago

That error comes from src/emu/devcpu.c at the end of legacy_cpu_device::legacy_cpu_device(). The call to legacy_cpu_device::get_legacy_int() returns 0 and it bails. I notice that function uses const_cast - might const_cast have a similar issue to dynamic_cast, and benefit from replacement with libcxxabi?

kripken commented 12 years ago

I don't have the compiled code on this machine so I can't check, but I am quite sure that const_cast is purely compile-time. It just removes a const attribute from a pointer (making a const char* into a char* for example). So I think the problem is elsewhere.

DopefishJustin commented 12 years ago

Ah right you are. So I think what's supposed to happen here is it should reach src/emu/cpu/z80/z80.c line 3852:

case CPUINFO_INT_CONTEXT_SIZE: info->i = sizeof(z80_state); break;

z80_state being a struct with a bunch of stuff in it defined at the top of the file. That info->i value is then what should be returned where we're apparently getting 0. Shouldn't be too hard to throw in some printf's and see where things are going haywire, I need to get my Emscripten setup working again though.

DopefishJustin commented 12 years ago

So I threw in some printf's and legacy_cpu_device::get_legacy_int() gets a correct-looking value from the z80's CPU_GET_INFO function, so all of the const_cast and union stuff seems to work fine. That value should then be returned and assigned to tokenbytes, but for some reason tokenbytes is 0 after the assignment. So somehow a basic return from function and variable assignment are not working right.

I tried to make a reduced testcase doing all the same things but it works fine, tokenbytes gets the correct value. So there is some kind of screwy bug in emscripten I think but I am not sure how to narrow it down.

Attempted testcase: https://gist.github.com/1918971

kripken commented 12 years ago

We will need to debug the .js I guess. If you can, find the return statement in the generated .js and see where it is received. Returns are normal JS returns, so they should be easy to follow. Then see what goes on with the variable assignment.

Or, tell me which are the relevant functions in the .ll and .js and which is the relevant return, and which is the relevant variable being set to that doesn't work.

DopefishJustin commented 12 years ago

OK, the problem is in the generated JavaScript in the following function:

function __ZN17legacy_cpu_deviceC2ERK14machine_configPFP8device_tS2_PKcS4_jES6_S4_jPFvPS_jP7cpuinfoE($this, $mconfig, $type, $tag, $owner, $clock, $get_info)

Look for the last call to get_legacy_int - I have added some print() calls:

    case 40: // $232
      var $233=$16;
      var $234=(($233+356)|0);
      var $235=$234;
      var $236=(($235+5524)|0);
      var $237=$236;
      for (var $$dest = $237, $$stop = $$dest + 12; $$dest < $$stop; $$dest++) {
        HEAP8[$$dest] = 0
      };
      print("about to call get_legacy_int");
      var $238=(function() { try { __THREW__ = false; return __ZNK17legacy_cpu_device14get_legacy_intEj($16, 16384) } catch(e) { if (typeof e != "number") throw e; if (ABORT) throw e; __THREW__ = true; print("Exception: " + e + ", currently at: " + (new Error().stack)); return null } })(); if (!__THREW__) { __label__ = 41; break; } else { __label__ = 27; break; };
      print("does this ever execute?");
      var $238$0=$238[0];
      var $238$1=$238[1];
    case 41: // $239
      print("we have moved on to the next case");
      var $240$0=$238$0;
      var $240=$240$0;
      $tokenbytes=$240;
      var $241=$tokenbytes;
      var $242=(($241)|0)==0;
      if ($242) { __label__ = 42; break; } else { __label__ = 45; break; }

At runtime this shows the following:

 SYSTEM      MEDIA NAME (brief)   IMAGE FILE EXTENSIONS SUPPORTED     
----------  --------------------  ------------------------------------
about to call get_legacy_int
we have moved on to the next case
Compiled code throwing an exception, 6007712,5248320,16, at ___cxa_throw(6007712,5248320,16)@messtiny.js:868761

The "does this ever execute" line is never reached because the function call code above it always does a break;, so the statements that populate $238$0 and $238$1 are never reached and thus nothing gets into $tokenbytes.

DopefishJustin commented 12 years ago

http://interbutt.com/temp/messtiny.zip

.bc and resulting .js

kripken commented 12 years ago

Thanks for the files, and the investigation from the previous comment. Yes, exactly as you said code is not being executed that needs to be. Specifically it looks like we don't legalize phis that are reached from invokes properly. I'll work on this now.

kripken commented 12 years ago

Likely fix is in incoming branch, running automatic tests on it now.

kripken commented 12 years ago

Tests passed, merged to master.

DopefishJustin commented 12 years ago

Thanks for the prompt fix, with latest master I get this error running js messtiny.js -listmedia though:

messtiny.js:2316: ReferenceError: $$emscripten$temp$0 is not defined
kripken commented 12 years ago

Can you link to the generated code?

DopefishJustin commented 12 years ago

http://interbutt.com/temp/messtiny2.zip

I won't have time to look at it in detail until tonight.

kripken commented 12 years ago

Thanks, I see the problem. Will have a fix shortly.

kripken commented 12 years ago

Fix is in incoming branch of Emscripten.

kripken commented 12 years ago

Sorry for these bugs btw, they are all to do with exception handling and 64-bit values. I guess the combination of those two is not represented in the emscripten test suite well enough.

DopefishJustin commented 12 years ago

-listmedia looks good now, will have to try and put MESS through more of its paces :)

kripken commented 12 years ago

Great! :)

DopefishJustin commented 12 years ago

So I have been trying to get MESS to load files, without success. I'm now testing with romcmp which is a small file comparison utility packaged with MESS using the same core file I/O routines - to build it, add the parameter tools to the make command. I have put the .bc and the .js produced by emscripten master at http://interbutt.com/temp/romcmp.zip

I added the following after {{PRE_RUN_ADDITIONS}} in romcmp.js:

FS.createPath('/', 'roms', true, false);
FS.createLazyFile('/roms', 'coleco.zip', 'roms/coleco.zip', true, false);
FS.createLazyFile('/roms', 'colecoa.zip', 'roms/colecoa.zip', true, false);

coleco.zip and colecoa.zip are in a roms/ subfolder of the jsmess directory where I am running the .js from.

Running js romcmp.js /roms/coleco.zip /roms/colecoa.zip gives the following:

Warning: Enlarging memory arrays, this is not fast! 14300728,10485760
Error, cannot open zip file '/roms/coleco.zip' !

Debugging the code, the error is that the file does not exist.

Running the same command with node trips an assertion which may be of interest:

Warning: Enlarging memory arrays, this is not fast! 14300728,10485760
Assertion failed: Character code 65533 (�)  at offset 11 not in 0x00-0xFF.:
Error
    at abort (/home/jkerk/jsmess/romcmp.js:359:25)
    at assert (/home/jkerk/jsmess/romcmp.js:366:5)
    at intArrayFromString (/home/jkerk/jsmess/romcmp.js:739:9)
    at Object.forceLoadFile (/home/jkerk/jsmess/romcmp.js:20879:28)
    at _open (/home/jkerk/jsmess/romcmp.js:21639:19)
    at _fopen (/home/jkerk/jsmess/romcmp.js:21730:17)
    at _osd_open (/home/jkerk/jsmess/romcmp.js:13272:15)
    at __Z13zip_file_openPKcPP9_zip_file (/home/jkerk/jsmess/romcmp.js:4114:15)
    at __ZL10load_filesiPiPKc (/home/jkerk/jsmess/romcmp.js:1913:16)
    at _main (/home/jkerk/jsmess/romcmp.js:893:15)
Error, cannot open zip file '/roms/coleco.zip' !
DopefishJustin commented 12 years ago

It's entirely possible that there is user error at work here too :)

DopefishJustin commented 12 years ago

OK it actually does work in Firefox as an html file so it is just something about the command line I guess.

kripken commented 12 years ago

There are some odd differences in how the node/d8/js shells handle files, it is definitely annoying. See the --embed-file option in emcc, for small files it helps work around this (by embedding the file in the .js).

Web browsers however should work fine, file input is standardized there. If you find a problem in a browser please let me know.

DopefishJustin commented 12 years ago

--embed-file works well, I am able to embed coleco.zip and then start the emulation using js messtiny.js coleco -str 1 -rompath . -verbose. Unfortunately the emulation hangs while starting the Z80 device, in device_execute_interface::interface_clock_changed() in src/emu/diexec.c. This function does some int64 math which I know is not completely supported by emscripten yet.

Here is a reduced testcase for the function in question: https://gist.github.com/1976973

With gcc or clang, it outputs the following:

m_divshift is 1
m_divshift is 2
m_divshift is 3
m_divshift is 4
m_divshift is 5
m_divshift is 6
m_divshift is 7
m_divshift is 8
m_divisor is 1091269979

1091269979 is the correct answer reached by native MESS when initializing the Z80. When compiled with emscripten, it loops infinitely, m_divshift increments all the way up to 255 and then wraps back to 0 again.

kripken commented 12 years ago

Thanks, that's a bug in i64 comparisons. I think I fixed it on incoming now but needs more testing.