emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

Build MESS/MAME #131

Closed ziz closed 12 years ago

ziz commented 12 years ago

dynamic_cast can get into an infinite loop (encountered when compiling the MAME / MESS source). Per discussion, "the current implementation is missing some stuff like multiple inheritance etc., so maybe more basic stuff needs to be done first."

The dynamic_cast call arguments can be found in llvm's tools/clang/lib/CodeGen/CGExprCXX.cpp around line 1535 (using SVN rev 139974 of clang 3.0).

(Apologies for the pretty minimal issue content; I don't have anything resembling a minimal reproduction case to offer yet.)

DopefishJustin commented 12 years ago

So now we hang in validate_inlines() in src/emu/validate.c, which does a whole bunch of sanity tests on math operations with different data types. (yay...)

Here is a reduced testcase of the failing code: https://gist.github.com/1983766

In gcc or clang the following output is produced right away:

testu64a is 7373125414476351500

Emscripten-produced .js hangs indefinitely.

Interestingly, with emscripten revision ad9ce81becf51181bd367ab19a812dbf0de6c899 (just before the i64 fix) it does not hang, but outputs a different number:

testu64a is 14746250828952703000

That would explain why we did not hit this before, since this is earlier in the startup process than the previous hang.
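For context on why 64-bit values are delicate in generated JavaScript: a JavaScript number is an IEEE-754 double, so not every integer above 2^53 can be represented exactly. The sketch below is only an illustration using the two values quoted above as printed. It uses BigInt, a modern language feature that emscripten did not rely on at the time, to compare them exactly, and it does not claim to explain the actual bug.

```javascript
// Illustration only: doubles cannot hold every 64-bit integer exactly.
const limit = 2 ** 53;                      // 9007199254740992
const collapses = (limit + 1) === limit;    // 2^53 + 1 rounds back to 2^53

// The two outputs quoted in the report, compared exactly with BigInt.
const expectedOutput = 7373125414476351500n;   // gcc/clang output, as printed
const oldOutput = 14746250828952703000n;       // output before the i64 fix, as printed
const oldIsTwiceExpected = oldOutput === expectedOutput * 2n;

console.log(collapses, oldIsTwiceExpected);
```

As printed, the older output is exactly twice the expected one, which may be a useful hint when comparing the two code paths.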

kripken commented 12 years ago

Thanks for yet another clear testcase! This should also be fixed on incoming. Note though that as before, I haven't fully tested it yet (I just wanted to push it out so this doesn't block you).

DopefishJustin commented 12 years ago

Excellent! It does indeed appear to be fixed.

So now I'd like to try working with MESS compiled with OSD=sdl instead of OSD=mini, however I get the following error when I try to emscripten it:

[jkerk@myhost jsmess]$ ~/emscripten/emcc messtiny.bc -o messtiny_sdl.js
emcc: warning: using libcxx turns on CORRECT_* options
emcc: warning: using libcxxabi, this may need CORRECT_* options

node.js:201
        throw e; // process.nextTick error, or 'error' event on first tick
              ^
Invalid token, cannot triage: // {
//   "tokens": [
//     {
//       "text": "fence"
//     },
//     {
//       "text": "seq_cst"
//     }
//   ],
//   "indent": 2,
//   "lineNum": 100063,
//   "__uid__": 852
// }

.bc is at http://interbutt.com/temp/messtiny_sdl.zip

kripken commented 12 years ago

That's a synchronization instruction that emscripten can ignore. Fixed in incoming branch.

Now it chokes on some inline assembly that needs to be removed, in _Z15sdl_read_socketP9_osd_filePvyjPj.

DopefishJustin commented 12 years ago

I've yanked out the asm and TTF library stuff and I get a .js that seems to work, but needs stubs for the following functions:

_SDL_Linked_Version
_SDL_GetWMInfo
_SDL_VideoDriverName
_SDL_ThreadID
_SDL_NumJoysticks
_SDL_AudioDriverName
_SDL_EnableUNICODE

_pthread_cond_init
_pthread_cond_signal
_pthread_cond_destroy

They don't have to do anything, function foo() {} suffices. (Why does emscripten not generate that instead of var foo;?)

I also need to comment out the "Cannot return obtained SDL audio params" assert in _SDL_OpenAudio (I guess that may be relevant later when hooking up audio).

And when running with js from the command line (haven't tried anything else yet) I get some errors about the emscripten-supplied Javascript functions:

ReferenceError: addEventListener is not defined

in:

function _SDL_Init(what) {
      SDL.startTime = Date.now();
      ['keydown', 'keyup', 'keypress'].forEach(function(event) {
        addEventListener(event, SDL.receiveEvent, true);
      });

TypeError: Module.canvas is undefined in:

  function _SDL_SetVideoMode(width, height, depth, flags) {
      Module['canvas'].width = width;
      Module['canvas'].height = height;
      return SDL.screen = SDL.makeSurface(width, height, flags);
    }

and TypeError: Module.print is not a function in:

      },createContext:function (useWebGL) {
...
        } catch (e) {
          Module.print('(canvas not available)');
          return null;
        }

kripken commented 12 years ago

Emscripten doesn't generate empty stubs because then you would get silent failures that are very hard to debug. Instead, you get clear failures about what is missing.

In those examples, I'm not sure it's ok if they do nothing. They might need to return 0 or -1 or something like that. An empty stub returns undefined, if they do math on that, it can go very wrong.

The SDL errors are basically that our SDL implementation assumes it's running in a browser, so event listening works, and there is a canvas as set up in src/shell.html. You get that automatically if you run emcc and tell it to generate html output.

DopefishJustin commented 12 years ago

At least for me, having an auto-stub with print('foo unimplemented!'); or the like inside would be a lot more convenient for development, while still addressing the silent failure issue. I guess it depends somewhat on the project since in our case it's all superfluous stuff, whereas on another project it might go completely off the rails if key functions don't do anything. May be worth having an option for.
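A minimal sketch of the "loud stub" idea, assuming a hypothetical helper named makeLoudStub (not an emscripten API). It also addresses the concern raised earlier about empty stubs returning undefined: each stub reports the call and returns an explicit value. The return values chosen here are guesses, not what the real SDL or pthread functions return.

```javascript
// An empty stub returns undefined, and arithmetic on undefined yields NaN.
const silentResult = (function foo() {})() + 1;

// Hypothetical helper: build a stub that reports each call instead of failing silently.
function makeLoudStub(name, returnValue, log) {
  return function () {
    log(name + ' unimplemented!');
    // Return an explicit value so callers never do math on undefined.
    return returnValue;
  };
}

const messages = [];
const log = function (msg) { messages.push(msg); };

// Names taken from the list of missing functions; the return values are guesses.
const stubs = {
  _SDL_NumJoysticks: makeLoudStub('_SDL_NumJoysticks', 0, log),
  _SDL_EnableUNICODE: makeLoudStub('_SDL_EnableUNICODE', 0, log),
  _pthread_cond_init: makeLoudStub('_pthread_cond_init', 0, log),
};

const joysticks = stubs._SDL_NumJoysticks();
console.log(silentResult, joysticks, messages);
```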

I did figure that I need a browser for the SDL stuff, it's a bit annoying because there are tests I can do from the command line whereas browsers have trouble with the size of the code, but I guess it makes sense. More robust checks would be nice :)
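One way the "more robust checks" could look, as a sketch only. The function name missingBrowserFeatures is hypothetical and not part of emscripten; it simply tests for the three things the errors above complained about (addEventListener, Module.canvas, Module.print) and reports which are absent.

```javascript
// Hypothetical check, not part of emscripten: report what the SDL layer would be missing.
function missingBrowserFeatures(globalObj, Module) {
  const missing = [];
  if (typeof globalObj.addEventListener !== 'function') missing.push('addEventListener');
  if (!Module || !Module.canvas) missing.push('Module.canvas');
  if (!Module || typeof Module.print !== 'function') missing.push('Module.print');
  return missing;
}

// Simulate a command-line shell: no DOM events, no canvas, no print hook.
const shellReport = missingBrowserFeatures({}, {});

// Simulate a browser-like setup with stand-in objects.
const browserReport = missingBrowserFeatures(
  { addEventListener: function () {} },
  { canvas: { width: 0, height: 0 }, print: function () {} }
);

console.log(shellReport, browserReport);
```

A check like this could run once at startup and print a single readable message instead of three unrelated exceptions.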

DopefishJustin commented 12 years ago

So something has changed in emscripten just recently such that video no longer appears on the canvas (stays white forever) even if the code is for sure running. Trying to track down what exactly but there have been a lot of SDL changes....

kripken commented 12 years ago

My first guess is we now support rgba and not just rgb. To test this, find the 4 short functions with CSSRGB in their name, replace rgba( with rgb(, and remove the last parameter generated (the browser will break if rgb gets 4 params). (You don't need to compile to test this.) Maybe the alpha is reversed, that could explain no rendering.

If it isn't that, there is SDL.debugSurface which can be used on SDL surfaces to see their source and contents. Also useful without compiling.

Otherwise, bisection seems the best route (I mean, binary search to find the changeset that introduces the bug).

DopefishJustin commented 12 years ago

I did guess the rgba thing but as far as I can tell the CSSRGB functions were not called anywhere. I'm trying to bisect now but it will take a while.

kripken commented 12 years ago

Hmm. Another guess might be to find if MESS uses an SDL command to render that is not covered by the emscripten test suite (because what is tested, should not regress). If you can get a list of the relevant SDL commands MESS uses I can match them against the test suite.

But, bisection is a guaranteed result in logarithmic time, so that's usually best...
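To make the "logarithmic time" point concrete, here is a small sketch of the search that bisection performs. It assumes the commit list is ordered oldest to newest, that the first commit is good and the last is bad, and that every commit after the first bad one is also bad. The history of 1000 numbered commits is made up for the example.

```javascript
// Find the first bad commit in a list ordered oldest to newest.
// Assumes commits[0] is good, the last commit is bad, and once bad stays bad.
function firstBadCommit(commits, isBad) {
  let good = 0;                   // index of a known good commit
  let bad = commits.length - 1;   // index of a known bad commit
  let tests = 0;
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    tests++;
    if (isBad(commits[mid])) bad = mid; else good = mid;
  }
  return { commit: commits[bad], tests: tests };
}

// Hypothetical history of 1000 commits where number 638 introduced the bug.
const history = Array.from({ length: 1000 }, (_, i) => i);
const result = firstBadCommit(history, (c) => c >= 638);
console.log(result.commit, result.tests);
```

Each test halves the remaining range, so about ten builds are enough for a thousand commits.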

kripken commented 12 years ago

Another suspect is the code in SDL_LockSurface and SDL_UnlockSurface. Those are the critical paths for getting pixels into the canvas representing the screen. Checking that these are in fact called, that they get non-0 pixels, and that they putImage those to the right canvas, might be helpful.

DopefishJustin commented 12 years ago

6a3c3938cd802db8b034c27b743844735f1b6d28 is the first bad commit
commit 6a3c3938cd802db8b034c27b743844735f1b6d28
Author: Alon Zakai alonzakai@gmail.com
Date: Fri Mar 23 12:02:59 2012 -0700

    minimal support for SDL text rendering

kripken commented 12 years ago

Ok, there are two relevant changes,

  1. Some code was moved behind if (surf == SDL.screen) {
  2. Alpha is now copied, data[dst+3] = (val >> 24) & 0xff, instead of just = 0xff

Can you check which is the problem here? (no need to compile in either)

My bet is the second, we probably need to do = 0xff there if the target is SDL.screen, that is to ignore alpha when rendering the final RGB data. I guess SDL semantics mean we need to ignore alpha there..?

edit: forgot to say, thanks for bisecting this!

kripken commented 12 years ago

Meanwhile I did a comparison of SDL's native behavior. Looks like it does in fact ignore alpha when rendering to the screen, so I am fixing that. Hopefully that's enough to resolve this.

Also I noticed we had R and B flipped, so I fixed that too. However I suspect that my seeing that natively with SDL might depend on the graphics hardware I have, not sure. If you see odd colors after this patch let me know.
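A sketch of the pixel copy being discussed, under stated assumptions: the function name copyPixels is hypothetical, and source pixels are assumed to be packed as 0xAARRGGBB, whereas the real layout depends on the SDL color masks. The only line taken from the thread is the alpha expression (val >> 24) & 0xff. A pixel whose alpha is 0 is fully transparent on a canvas, which would be consistent with a canvas that stays white.

```javascript
// Copy 32-bit pixels into a canvas-style RGBA byte array.
// Assumption: each source pixel is packed as 0xAARRGGBB; the real surface
// layout depends on the SDL color masks.
function copyPixels(src, data, isScreen) {
  for (let i = 0; i < src.length; i++) {
    const val = src[i];
    const dst = i * 4;
    data[dst] = (val >> 16) & 0xff;      // red
    data[dst + 1] = (val >> 8) & 0xff;   // green
    data[dst + 2] = val & 0xff;          // blue
    // The screen ignores alpha; other surfaces keep it.
    data[dst + 3] = isScreen ? 0xff : (val >> 24) & 0xff;
  }
  return data;
}

const srcPixels = new Uint32Array([0x00ff0000]);  // pure red with alpha 0
const onScreen = copyPixels(srcPixels, new Uint8ClampedArray(4), true);
const offScreen = copyPixels(srcPixels, new Uint8ClampedArray(4), false);
console.log(Array.from(onScreen), Array.from(offScreen));
```

Swapping the shifts on the red and blue lines is all it takes to flip R and B, which is why a wrong assumption about channel order shows up as red objects turning blue.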

DopefishJustin commented 12 years ago

K the video shows up now but the colours are bad, blue instead of red.

kripken commented 12 years ago

Ok, I'll revert the R/B flip then. There must be some setting that controls that, will investigate later.

DopefishJustin commented 12 years ago

All better now, thanks.

DopefishJustin commented 12 years ago

So commit 6b2a2b6c8a3f416660be36a846a45cf635af2a5d makes keyboard input stop working (ironically). Luckily, there is a simple program testkeys.c included with the project that polls SDL for keys and prints the results to stdout, which made this easy to pinpoint. I have posted the .bc and .c:

http://interbutt.com/temp/testkeys.bc https://gist.github.com/2335842

You need to paste the code for _emscripten_set_loop() into the js output if testing the old revisions.

I guess 1.2 vs. 1.3+ API differences are probably the issue here but it was picking up keys fine prior to that commit.

kripken commented 12 years ago

Yeah, 1.2 vs. 1.3 issue. In 1.3/2.0, scancodes are not the same as keycodes. We were missing a scancode translation table in the emscripten sdl impl. I added one now in incoming.

Note that I didn't include all the keys, because I can't find a good lookup table for DOM codes, so it means finding them out manually, which is tedious. So let me know if something is missing.
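For illustration, a partial translation from DOM keyCode values to SDL 2.0 scancodes might look like the sketch below. This is not the table emscripten ships; the names keyCodeToScancode and toScancode are made up for the example, and only a handful of well-known keys are covered. The scancode numbers follow the USB HID usage values that SDL 2.0 uses.

```javascript
// Partial, illustrative table from DOM keyCode to SDL 2.0 scancode.
// Not the table emscripten ships; only a few well-known keys are covered.
const keyCodeToScancode = {
  8: 42,   // Backspace -> SDL_SCANCODE_BACKSPACE
  9: 43,   // Tab       -> SDL_SCANCODE_TAB
  13: 40,  // Enter     -> SDL_SCANCODE_RETURN
  27: 41,  // Escape    -> SDL_SCANCODE_ESCAPE
  32: 44,  // Space     -> SDL_SCANCODE_SPACE
  37: 80,  // Left      -> SDL_SCANCODE_LEFT
  38: 82,  // Up        -> SDL_SCANCODE_UP
  39: 79,  // Right     -> SDL_SCANCODE_RIGHT
  40: 81,  // Down      -> SDL_SCANCODE_DOWN
};

function toScancode(keyCode) {
  // Letters: DOM 65..90 ('A'..'Z') map to SDL 4..29.
  if (keyCode >= 65 && keyCode <= 90) return keyCode - 65 + 4;
  // Digits: DOM 49..57 ('1'..'9') map to SDL 30..38, and '0' (48) maps to 39.
  if (keyCode >= 49 && keyCode <= 57) return keyCode - 49 + 30;
  if (keyCode === 48) return 39;
  const code = keyCodeToScancode[keyCode];
  return code === undefined ? 0 : code;  // 0 is SDL_SCANCODE_UNKNOWN
}

console.log(toScancode(65), toScancode(13), toScancode(38));
```

Returning SDL_SCANCODE_UNKNOWN for unmapped keys makes the gaps in the table easy to spot while testing with a program such as testkeys.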

DopefishJustin commented 12 years ago

OK that does work if I recompile with the SDL 1.3 headers, but it doesn't work with the 1.2 compile (which did work before). Not sure if that's intended.

kripken commented 12 years ago

Yes, we ship the 2.0 headers in emscripten and support only those. It's hard to support both. We used to support 1.2 by mistake, since the code had not been updated to 2.0, which meant 2.0 was broken. Fixing that broke 1.2, which is intentional.

DopefishJustin commented 12 years ago

SDLMESS when compiled with SDL 1.3+ tries to include <pty.h> which is not provided by emscripten. Can probably be disabled but FYI.

kripken commented 12 years ago

Ok, we need pty.h in emscripten then. I added a handwritten version to incoming now, let me know if it works.

DopefishJustin commented 12 years ago

Yep that works, thanks.

DopefishJustin commented 12 years ago

Wrong colours are back, reverting bc2e57394e0c90addb8eaf6074159046511d7a69 fixes it.

kripken commented 12 years ago

That patch fixes the color masks, previously we had them wrong. So it isn't obvious to me what is going on here. Can you make a testcase I can debug? Then we can also add the testcase to the test suite to prevent future regressions.

DopefishJustin commented 12 years ago

So with current emscripten the compiled js bombs out in Chrome with the following error in the error console:

Uncaught TypeError: Object #<CanvasRenderingContext2D> has no method 'createBuffer'

I guess it is trying to init something GL-related even though WebGL is not enabled; I guess emscripten should detect that more gracefully? The actual video output is not going through GL yet.

kripken commented 12 years ago

What function is being called, and from where? If a GL function is called, that sounds like a bug in the C++ code. Unless it only happens in Chrome and not Firefox, in which case I would suspect a browser bug. But this might become clearer if you show me the relevant code.

DopefishJustin commented 12 years ago

Whoops the markup ate part of that which might make it clearer.

DopefishJustin commented 12 years ago

So the emscripten-generated code contains this:

createContext:function (canvas, useWebGL, setInModule) {
  try {
    var ctx = canvas.getContext(useWebGL ? 'experimental-webgl' : '2d');

And then later on there is code like this in GLEmulation.init():

this.vertexObject = Module.ctx.createBuffer();

Which apparently does not exist if the canvas is set up as 2D.

useWebGL is set in makeSurface:

// Decide if we want to use WebGL or not
var useWebGL = (flags & 0x04000000) != 0; // SDL_OPENGL

So I am assuming what is happening is useWebGL is coming out false for whatever reason and then the C++ is calling GL's init() somewhere, even though the OpenGL output option has not been enabled on the MESS command line. I haven't actually debugged it though.

I agree that ideally the C++ should not be calling GL stuff if the surface isn't set up for it (if that is indeed the problem), but this is also a pretty lame way to fail. If the createBuffer() method is not going to always exist then there should be some kind of type check or exception handling for the case where it doesn't, with an appropriate error message, and ideally allowing execution to continue (with, obviously, no GL output).
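A sketch of the kind of type check suggested here. The function names wantsWebGL and initGLEmulation are hypothetical and do not reflect emscripten's actual GLEmulation code; the flag value 0x04000000 is the one quoted from makeSurface above. The fake context objects are stand-ins so the example runs without a browser.

```javascript
const SDL_OPENGL = 0x04000000;  // flag value quoted from makeSurface above

function wantsWebGL(flags) {
  return (flags & SDL_OPENGL) !== 0;
}

// Hypothetical guard for GL emulation setup: returns false instead of throwing
// when the context is a 2D one and has no createBuffer method.
function initGLEmulation(ctx, print) {
  if (!ctx || typeof ctx.createBuffer !== 'function') {
    print('GL emulation unavailable: context has no createBuffer (2D canvas?)');
    return false;
  }
  ctx.createBuffer();
  return true;
}

const printed = [];
const fake2d = { fillRect: function () {} };                  // stand-in for a 2D context
const fakeGL = { createBuffer: function () { return {}; } };  // stand-in for a WebGL context
const ok2d = initGLEmulation(fake2d, (m) => printed.push(m));
const okGL = initGLEmulation(fakeGL, (m) => printed.push(m));
console.log(wantsWebGL(0), wantsWebGL(SDL_OPENGL), ok2d, okGL, printed);
```

With a guard like this, a build that links GL code but runs with software video would log one message and continue rather than stop on a TypeError.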

kripken commented 12 years ago

Hmm, the question is why GLEmulation is included in the first place. There are a few functions which depend on it,

glVertexPointer
glMatrixMode
SDL_GL_GetProcAddress

Is one of those used in MESS? It seems like they should only appear in a build that uses GL. But perhaps they are included but not used, and we need to add a workaround for this?

DopefishJustin commented 12 years ago

Yep all three of those are in MESS. Whether MESS uses GL or not is not a build-time option, just a command-line parameter:

mess -video soft (the default)

vs.

mess -video opengl

kripken commented 12 years ago

But you are building without opengl as a build-time option, I assume? Then why do those commands end up in the output binary?

DopefishJustin commented 12 years ago

You assume wrongly. However now that I check there is such an option which may be enough to get by for now. But surely MESS is not the only software on earth with run-time video output selection?

kripken commented 12 years ago

Not the only one, but the first to be compiled with emscripten I guess ;) ok, then we need runtime checks for this, I will add that.

DopefishJustin commented 12 years ago

Thanks.

Just tried with emscripten incoming and now I can't link - I see emld was removed so I replaced it with emcc in the makefile but now this happens:

/home/jkerk/emscripten/emcc -Wl,--warn-common -s obj/sdl/messtiny/build/file2str.o obj/sdl/messtiny/libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o obj/sdl/messtiny/build/file2str
JAVA not defined in ~/.emscripten, using "java"
Traceback (most recent call last):
  File "/home/jkerk/emscripten/emcc", line 551, in <module>
    assert '=' in newargs[i+1], 'Incorrect syntax for -s (use -s OPT=VAL): ' + newargs[i+1]
AssertionError: Incorrect syntax for -s (use -s OPT=VAL): obj/sdl/messtiny/build/file2str.o

kripken commented 12 years ago

The first issue should be fixed in incoming.

kripken commented 12 years ago

The second should be fixed on incoming as well (it was fallout from LLVM deprecating llvm-ld).

DopefishJustin commented 12 years ago

When I try to compile now the resulting .js is missing important functions like _main() and __ZN7astring4initEv(). This is before closure has been run on it.

.bc: http://interbutt.com/temp/messtiny-20120622.bc.zip

Command line: emcc messtiny.bc -o messtiny.js --post-js post.js --embed-file roms/coleco.zip --embed-file cosmofighter2.zip

post.js: https://raw.github.com/ziz/jsmess/no_cothreads/post.js

Maybe they are missing from the .bc file somehow but I don't know how to check.

kripken commented 12 years ago

LLVM's llvm-nm tool can tell you what symbols are in a .bc file. Is main there?

DopefishJustin commented 12 years ago

It's not there. It is in obj/sdl/messtiny/osd/sdl/sdlmain.o and also obj/sdl/messtiny/libosd.a but disappears after the final link:

/home/jkerk/emscripten/emcc -Wl,--warn-common -s obj/sdl/messtiny/version.o obj/sdl/messtiny/drivlist.o obj/sdl/messtiny/emu/drivers/emudummy.o obj/sdl/messtiny/mess/drivers/coleco.o obj/sdl/messtiny/mess/machine/coleco.o obj/sdl/messtiny/libosd.a obj/sdl/messtiny/libcpu.a obj/sdl/messtiny/libemu.a obj/sdl/messtiny/libdasm.a obj/sdl/messtiny/libsound.a obj/sdl/messtiny/libutil.a obj/sdl/messtiny/libexpat.a obj/sdl/messtiny/libsoftfloat.a obj/sdl/messtiny/libformats.a obj/sdl/messtiny/libz.a obj/sdl/messtiny/libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o messtiny

I am getting a lot of /usr/bin/llvm-dis: Invalid bitcode signature at link time which may be related (llvm-dis likes sdlmain.o but not libosd.a, but llvm-nm works on both).

I'm using llvm 3.1.

kripken commented 12 years ago

Perhaps LLVM 3.1 changed linking semantics somehow, very odd. Can you send me the relevant bitcode files (before the link that removes main) so I can try to reproduce myself? (please try to narrow it down as much as possible)

DopefishJustin commented 12 years ago

Here's a smaller example with the testkeys utility: http://interbutt.com/temp/testkeys-linkerror.zip

Running the command line in maketestkeys results in an output file which is missing _Z15utf8_from_ucharPcjj(), which appears to be present in libutil.a. (Compiling the output to a .html with emcc, then opening it in a browser and pressing a key trips the missing function.)

kripken commented 12 years ago

The problem is the attempt to link in native x86 binaries. You need to remove those, they can't be compiled into JS. Normally this is not too much of a problem, we detect them and ignore them - it just makes your builds slower. But in this case here, you have

emcc  -Wl,--warn-common -s testkeys.o libutil.a libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o testkeys

Note there is both libutil.a - a bitcode file - and -lutil - a request for a system library to be linked. This is what mixes up the compiler.

Replace that line with

emcc  -Wl,--warn-common -s testkeys.o libutil.a libocore.a -o testkeys

that is, remove all requests for system libraries - and it will work.

kripken commented 12 years ago

With that said, emcc should still not get confused even though there are irrelevant x86 libraries. I pushed a possible fix for that, it might help here. But I still recommend removing native libraries, even if they do not make the build fail they make it slower.

DopefishJustin commented 12 years ago

Thanks, that fixes the testkeys example and restores __ZN7astring4initEv() to MESS, _main() is still missing though so some more poking is in order.

DopefishJustin commented 12 years ago

Well my poking time has been pretty limited so here is something reproducible anyway:

http://interbutt.com/temp/mess-linkerror.zip

$ llvm-nm libosd.a | grep main

         T main
         d _ZL13main_threadid

$ emcc -s version.o drivlist.o emudummy.o coleco_driver.o coleco_machine.o libosd.a libcpu.a libemu.a libdasm.a libsound.a libutil.a libexpat.a libsoftfloat.a libformats.a libz.a libocore.a -o messtiny.bc

$ llvm-nm messtiny.bc | grep main

         T _Z16jsmess_main_loopv
         T _Z20jsmess_set_main_loopR16device_scheduler
         d _ZL13mnemonic_main
         t _ZL18menu_main_populateR15running_machineP7ui_menuPv
         t _ZL9menu_mainR15running_machineP7ui_menuPvS3_
         T _ZNK24device_execute_interface16cycles_remainingEv
         T _ZNK9emu_timer9remainingEv
         U emscripten_set_main_loop

No main(). Actually lots of (all?) the stuff from libosd.a appears not to make it in (e.g. _Z13sdlinput_initR15running_machine()), so maybe something is going wrong with that file.

kripken commented 12 years ago

Ok, looks like what happens here is that main() is in an archive file and not a normal object. We were only looking for explicit undefined symbols in archives so we missed this. This is fixed in incoming. With this fix, I see main as well as _Z13sdlinput_initR15running_machine etc.
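A toy model of the archive behavior described here, to show why main() can vanish. This is one plausible reading of the explanation, not emscripten's actual linker code: the function link and the {name, defines, needs} object layout are hypothetical, and symbol names are simplified. An archive member is pulled in only if it defines a symbol that is still needed, so if nothing marks main as needed, the member that defines it is never included.

```javascript
// Toy model of archive linking: pull in an archive member when it defines a
// symbol that is still needed. The object layout is hypothetical.
function link(objects, archiveMembers, entryPoints) {
  const defined = new Set();
  const needed = new Set(entryPoints);  // e.g. 'main', needed even if nothing references it
  const included = [];

  function add(obj) {
    included.push(obj.name);
    obj.defines.forEach((s) => defined.add(s));
    obj.needs.forEach((s) => needed.add(s));
  }
  objects.forEach(add);

  // Keep scanning until no remaining member satisfies a needed symbol.
  const pending = archiveMembers.slice();
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i < pending.length; i++) {
      const member = pending[i];
      const useful = member.defines.some((s) => needed.has(s) && !defined.has(s));
      if (useful) {
        add(member);
        pending.splice(i, 1);
        changed = true;
        break;
      }
    }
  }
  return included;
}

const objects = [{ name: 'version.o', defines: ['build_version'], needs: [] }];
const libosd = [
  { name: 'sdlmain.o', defines: ['main'], needs: ['sdlinput_init'] },
  { name: 'sdlinput.o', defines: ['sdlinput_init'], needs: [] },
  { name: 'unused.o', defines: ['unused'], needs: [] },
];

const withoutEntry = link(objects, libosd, []);      // main is never requested
const withEntry = link(objects, libosd, ['main']);   // main treated as needed
console.log(withoutEntry, withEntry);
```

In the first call nothing from the archive is included, which mirrors the missing main() and sdlinput_init symptoms; in the second, requesting main pulls in its member and, transitively, the member it depends on.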

DopefishJustin commented 12 years ago

Link is good now (thanks!)

Getting this when I try to run it in Chrome:

Uncaught TypeError: Object #<CanvasRenderingContext2D> has no method 'getExtension'