Closed: ziz closed this issue 12 years ago
So now we hang in validate_inlines() in src/emu/validate.c, which does a whole bunch of sanity tests on math operations with different data types. (yay...)
Here is a reduced testcase of the failing code: https://gist.github.com/1983766
In gcc or clang the following output is produced right away:
testu64a is 7373125414476351500
Emscripten-produced .js hangs indefinitely.
Interestingly, with emscripten revision ad9ce81becf51181bd367ab19a812dbf0de6c899 (just before the i64 fix) it does not hang, but outputs a different number:
testu64a is 14746250828952703000
That would explain why we did not hit this before, since this is earlier in the startup process than the previous hang.
Thanks for yet another clear testcase! This should also be fixed on incoming. Note though that as before, I haven't fully tested it yet (I just wanted to push it out so this doesn't block you).
Excellent! It does indeed appear to be fixed.
So now I'd like to try working with MESS compiled with OSD=sdl instead of OSD=mini, however I get the following error when I try to emscripten it:
[jkerk@myhost jsmess]$ ~/emscripten/emcc messtiny.bc -o messtiny_sdl.js
emcc: warning: using libcxx turns on CORRECT_* options
emcc: warning: using libcxxabi, this may need CORRECT_* options
node.js:201
throw e; // process.nextTick error, or 'error' event on first tick
^
Invalid token, cannot triage: // {
// "tokens": [
// {
// "text": "fence"
// },
// {
// "text": "seq_cst"
// }
// ],
// "indent": 2,
// "lineNum": 100063,
// "__uid__": 852
// }
That's a synchronization instruction that emscripten can ignore. Fixed in incoming branch.
Now it chokes on some inline assembly that needs to be removed, in _Z15sdl_read_socketP9_osd_filePvyjPj.
I've yanked out the asm and TTF library stuff and I get a .js that seems to work, but needs stubs for the following functions:
_SDL_Linked_Version
_SDL_GetWMInfo
_SDL_VideoDriverName
_SDL_ThreadID
_SDL_NumJoysticks
_SDL_AudioDriverName
_SDL_EnableUNICODE
_pthread_cond_init
_pthread_cond_signal
_pthread_cond_destroy
They don't have to do anything, function foo() {} suffices. (Why does emscripten not generate that instead of var foo;?)
I also need to comment out the "Cannot return obtained SDL audio params" assert in _SDL_OpenAudio (I guess that may be relevant later when hooking up audio).
And when running with js from the command line (haven't tried anything else yet) I get some errors about the emscripten-supplied JavaScript functions:
ReferenceError: addEventListener is not defined
in:
function _SDL_Init(what) {
SDL.startTime = Date.now();
['keydown', 'keyup', 'keypress'].forEach(function(event) {
addEventListener(event, SDL.receiveEvent, true);
});
TypeError: Module.canvas is undefined
in:
function _SDL_SetVideoMode(width, height, depth, flags) {
Module['canvas'].width = width;
Module['canvas'].height = height;
return SDL.screen = SDL.makeSurface(width, height, flags);
}
and
TypeError: Module.print is not a function
in:
},createContext:function (useWebGL) {
...
} catch (e) {
Module.print('(canvas not available)');
return null;
}
Emscripten doesn't generate empty stubs because then you would get silent failures that are very hard to debug. Instead, you get clear failures about what is missing.
In those examples, I'm not sure it's ok if they do nothing. They might need to return 0 or -1 or something like that. An empty stub returns undefined, if they do math on that, it can go very wrong.
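To illustrate the concern about empty stubs, here is a small sketch. The stub and caller names are made up for the example and are not taken from emscripten or MESS; the point is only that an empty function returns undefined, and arithmetic on undefined yields NaN rather than an error.

```javascript
// Hypothetical stand-ins for a missing library function (illustrative names).
function emptyStub() {}            // returns undefined
function zeroStub() { return 0; }  // returns an explicit value

// Mimics compiled code that does arithmetic on the stub's result.
function joystickSlots(numJoysticks) {
  return numJoysticks() + 1;
}

console.log(joystickSlots(emptyStub)); // NaN
console.log(joystickSlots(zeroStub));  // 1
console.log(emptyStub() | 0);          // 0 (bitwise coercion hides the undefined)
```

Whether a given caller sees NaN or a silently coerced 0 depends on how the generated code uses the result, which is why an explicit return value is the safer choice.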
The SDL errors are basically that our SDL implementation assumes it's running in a browser, so event listening works, and there is a canvas as set up in src/shell.html. You get that automatically if you run emcc and tell it to generate html output.
At least for me, having an auto-stub with print('foo unimplemented!'); or the like inside would be a lot more convenient for development, while still addressing the silent failure issue. I guess it depends somewhat on the project since in our case it's all superfluous stuff, whereas on another project it might go completely off the rails if key functions don't do anything. May be worth having an option for.
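A rough sketch of what such an auto-stub could look like when written by hand. makeLoudStub is a hypothetical helper, not an emscripten feature, and the return value of 0 is an assumption that would need checking per function.

```javascript
// Hypothetical helper: builds a stub that reports each call and returns a chosen value.
function makeLoudStub(name, returnValue, log) {
  const report = log || console.log;
  return function () {
    report(name + ' unimplemented!');
    return returnValue;
  };
}

// Collect messages in an array here so the example is easy to inspect.
const messages = [];
const _SDL_NumJoysticks = makeLoudStub('_SDL_NumJoysticks', 0, (m) => messages.push(m));
const _SDL_ThreadID = makeLoudStub('_SDL_ThreadID', 0, (m) => messages.push(m));

console.log(_SDL_NumJoysticks()); // 0
console.log(messages);            // [ '_SDL_NumJoysticks unimplemented!' ]
```

This keeps the failure visible in the log while letting execution continue, which is the trade-off being asked for.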
I did figure that I need a browser for the SDL stuff, it's a bit annoying because there are tests I can do from the command line whereas browsers have trouble with the size of the code, but I guess it makes sense. More robust checks would be nice :)
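As a sketch of the kind of check meant here: the three errors reported earlier in the thread (addEventListener, Module.canvas, Module.print) could be detected up front. missingBrowserFeatures is a hypothetical function for illustration, not something emscripten provides.

```javascript
// Hypothetical pre-flight check: lists what the SDL glue code would trip over.
function missingBrowserFeatures(globalObj, Module) {
  const missing = [];
  if (typeof globalObj.addEventListener !== 'function') missing.push('addEventListener');
  if (!Module || !Module.canvas) missing.push('Module.canvas');
  if (!Module || typeof Module.print !== 'function') missing.push('Module.print');
  return missing;
}

// A command-line shell: no DOM, no canvas, no Module.print.
const shellResult = missingBrowserFeatures({}, {});
console.log(shellResult); // [ 'addEventListener', 'Module.canvas', 'Module.print' ]

// A browser-like environment, faked with plain objects.
const fakeWindow = { addEventListener: function () {} };
const fakeModule = { canvas: { width: 0, height: 0 }, print: function () {} };
console.log(missingBrowserFeatures(fakeWindow, fakeModule)); // []
```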
So something has changed in emscripten just recently such that video no longer appears on the canvas (stays white forever) even if the code is for sure running. Trying to track down what exactly but there have been a lot of SDL changes....
My first guess is we now support rgba and not just rgb. To test this, find the 4 short functions with CSSRGB in their name, replace rgba( with rgb( and remove the last parameter generated (the browser will break if rgb gets 4 params). (You don't need to compile to test this.) Maybe the alpha is reversed, that could explain no rendering.
If it isn't that, there is SDL.debugSurface which can be used on SDL surfaces to see their source and contents. Also useful without compiling.
Otherwise, bisection seems the best route (I mean, binary search to find the changeset that introduces the bug).
I did guess the rgba thing but as far as I can tell the CSSRGB functions were not called anywhere. I'm trying to bisect now but it will take a while.
Hmm. Another guess might be to find if MESS uses an SDL command to render that is not covered by the emscripten test suite (because what is tested, should not regress). If you can get a list of the relevant SDL commands MESS uses I can match them against the test suite.
But, bisection is a guaranteed result in logarithmic time, so that's usually best...
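For reference, the logarithmic bound comes from halving the suspect range on every test. A minimal sketch of the idea follows, assuming an ordered list of commits where the first is known good and the last is known bad; findFirstBad and the commit names are made up for the example (in practice git bisect does this bookkeeping).

```javascript
// commits: ordered oldest to newest; commits[0] is known good, the last is known bad.
// isBad: callback that builds/tests one commit and reports whether the bug is present.
function findFirstBad(commits, isBad) {
  let good = 0;
  let bad = commits.length - 1;
  let tests = 0;
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    tests++;
    if (isBad(commits[mid])) {
      bad = mid;   // bug already present: first bad commit is at or before mid
    } else {
      good = mid;  // still fine: first bad commit is after mid
    }
  }
  return { firstBad: commits[bad], tests: tests };
}

// 1000 fake commits, with the regression introduced at index 700.
const commits = Array.from({ length: 1000 }, (_, i) => 'c' + i);
const result = findFirstBad(commits, (c) => Number(c.slice(1)) >= 700);
console.log(result.firstBad); // c700
console.log(result.tests);    // roughly log2(1000) tests
```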
Another suspect is the code in SDL_LockSurface and SDL_UnlockSurface. Those are the critical paths for getting pixels into the canvas representing the screen. Checking that these are in fact called, that they get non-0 pixels, and that they putImage those to the right canvas, might be helpful.
6a3c3938cd802db8b034c27b743844735f1b6d28 is the first bad commit
commit 6a3c3938cd802db8b034c27b743844735f1b6d28
Author: Alon Zakai alonzakai@gmail.com
Date: Fri Mar 23 12:02:59 2012 -0700
minimal support for SDL text rendering
Ok, there are two relevant changes:
if (surf == SDL.screen) {
and
data[dst+3] = (val >> 24) & 0xff, instead of just = 0xff
Can you check which is the problem here? (no need to compile in either case)
My bet is the second: we probably need to do = 0xff there if the target is SDL.screen, that is, to ignore alpha when rendering the final RGB data. I guess SDL semantics mean we need to ignore alpha there..?
edit: forgot to say, thanks for bisecting this!
Meanwhile I did a comparison of SDL's native behavior. Looks like it does in fact ignore alpha when rendering to the screen, so I am fixing that. Hopefully that's enough to resolve this.
Also I noticed we had R and B flipped, so I fixed that too. However I suspect that my seeing that natively with SDL might depend on the graphics hardware I have, not sure. If you see odd colors after this patch let me know.
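A sketch of the alpha handling being discussed, written in the style of the data[dst+3] line quoted earlier. copyPixels is a hypothetical function, and the channel order used here (R in the low byte) is an assumption for illustration only; it is not a claim about what the emscripten SDL code does.

```javascript
// Copies packed 32-bit pixels into an RGBA byte array such as ImageData.data.
// When the target is the screen, alpha is forced to opaque instead of taken from the pixel.
function copyPixels(src, data, isScreen) {
  for (let i = 0; i < src.length; i++) {
    const val = src[i];
    const dst = i * 4;
    data[dst] = val & 0xff;              // assumed: R in bits 0-7
    data[dst + 1] = (val >> 8) & 0xff;   // assumed: G in bits 8-15
    data[dst + 2] = (val >> 16) & 0xff;  // assumed: B in bits 16-23
    data[dst + 3] = isScreen ? 0xff : (val >> 24) & 0xff;
  }
  return data;
}

// One pixel whose alpha byte is 0, as an emulator writing plain RGB might produce.
const srcPixels = new Uint32Array([0x00112233]);
const screenBytes = copyPixels(srcPixels, new Uint8ClampedArray(4), true);
const offscreenBytes = copyPixels(srcPixels, new Uint8ClampedArray(4), false);
console.log(Array.from(screenBytes));    // [ 51, 34, 17, 255 ]
console.log(Array.from(offscreenBytes)); // [ 51, 34, 17, 0 ]
```

With alpha taken from the pixel, a source that leaves the top byte at zero renders fully transparent, which would match a canvas that stays white.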
K the video shows up now but the colours are bad, blue instead of red.
Ok, I'll revert the R/B flip then. There must be some setting that controls that, will investigate later.
All better now, thanks.
So commit 6b2a2b6c8a3f416660be36a846a45cf635af2a5d makes keyboard input stop working (ironically). Luckily, there is a simple program testkeys.c included with the project that polls SDL for keys and prints the results to stdout, which made this easy to pinpoint. I have posted the .bc and .c:
http://interbutt.com/temp/testkeys.bc https://gist.github.com/2335842
You need to paste the code for _emscripten_set_loop() into the js output if testing the old revisions.
I guess 1.2 vs. 1.3+ API differences are probably the issue here but it was picking up keys fine prior to that commit.
Yeah, 1.2 vs. 1.3 issue. In 1.3/2.0, scancodes are not the same as keycodes. We were missing a scancode translation table in the emscripten sdl impl. I added one now in incoming.
Note that I didn't include all the keys, because I can't find a good lookup table for DOM codes, so it means finding them out manually, which is tedious. So let me know if something is missing.
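To give an idea of what such a translation table involves, here is an illustrative subset mapping DOM keyCode values to SDL 2.0 scancodes (which follow USB HID usage IDs). This is a sketch, not the table that was added to emscripten; toScancode and DOM_TO_SDL_SCANCODE are names invented for the example.

```javascript
// Illustrative subset only: DOM keyCode -> SDL 2.0 scancode.
const DOM_TO_SDL_SCANCODE = {
  8: 42,   // Backspace -> SDL_SCANCODE_BACKSPACE
  9: 43,   // Tab       -> SDL_SCANCODE_TAB
  13: 40,  // Enter     -> SDL_SCANCODE_RETURN
  27: 41,  // Escape    -> SDL_SCANCODE_ESCAPE
  32: 44,  // Space     -> SDL_SCANCODE_SPACE
  37: 80,  // Left      -> SDL_SCANCODE_LEFT
  38: 82,  // Up        -> SDL_SCANCODE_UP
  39: 79,  // Right     -> SDL_SCANCODE_RIGHT
  40: 81   // Down      -> SDL_SCANCODE_DOWN
};

function toScancode(keyCode) {
  // Letters are contiguous in both schemes: DOM 65..90 -> SDL 4..29.
  if (keyCode >= 65 && keyCode <= 90) return keyCode - 65 + 4;
  const sc = DOM_TO_SDL_SCANCODE[keyCode];
  return sc === undefined ? 0 : sc; // 0 is SDL_SCANCODE_UNKNOWN
}

console.log(toScancode(65));  // 4  (A)
console.log(toScancode(13));  // 40 (Return)
console.log(toScancode(255)); // 0  (not in this subset)
```

Keys outside the table fall through to 0, which is one way a missing entry would show up as a key that simply does nothing.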
OK that does work if I recompile with the SDL 1.3 headers, but it doesn't work with the 1.2 compile (which did work before). Not sure if that's intended.
Yes, we ship the 2.0 headers in emscripten and support only those. It's hard to support both. We used to support 1.2 by mistake since the code was not updated to 2.0, which meant 2.0 was broken, so fixing that broke 1.2 which is intentional.
SDLMESS when compiled with SDL 1.3+ tries to include <pty.h> which is not provided by emscripten. Can probably be disabled but FYI.
Ok, we need pty.h in emscripten then. I added a handwritten version to incoming now, let me know if it works.
Yep that works, thanks.
Wrong colours are back, reverting bc2e57394e0c90addb8eaf6074159046511d7a69 fixes it.
That patch fixes the color masks, previously we had them wrong. So it isn't obvious to me what is going on here. Can you make a testcase I can debug? Then we can also add the testcase to the test suite to prevent future regressions.
So with current emscripten the compiled js bombs out in Chrome with the following error in the error console:
Uncaught TypeError: Object #<CanvasRenderingContext2D> has no method 'createBuffer'
I guess it is trying to init something GL-related even though WebGL is not enabled; I guess emscripten should detect that more gracefully? The actual video output is not going through GL yet.
What function is being called, and from where? If a GL function is called, that sounds like a bug in the C++ code. Unless it only happens in Chrome and not Firefox, in which case I would suspect a browser bug. But this might become clearer if you show me the relevant code.
Whoops the markup ate part of that which might make it clearer.
So the emscripten-generated code contains this:
createContext:function (canvas, useWebGL, setInModule) {
  try {
    var ctx = canvas.getContext(useWebGL ? 'experimental-webgl' : '2d');
And then later on there is code like this in GLEmulation.init():
this.vertexObject = Module.ctx.createBuffer();
Which apparently does not exist if the canvas is set up as 2D.
useWebGL is set in makeSurface:
// Decide if we want to use WebGL or not
var useWebGL = (flags & 0x04000000) != 0; // SDL_OPENGL
So I am assuming what is happening is useWebGL is coming out false for whatever reason and then the C++ is calling GL's init() somewhere, even though the OpenGL output option has not been enabled on the MESS command line. I haven't actually debugged it though.
I agree that ideally the C++ should not be calling GL stuff if the surface isn't set up for it (if that is indeed the problem), but this is also a pretty lame way to fail. If the createBuffer() method is not going to always exist then there should be some kind of type check or exception handling for the case where it doesn't, with an appropriate error message, and ideally allowing execution to continue (with, obviously, no GL output).
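Something along these lines is what I have in mind, as a sketch only. The flag value is the one quoted from makeSurface; wantsWebGL and initGLEmulation are hypothetical names, and the real GLEmulation.init() may need a different shape.

```javascript
const SDL_OPENGL = 0x04000000; // flag value quoted from makeSurface

// Same test as the quoted makeSurface line.
function wantsWebGL(flags) {
  return (flags & SDL_OPENGL) !== 0;
}

// Hypothetical guard: refuse to set up GL emulation on a context that lacks createBuffer.
function initGLEmulation(ctx, report) {
  if (!ctx || typeof ctx.createBuffer !== 'function') {
    report('GL emulation disabled: context has no createBuffer (2D canvas?)');
    return null;
  }
  return { vertexObject: ctx.createBuffer() };
}

// Fake contexts standing in for canvas.getContext('2d') and a WebGL context.
const fake2d = { fillRect: function () {} };
const fakeGL = { createBuffer: function () { return { id: 1 }; } };
const warnings = [];

console.log(wantsWebGL(0));                                        // false
console.log(initGLEmulation(fake2d, (m) => warnings.push(m)));     // null
console.log(initGLEmulation(fakeGL, (m) => warnings.push(m)));     // { vertexObject: { id: 1 } }
```

That way a 2D surface gets a readable message and keeps running, instead of an uncaught TypeError.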
Hmm, the question is why GLEmulation is included in the first place. There are a few functions which depend on it,
glVertexPointer
glMatrixMode
SDL_GL_GetProcAddress
Is one of those used in MESS? It seems like they should only appear in a build that uses GL. But perhaps they are included but not used, and we need to add a workaround for this?
Yep all three of those are in MESS. Whether MESS uses GL or not is not a build-time option, just a command-line parameter:
mess -video soft (the default) vs. mess -video opengl
But you are building without opengl as a build-time option, I assume? Then why do those commands end up in the output binary?
You assume wrongly. However now that I check there is such an option which may be enough to get by for now. But surely MESS is not the only software on earth with run-time video output selection?
Not the only one, but the first to be compiled with emscripten I guess ;) ok, then we need runtime checks for this, I will add that.
Thanks.
Just tried with emscripten incoming and now I can't link - I see emld was removed so I replaced it with emcc in the makefile but now this happens:
/home/jkerk/emscripten/emcc -Wl,--warn-common -s obj/sdl/messtiny/build/file2str.o obj/sdl/messtiny/libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o obj/sdl/messtiny/build/file2str
JAVA not defined in ~/.emscripten, using "java"
Traceback (most recent call last):
File "/home/jkerk/emscripten/emcc", line 551, in
The first issue should be fixed in incoming.
The second should be fixed on incoming as well (it was fallout from LLVM deprecating llvm-ld).
When I try to compile now the resulting .js is missing important functions like _main() and __ZN7astring4initEv(). This is before closure has been run on it.
.bc: http://interbutt.com/temp/messtiny-20120622.bc.zip
Command line:
emcc messtiny.bc -o messtiny.js --post-js post.js --embed-file roms/coleco.zip --embed-file cosmofighter2.zip
post.js: https://raw.github.com/ziz/jsmess/no_cothreads/post.js
Maybe they are missing from the .bc file somehow but I don't know how to check.
LLVM's llvm-nm tool can tell you what symbols are in a .bc file. Is main there?
It's not there. It is in obj/sdl/messtiny/osd/sdl/sdlmain.o and also obj/sdl/messtiny/libosd.a but disappears after the final link:
/home/jkerk/emscripten/emcc -Wl,--warn-common -s obj/sdl/messtiny/version.o obj/sdl/messtiny/drivlist.o obj/sdl/messtiny/emu/drivers/emudummy.o obj/sdl/messtiny/mess/drivers/coleco.o obj/sdl/messtiny/mess/machine/coleco.o obj/sdl/messtiny/libosd.a obj/sdl/messtiny/libcpu.a obj/sdl/messtiny/libemu.a obj/sdl/messtiny/libdasm.a obj/sdl/messtiny/libsound.a obj/sdl/messtiny/libutil.a obj/sdl/messtiny/libexpat.a obj/sdl/messtiny/libsoftfloat.a obj/sdl/messtiny/libformats.a obj/sdl/messtiny/libz.a obj/sdl/messtiny/libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o messtiny
I am getting a lot of /usr/bin/llvm-dis: Invalid bitcode signature at link time which may be related (llvm-dis likes sdlmain.o but not libosd.a, but llvm-nm works on both).
I'm using llvm 3.1.
Perhaps LLVM 3.1 changed linking semantics somehow, very odd. Can you send me the relevant bitcode files (before the link that removes main) so I can try to reproduce myself? (please try to narrow it down as much as possible)
Here's a smaller example with the testkeys utility: http://interbutt.com/temp/testkeys-linkerror.zip
Running the command line in maketestkeys results in an output file which is missing _Z15utf8_from_ucharPcjj(), which appears to be present in libutil.a. (Compiling the output to a .html with emcc, then opening it in a browser and pressing a key trips the missing function.)
The problem is the attempt to link in native x86 binaries. You need to remove those, they can't be compiled into JS. Normally this is not too much of a problem, we detect them and ignore them - it just makes your builds slower. But in this case here, you have
emcc -Wl,--warn-common -s testkeys.o libutil.a libocore.a -lm `sdl-config --libs` `pkg-config --libs fontconfig` -lSDL_ttf -lutil -o testkeys
Note there is both libutil.a (a bitcode file) and -lutil (a request for a system library to be linked). This is what mixes up the compiler.
Replace that line with
emcc -Wl,--warn-common -s testkeys.o libutil.a libocore.a -o testkeys
that is, remove all requests for system libraries - and it will work.
With that said, emcc should still not get confused even though there are irrelevant x86 libraries. I pushed a possible fix for that, it might help here. But I still recommend removing native libraries, even if they do not make the build fail they make it slower.
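If editing the makefile by hand is awkward, the same cleanup can be sketched as a filter over the already-expanded argument list. stripSystemLibs is a hypothetical helper, not part of emcc, and it only drops -l requests; anything else that sdl-config or pkg-config expands to would need separate handling.

```javascript
// Hypothetical pre-filter: drop -l<name> system library requests from a link command.
function stripSystemLibs(args) {
  return args.filter((arg) => !/^-l./.test(arg));
}

const linkArgs = [
  '-Wl,--warn-common', '-s', 'testkeys.o', 'libutil.a', 'libocore.a',
  '-lm', '-lSDL_ttf', '-lutil', '-o', 'testkeys'
];
console.log(stripSystemLibs(linkArgs).join(' '));
// -Wl,--warn-common -s testkeys.o libutil.a libocore.a -o testkeys
```

Note that libutil.a (the bitcode archive) survives while -lutil (the system library request) is removed, which is the distinction that matters here.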
Thanks, that fixes the testkeys example and restores __ZN7astring4initEv() to MESS, _main() is still missing though so some more poking is in order.
Well my poking time has been pretty limited so here is something reproducible anyway:
http://interbutt.com/temp/mess-linkerror.zip
$ llvm-nm libosd.a | grep main
T main
d _ZL13main_threadid
$ emcc -s version.o drivlist.o emudummy.o coleco_driver.o coleco_machine.o libosd.a libcpu.a libemu.a libdasm.a libsound.a libutil.a libexpat.a libsoftfloat.a libformats.a libz.a libocore.a -o messtiny.bc
$ llvm-nm messtiny.bc | grep main
T _Z16jsmess_main_loopv
T _Z20jsmess_set_main_loopR16device_scheduler
d _ZL13mnemonic_main
t _ZL18menu_main_populateR15running_machineP7ui_menuPv
t _ZL9menu_mainR15running_machineP7ui_menuPvS3_
T _ZNK24device_execute_interface16cycles_remainingEv
T _ZNK9emu_timer9remainingEv
U emscripten_set_main_loop
No main(). Actually lots of (all?) stuff from libosd.a appears not to make it in (e.g. _Z13sdlinput_initR15running_machine()) so maybe something is going wrong with that file.
Ok, looks like what happens here is that main() is in an archive file and not a normal object. We were only looking for explicit undefined symbols in archives so we missed this. This is fixed in incoming. With this fix, I see main as well as _Z13sdlinput_initR15running_machine etc.
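For anyone following along, the general archive-linking rule can be sketched like this: a member is only pulled in if it defines a symbol that something already wants. The sketch below is a simplified model with invented member and symbol names; it is not the actual emcc code. It shows why main gets dropped when nothing references it explicitly, and how treating it as an implicit root brings it (and what it needs) back.

```javascript
// objects / archiveMembers: [{ name, defines: [...], needs: [...] }]
// roots: symbols wanted even if no object references them (e.g. 'main').
function selectMembers(objects, archiveMembers, roots) {
  const defined = new Set();
  const wanted = new Set(roots);
  for (const obj of objects) {
    obj.defines.forEach((s) => defined.add(s));
    obj.needs.forEach((s) => wanted.add(s));
  }
  const chosen = [];
  let changed = true;
  // Repeat until no new member is pulled in, since each member can add new needs.
  while (changed) {
    changed = false;
    for (const m of archiveMembers) {
      if (chosen.includes(m)) continue;
      const useful = m.defines.some((s) => wanted.has(s) && !defined.has(s));
      if (!useful) continue;
      chosen.push(m);
      m.defines.forEach((s) => defined.add(s));
      m.needs.forEach((s) => wanted.add(s));
      changed = true;
    }
  }
  return chosen.map((m) => m.name);
}

// Simplified, made-up symbol names standing in for the real mangled ones.
const objects = [{ name: 'version.o', defines: ['build_version'], needs: [] }];
const libosd = [
  { name: 'sdlmain.o', defines: ['main'], needs: ['sdlinput_init'] },
  { name: 'input.o', defines: ['sdlinput_init'], needs: [] },
  { name: 'unused.o', defines: ['unused_helper'], needs: [] }
];

console.log(selectMembers(objects, libosd, []));       // []
console.log(selectMembers(objects, libosd, ['main'])); // [ 'sdlmain.o', 'input.o' ]
```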
Link is good now (thanks!)
Getting this when I try to run it in Chrome:
Uncaught TypeError: Object #<CanvasRenderingContext2D> has no method 'getExtension'
dynamic_cast can get into an infinite loop (encountered when compiling the MAME / MESS source). Per discussion, "the current implementation is missing some stuff like multiple inheritance etc., so maybe more basic stuff needs to be done first."
The dynamic_cast call arguments can be found in llvm's tools/clang/lib/CodeGen/CGExprCXX.cpp around line 1535 (using SVN rev 139974 of clang 3.0).
(Apologies for the pretty minimal issue content; I don't have anything resembling a minimal reproduction case to offer yet.)