dreamlayers / em-dosbox

An Emscripten port of DOSBox
www.dosbox.com
GNU General Public License v2.0

"Emulation aborted due to nested emulation timeout." #11

Open hikari-no-yume opened 9 years ago

hikari-no-yume commented 9 years ago

Is there any way to increase this timeout?

dreamlayers commented 9 years ago

If compiled with Emscripten emterpreter sync, the timeout is at https://github.com/dreamlayers/em-dosbox/blob/em-dosbox-svn-sdl2/src/dosbox.cpp#L163 , in particular the numbers after the + here:

        } else if (SDL_TICKS_PASSED(ticksEntry, last_sleep + 2000) &&
                   !SDL_TICKS_PASSED(ticksEntry, last_loop + 200)) {

Without sync, it is here: https://github.com/dreamlayers/em-dosbox/blob/em-dosbox-svn-sdl2/src/dosbox.cpp#L442 :

if (GetTicks() - ticksStart > 1000) {

I think it's unlikely that changing these would help, except in cases where extreme browser slowness is causing the timeout. If the program is able to run at reasonable speed and the timeout is exceeded, it's unlikely to recover later on. If you encounter a timeout problem in a build without sync, try the program in a build with sync. I just made sync the default, because the latest stable (master branch) Emscripten has the necessary support now.
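As a rough illustration of the two checks quoted above, here is a standalone sketch. It is not the em-dosbox source: `ticks_passed` is written the same way as SDL's `SDL_TICKS_PASSED` macro, while `sync_timeout_expired` and `nested_timeout_expired` are hypothetical helper names that just wrap the quoted conditions with the constants turned into parameters.

```cpp
#include <cassert>
#include <cstdint>

// Wraparound-safe "has `now` reached `deadline`?" test, written the same
// way as SDL's SDL_TICKS_PASSED(now, deadline) macro.
inline bool ticks_passed(std::uint32_t now, std::uint32_t deadline) {
    return static_cast<std::int32_t>(deadline - now) <= 0;
}

// Hypothetical wrapper around the condition quoted for builds with sync.
// 2000 and 200 are the "numbers after the +" from the quoted code.
inline bool sync_timeout_expired(std::uint32_t ticks_entry,
                                 std::uint32_t last_sleep,
                                 std::uint32_t last_loop,
                                 std::uint32_t sleep_limit_ms = 2000,
                                 std::uint32_t loop_limit_ms = 200) {
    return ticks_passed(ticks_entry, last_sleep + sleep_limit_ms) &&
           !ticks_passed(ticks_entry, last_loop + loop_limit_ms);
}

// Hypothetical wrapper around the condition quoted for builds without sync:
// true once more than `limit_ms` have elapsed since `start`.
inline bool nested_timeout_expired(std::uint32_t now, std::uint32_t start,
                                   std::uint32_t limit_ms = 1000) {
    assert(limit_ms > 0);
    return now - start > limit_ms;  // unsigned subtraction survives wraparound
}
```

Raising `sleep_limit_ms` or `limit_ms` only postpones the abort; it does not change whether the emulated code eventually returns.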

hikari-no-yume commented 9 years ago

Hmm. My use-case is unusual - see http://win95.ajf.me/ - where I'm running Windows 95 under DOSBox. The problem is that at times of heavy CPU load it'll just abort, in particular when an application starts or ends.

I'm not actually sure if I'm using the emterpreter or not. It's the default so you'd think I was using it, but it complains if the .mem is missing, so I'm a bit confused.

hikari-no-yume commented 9 years ago

OK, it's very definitely using the emterpreter; with --disable-sync, it won't work: it just exits the moment the disk image loads. Weird that it needs the .mem, though.

hikari-no-yume commented 9 years ago

Could this be made less frequent by clock rate-limiting emulation, or something? Or is it a problem caused by too much stuff happening on a single cycle(?)

dreamlayers commented 9 years ago

Emscripten emterpreter requires the .mem file. There is no real technical reason for that requirement; it's just the way that Emscripten feature is designed.

I believe that the problem is paging. The DOSBox paging code enters into a nested CPU emulator when a paging exception happens, which then exits when execution gets back to where the exception happened. If emulated code never returns there, the nested emulator never exits. If there are multiple exceptions and they don't return in last-in first-out order, same problem. I've never run Windows 95 on DOSBox, but I've seen this happen running Linux and people have reported it with ordinary DOSBox running Windows 95. The difference is that ordinary DOSBox just runs slower when this happens, and Em-DOSBox gets a nested emulation timeout.
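The return-address matching described above can be modelled with a small toy program. This is only a sketch under stated assumptions, not DOSBox's paging code: the `trace` encoding, `Result`, `run_nested`, and `simulate` are all made up for illustration, and running off the end of the trace stands in for the nested emulation timeout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Result { Returned, Stuck };

// Toy model: each positive entry in `trace` is an address the emulated CPU
// executes next. A negative entry -N means "a page fault happens here, and
// its nested loop may only exit once execution reaches address N".
inline Result run_nested(const std::vector<long>& trace, std::size_t& pos,
                         long return_addr) {
    assert(return_addr > 0);
    while (pos < trace.size()) {
        const long ip = trace[pos++];
        if (ip < 0) {
            // A second fault inside the handler: nest one level deeper.
            if (run_nested(trace, pos, -ip) == Result::Stuck) {
                return Result::Stuck;
            }
        } else if (ip == return_addr) {
            return Result::Returned;  // back where the fault happened
        }
    }
    return Result::Stuck;  // never got back: stands in for the timeout
}

// Starts the outermost nested loop for a fault that must return to
// `first_return_addr`.
inline Result simulate(const std::vector<long>& trace, long first_return_addr) {
    std::size_t pos = 0;
    return run_nested(trace, pos, first_return_addr);
}
```

With an outer fault that must return to 100, the trace `{-200, 200, 100}` unwinds in last-in first-out order and exits, while `{-200, 100, 200}` reaches 100 while the inner loop is still waiting for 200, so the outer loop never sees its return address.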

Also see this thread http://www.vogons.org/viewtopic.php?f=32&t=37417

hikari-no-yume commented 9 years ago

Oh, I got confused, I thought it was not using the emterpreter that needs the .mem, that explains it.

The paging explanation makes sense. Unfortunate, that.

dreamlayers commented 9 years ago

Your idea of using a web worker is intriguing. I assume that way DOSBox could keep executing normally with no need to return to the browser regularly. Hopefully SDL calls would work via proxy to worker.

Nested emulation in paging is a problem in any case, but at least it could run that way. It would certainly be slower because it runs instructions one at a time in the slower CPU_Core_Full.

DOSBox paging works this way because DOSBox is a combination of CPU emulation and native C++ code implementing DOS and the BIOS. If a paging exception could occur in the native code, it needs to work like a function call. If you load an OS which handles everything, and there is no chance of a page fault in DOS or BIOS code, then this hack wouldn't be needed. Apparently the Java port jDosBox stops using the return address matching when Windows 95 is loaded, according to the comment by danoon here.

hikari-no-yume commented 9 years ago

Oh I see, that explains the unusual design. The trick of not doing a second function call in jDosBox seems to be reliant on using a real BIOS.

re: web workers, another advantage of the web worker is you don't need to use the emterpreter at all, so you can get improved performance (albeit with worse startup time). I don't know how well it would work in practice, as there might be a lot of overhead in sending messages rather than directly calling SDL. Since emscripten already has proxy to worker support, I wonder if I couldn't get this working. Hmm. I might fiddle about with it.

hikari-no-yume commented 9 years ago

If I do this, everything's broken:

CCFLAGS="--proxy-to-worker" CXXFLAGS="--proxy-to-worker" emconfigure ./configure --disable-sync

I think it's because em-dosbox uses a custom HTML page?

"running code in a web worker" win95.html:1297:4
TypeError: Module is undefined dosbox.js:2:0

hikari-no-yume commented 9 years ago

So anyway, I tried increasing the timeouts, first doubling them, then multiplying by 5. It didn't help. All that changes is that where it would freeze for 2 seconds before and then abort, now it'll freeze for 10 seconds and then abort. I'm guessing that it's the page fault issue you mentioned: Windows is handling a page fault and not returning, and so DOSBox never returns to the outer function, and emulation is aborted. :(

dreamlayers commented 9 years ago

The only situation that might be easily fixable in Windows is if some application crashes with a page fault that can't be handled. Then Windows wouldn't return because it can't handle that. Doing something to make the application not crash would fix that problem.

I got things partially running in a worker. I used CXXFLAGS="-O3 -g --proxy-to-worker" emconfigure ./configure --disable-sync --without-sdl2. Then also src/pre.js needs to be empty because Module won't be found, and _emscripten_set_pointerlockchange_callback() will fail so I just removed the body of that function. (That code is from Emscripten src/library_html5.js and maybe a bug report should be filed.) Then I could open dosbox.html.

I have "MIXER:Can't open audio: unknown SDL-emscripten error , running in nosound mode." and "Emulation ended because interactive shell is not supported." errors still. The html is substantially different and the packager doesn't work with it. I tried using this dosbox.conf

[autoexec]
mount c .
c:
gwbasic

and adding --preload-file dosbox.conf --preload-file Gwbasic.exe to the final link command line, but then it just appears to hang after "CONFIG:Loading primary settings from config file dosbox.conf"

dreamlayers commented 9 years ago

I modified a simple SDL 2 test program I used before, replacing emscripten_set_main_loop(mainloop, 0, 0); with

  while (1) {
      mainloop();
      SDL_Delay(100); 
  }

Then I rebuilt it with --proxy-to-worker and it simply worked, both in Firefox and in Chrome. The program's animation was properly displayed, and the browsers remained fully responsive.

The program was using 100% of one CPU core because SDL_Delay() is a busy wait. There may not be any way to call a function in a worker which waits and then continues execution after that function. Something that returns immediately and causes another function to run later can't be used as a replacement for SDL_Delay() in general.
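For reference, a generic busy-wait delay looks roughly like the sketch below. It is not Emscripten's actual SDL_Delay() code; `busy_wait_ms` is a hypothetical name, and the spin counter is only there to make the spinning visible.

```cpp
#include <cassert>
#include <chrono>

// Spins until `ms` milliseconds have elapsed. The thread never yields or
// sleeps, so it keeps one core fully busy for the whole wait.
// Returns the number of loop iterations performed.
inline unsigned long long busy_wait_ms(long ms) {
    assert(ms >= 0);
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::milliseconds(ms);
    unsigned long long spins = 0;
    while (clock::now() < deadline) {
        ++spins;  // nothing useful happens here; we only poll the clock
    }
    return spins;
}
```

Calling `busy_wait_ms(100)` from the loop above would block for about 100 ms, but control never leaves the function during that time, which is why CPU usage stays at 100%.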

The test program I modified is at github.com/kripken/emscripten/issues/3283#issuecomment-84776241 . (Not making that a link because I don't know how to link without misleadingly cross-connecting that unrelated issue to this one.)

hikari-no-yume commented 9 years ago

Interesting! It's unfortunate that there's no way to make SDL_Delay not be a busy-wait loop. Still, great progress.

dreamlayers commented 9 years ago

It seems a web worker can't receive messages if it is running in an infinite loop. This means DOSBox could not get input. If you want input, you need to regularly return, just like when not running in a worker. So, the web worker idea cannot solve this issue.
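A toy model of that constraint, assuming a single-threaded event loop where queued messages are only delivered between tasks. `ToyWorker` and `run_cooperatively` are hypothetical names for illustration; this does not use any real worker or Emscripten API.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Toy single-threaded worker: messages posted while a task is running sit
// in `inbox` and are only seen by the program after the task returns.
struct ToyWorker {
    std::queue<std::string> inbox;       // posted but not yet delivered
    std::vector<std::string> delivered;  // messages the program has seen
    long cycles_done = 0;

    void post(const std::string& msg) { inbox.push(msg); }

    // Runs `slice` cycles of pretend emulation, then returns.
    void run_slice(long slice) {
        assert(slice > 0);
        cycles_done += slice;
    }

    // Delivery only happens here, once the running task has returned.
    void drain_inbox() {
        while (!inbox.empty()) {
            delivered.push_back(inbox.front());
            inbox.pop();
        }
    }
};

// Runs until `total` cycles are done, returning to the loop after every slice.
inline void run_cooperatively(ToyWorker& w, long total, long slice) {
    assert(slice > 0);
    while (w.cycles_done < total) {
        w.run_slice(slice);
        w.drain_inbox();  // input becomes visible between slices
    }
}
```

If `run_slice` never returned, `drain_inbox` would never run and posted input would stay queued forever, which is the situation an infinite loop in a worker creates.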

hikari-no-yume commented 9 years ago

Oh great :/

J7mbo commented 9 years ago

@TazeTSchnitzel Yep, getting this in safari, but not chrome.

hikari-no-yume commented 9 years ago

Because Safari's JS engine is slower.

dreamlayers commented 9 years ago

I asked about running synchronous code in a web worker on emscripten-discuss. Apparently it will be possible in the future!

@J7mbo, if you get timeouts in one browser but not another, that's definitely due to slowness. I would like to prevent such timeouts from happening, but I'm not sure how to both detect real hangs promptly and prevent nuisance timeouts. Maybe I should at least increase the timeout to 5 seconds. Hangs only happen with some software in some specific situations, and hanging the browser for 5 seconds isn't so bad.

Did you get the hang in a build with sync, or a build without sync? Different code is used depending on this.

hikari-no-yume commented 9 years ago

They probably came here from my site. win95.ajf.me uses a build with sync.

kripken commented 9 years ago

> Emscripten emterpreter requires the .mem file.

Btw, that restriction was removed recently on emscripten incoming.

hikari-no-yume commented 9 years ago

Neat! One less file to gzip and download. Though it's really tiny gzipped so it's not really much hassle.