dosbox-staging / dosbox-staging

DOSBox Staging is a modern continuation of DOSBox with advanced features and current development practices.
https://www.dosbox-staging.org/

Implement FluidSynth integration #262

Closed dreamer closed 3 years ago

dreamer commented 4 years ago

The patch integrating FluidSynth was developed a long time ago (first revision 9 years ago, last revision 3 years ago - it really waits that long for inclusion upstream?). It is currently being distributed via ECE.

In the past, I was very hesitant to integrate FluidSynth patch for a number of reasons:

  1. I wanted to clean up first, before starting new work in the area. Without going into details… there are serious issues with DOSBox MIDI support - the demand for integrating Fluid might be caused by those issues.
  2. I don't want to target Fluid 1.x API. It's old - Fluid is being actively developed and they have 2.x API released already.
  3. When developing Boxtron I tested FluidSynth running as application thoroughly… and the results were very mixed - to the point, that I recommend Boxtron users to use Timidity++ by default. All that testing was around version 1.1 though (which is old and hopefully not representative of quality in 2.x).
  4. After a brief look at patch itself, I have some reservations: MIDI synth should not run in the same process as emulation loop, properties do not use the same naming style as other options, INSTALL documentation is missing, user documentation is missing, build was never tested under Visual Studio, (and more) - all those issues can be easily fixed; patch itself is not that big nor invasive, it just needs some work.
  5. libfluid 2.x was not available in repositories, which would mean integrating the patch required bundling fluid in the repo, which means transition to cmake, yada, yada - all the same reasons as for #257

What changed?

Over last 1-2 months (?), point (5) stopped being a problem, fluidsynth 2.x is now available in most repositories, including brew and vcpkg. That takes out reason (5), and it's huge.

Points (4) and (2) are just a matter of putting in some work - not a big deal. I still want to address (1), but after looking in details at the patch - this work can largely happen in parallel. (3) can be assessed only during testing.

Of course, integration of Fluid needs to be optional - the lib is not old nor tested enough to be propagated to all repositories, so some users will want to build without it. But the patch already kind-of handles it.

When?

Testing effort required would push 0.75.0 release too far away, so I won't merge any new MIDI-related changes before 0.75.0 release (here are tasks planned ATM for 0.75.0).

But if someone has free cycles and wants to push this work forward: the patch is imported on a side branch already - cherry-pick f786b9fdc52df0e9b39ee27d640fb37ba7c369e6 and start working on fixes to problems I listed in (2) and (4) ;) If nobody will step up, I'll start work on this sometime after 0.75.0.

kcgen commented 4 years ago

Well laid-out plan @dreamer ! :bookmark_tabs:

bluddy commented 4 years ago

Regarding part of point 4, I assume you mean using separate threads for midi synthesis. Fluidsynth already uses other threads internally, meaning that it doesn't burden the dosbox thread.

In general, the patch seems simple enough to me that it can just be merged. Further cleanup can proceed later on as a specific task. Most importantly, it'll allow more users to switch to dosbox-staging, who will then be invested in the project's success.

dreamer commented 4 years ago

Fluidsynth already uses other threads internally, meaning that it doesn't burden the dosbox thread.

We need proof to confirm this assertion. It's not hard, just generating flamegraphs and some testing - but that's part of development work needed to push the patch through the finish line.

Also, I am not going to merge patch in a state where CI does not verify the build using multiple compilers on multiple OSes. And that's the state the patch is in ATM :(.

bluddy commented 4 years ago

We need proof to confirm this assertion. It's not hard, just generating flamegraphs and some testing - but that's part of development work needed to push the patch through the finish line.

From the fluidsynth pages:

As soon as the audio driver is created, it will start playing. The audio driver creates a separate thread that uses the synthesizer object to generate the audio.

This is unlike munt, which has no notion of threading.

kcgen commented 4 years ago

Most importantly, it'll allow more users to switch to dosbox-staging

Great point @bluddy , I agree. Regarding the flamegraphs @dreamer mentioned, are you able to generate them?

I haven't had time to dig in yet; but if you can build and run fluidsynth (standalone) on a linux box, here are notes on how to generate them: https://github.com/brendangregg/FlameGraph/blob/master/README.md

dreamer commented 4 years ago

This is how I generate flamegraphs (Linux only). Before starting, clone the FlameGraph repo mentioned earlier (for the scripts inside).

  1. Build dosbox-staging with -g -O0 -fno-omit-frame-pointer
  2. Start dosbox-staging, and note PID of the process
  3. Start the benchmark / test inside dosbox, and immediately:
  4. sudo perf record -F 99 -p <PID> -g -- sleep 120
     (-F 99 samples the stack at 99 Hz; empirically this seems like a good middle ground, at least for me, since too high a sampling rate makes the results less realistic.) Play the game or watch the benchmark for those 2 minutes.
  5. sudo perf script | ~/src/FlameGraph/stackcollapse-perf.pl > out.perf-folded
  6. ~/src/FlameGraph/flamegraph.pl out.perf-folded > flamegraph-game-test-description.svg

Open the resulting SVG in a browser (it has JS that makes browsing and filtering the graph easier). I need to describe somewhere how to interpret the graph, but overall: there will definitely be a huge plateau of unrecognized stacks (that's dynrec-generated CPU emulation, we're not interested in that) and a tower that can be clearly recognized as the main dosbox "Normal" loop. Narrow towers on top of the normal loop are OK; plateaus on top of the normal loop are bottlenecks. (The thing I'm not sure about is how to be 100% sure the flamegraph does not include child threads, but for CD-DA this doesn't seem to be a problem.)

If flamegraphs with FluidSynth integrated look similar to graphs taken while a game is playing music via CD-DA emulation, that's good. If FluidSynth shows up as plateaus covering ~5% of runtime stacks or more, that's bad and we'll need to investigate further.
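
To put a number on the "~5% of runtime stacks" threshold, the folded output from step 5 can be tallied directly. This is a minimal sketch, assuming the folded format written by stackcollapse-perf.pl (semicolon-separated frames, then a space and a sample count) and assuming FluidSynth frames can be recognised by a `fluid_` symbol prefix. The function name `share_of_samples` and the sample stacks are made up for illustration.

```python
def share_of_samples(folded_lines, needle):
    """Return the fraction of samples whose stack contains `needle`.

    Each folded line looks like "frame_a;frame_b;frame_c 42":
    the stack, a space, then the number of samples seen with that stack.
    """
    total = 0
    matched = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frame names may contain spaces.
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        total += count
        if needle in stack:
            matched += count
    return matched / total if total else 0.0


# Hypothetical folded stacks, only to show the arithmetic.
example = [
    "dosbox;Normal_Loop;MIXER_Mix;fluid_synth_write_s16 6",
    "dosbox;Normal_Loop;CPU_Core_Dynrec_Run 90",
    "dosbox;Normal_Loop;MIXER_Mix 4",
]
print(f"{share_of_samples(example, 'fluid_'):.1%}")  # prints 6.0%
```

With real data, passing an open file works the same way, for example `share_of_samples(open("out.perf-folded"), "fluid_")`. Note that this counts samples from every thread that perf recorded, so it does not by itself answer the child-thread question raised above.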

Edit: I guess it would be helpful to list the games that offer music playback via both CD-DA and MIDI, to start preparing test cases. I think some candidates might be: HoMM2, Settlers 2, and System Shock.

kcgen commented 4 years ago

Instructions are great! Here's a zoom out of what I got (Jones in the Fast Lane, ~2500 cycles, FLAC CD-DA sequences back-to-back)

Screenshot at 2020-04-14 15-03-52

dreamer commented 4 years ago

Wow, this looks completely different than what I get on x86_64 - can you paste plain svg somewhere?

dreamer commented 4 years ago

Work on this feature started on branch po/fluid-1. Old version of this patch as distributed via ECE has some issues - we are going to use @realnc implementation instead (link) - it seems cleaner, was already converted to FluidSynth 2.1, and seems to be better tested. But we will need to do some tweaks anyway, as certain small design choices in there clash with our future plans.

kcgen commented 4 years ago

A recent platform-specific timer adjustment for fluidsynth: https://github.com/joncampbell123/dosbox-x/commit/e00cf22392f9c0eb9bef3e3e8edea4fa433dc609

https://www.vogons.org/viewtopic.php?p=852446#p852446

realnc commented 4 years ago

A recent platform-specific timer adjustment for fluidsynth: joncampbell123/dosbox-x@e00cf22

https://www.vogons.org/viewtopic.php?p=852446#p852446

These are only used when fluidsynth is doing audio output itself. They are audio driver parameters. When rendering audio into a buffer (with fluid_synth_write_s16() in this case) and letting the dosbox mixer play the audio without creating a fluidsynth audio driver, these parameters have no effect whatsoever.

http://www.fluidsynth.org/api/index.html#UsingSynth

This is how it's done in dosbox-core. The fluidsynth patch that's been floating around for a while now for vanilla dosbox does not do this, and thus these parameters are important there.

kcgen commented 4 years ago

Thanks for the comparison @realnc, and good to know these adjustments won't be needed.

The approach of feeding dosbox's mixer with samples is win-win-win: fewer LOC to maintain, uses a single host-agnostic audio interface abstracted by SDL, and less runtime complexity and overhead.

dreamer commented 4 years ago

Initial, working version of FluidSynth integration can be tested on branch po/fluid-3 - at this point it is (almost) direct port from dosbox-core, but using our normal coding conventions, licensing info and SPDX identifier. Code was also moved to the recently created midi module, to avoid littering gui any more.

Testers: you need to compile it yourself. Our CI does not provide precompiled snapshots with FluidSynth integration (yet). FluidSynth 2.x is available in many distro repositories, but it's still missing from a few notable ones.

Do not ask me for support as of yet - you're on your own. Do not get married to new fluid settings as inherited from dosbox-core - we will change them (not sure exactly how yet).

I tested it on Ubuntu 20.04 and Windows 10 and it seemed to work fine, but code is not good enough quality-wise and we have no user documentation, so it won't be merged to master just yet.

dreamer commented 4 years ago

The first part of FluidSynth 2.x support was just merged via #539 :)

But I'm not closing this feature request just yet - we need to polish it a little bit, add more documentation, implement some missing bits, we have 1 small bug… but as of now, dosbox-staging finally has a built-in MIDI synth.

To testers: recreate your config file - there's a new fluidsynth section in. The current set of user-changeable settings is not final.

I am especially interested in learning from testers using wide range of SoundFonts:

If you're compiling the code yourself, then FluidSynth support should work on any OS. If you're using our pre-compiled snapshot builds, then ATM only Windows builds have the feature enabled (fluidsynth 2.x library is missing from brew repo on macOS and from Ubuntu 18.04 repos, so we cannot provide pre-compiled packages on those OSes yet).

arrowgent commented 4 years ago

i think this is from SVN source dosbox: https://launchpad.net/~i30817/+archive/ubuntu/dosbox-patched

##
#            fluid.driver: Driver to use with Fluidsynth, not needed under Windows. Available drivers depend on what Fluidsynth was compiled with
#                            Possible values: pulseaudio, alsa, oss, coreaudio, dsound, portaudio, sndman, jack, file, default.
#         fluid.soundfont: Soundfont to use with Fluidsynth. One must be specified.
#        fluid.samplerate: Sample rate to use with Fluidsynth.
#              fluid.gain: Fluidsynth gain.
#         fluid.polyphony: Fluidsynth polyphony.
#             fluid.cores: Fluidsynth CPU cores to use, default.
#           fluid.periods: Fluidsynth periods.
#        fluid.periodsize: Fluidsynth period size.
#            fluid.reverb: Fluidsynth use reverb.
#                            Possible values: no, yes.
#            fluid.chorus: Fluidsynth use chorus.
#                            Possible values: no, yes.
#   fluid.reverb.roomsize: Fluidsynth reverb room size.
#    fluid.reverb.damping: Fluidsynth reverb damping.
#      fluid.reverb.width: Fluidsynth reverb width. (.76)
#      fluid.reverb.level: Fluidsynth reverb level. (.57)
#     fluid.chorus.number: Fluidsynth chorus voices
#      fluid.chorus.level: Fluidsynth chorus level.
#      fluid.chorus.speed: Fluidsynth chorus speed.
#      fluid.chorus.depth: Fluidsynth chorus depth.
#       fluid.chorus.type: Fluidsynth chorus type. 0 is sine wave, 1 is triangle wave.
#                            Possible values: 0, 1.

i use a Roland SC-55 soundfont, i cant remember where i got it from FluidR3_GM_sc-55.sf2 108424522 bytes

i dont have fluidsynth running all the time i call it to start using a script then whatever application requires it when the application exits so does fluidsynth. ie:

/usr/bin/fluidsynth -a pulseaudio -m alsa_seq -i -l -s -p FluidSynth /usr/share/sounds/sf2/FluidR3_GM.sf2 &
/usr/games/dosbox &&

that said, i dont start fluidsynth with dosbox-staging ive only done minor testing with dosbox-staging, sounds seem to be fine there were a few games which music was not playing. i will try to figure out which games do/dont play both by starting fluidsynth and by using static built-in fluidsynth

edit: my mistake, im not using the fluidsynth patch. disregard my testing

arrowgent commented 4 years ago

some dos midi software useful for testing? some midi players could be useful for exclusively testing midi (no extra game/sfx/etc, and lightweight)

http://dosmid.sourceforge.net/

also GSPLAY 1.0 version gsplay1.zip or labelled 1.1. i dont see any public sources, although many repositories list it as "freeware", so i wont post a link here. not the same as gsplay v2.x free, which is for windows

untested but others report using: megamid dos

kcgen commented 3 years ago

With @grapeli's report of playback gaps on slightly older hardware, I wondered if the integrated FS library really is as threaded as an external midi player is when processing sysex calls entirely out of band relative to dosbox.

The documentation http://www.fluidsynth.org/api/ mentions:

FluidSynth's rendering engine is implemented by using the "Dispatcher Thread Pattern". This means that a certain thread A, which calls one of FluidSynth's rendering functions, namely

  * fluid_synth_process()
  * fluid_synth_nwrite_float()
  * fluid_synth_write_float()
  * fluid_synth_write_s16()

automatically becomes the "synthesis thread". The terms "synthesis context" and "synthesis thread" are equivalent

So.. we have Dosbox's 1ms loop calling FS's above-mentioned rendering function, and this means the synthesis is actually being done in Dosbox's main thread.

That was true for 2.1.5, which is the latest. The docs mention for 2.2.0:

The sequencer has received a major revisal. For you that means: The sequencer's queue no longer blocks the synthesizer thread, due to being busy arranging its events internally.

So I guess some underlying work, namely this "sequencer queue that arranges events" is now being broken out of the rendering call.. so hopefully that further reduces the block-time when synthesizing the audio.

I suspect to fully disconnect FS from blocking us, we'd need to put the FS synth object inside a thread, and front-run its rendering by some number of milliseconds. So when Dosbox's 1ms loop comes around, the samples are 100% ready to be written into the audio channel, and if they're not ready then we let the user know that FS couldn't keep up, so they can add more FS threads and/or increase the pre-render latency (which might be a new conf option specifying some number of milliseconds, or we could derive it from the user's [mixer] prebuffer ms setting).
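
The pre-render idea can be sketched as a bounded producer/consumer queue. This is only an illustration, written in Python rather than dosbox's C++. `render_block` is a stand-in for the FluidSynth rendering call, and the class name, block size, and queue depth are assumptions, not existing dosbox or FluidSynth API.

```python
import queue
import threading


class PreRenderer:
    """Render audio blocks on a worker thread, a few blocks ahead of the consumer."""

    def __init__(self, render_block, blocks_ahead):
        self._render_block = render_block
        # A bounded queue caps how far ahead the worker may run (the pre-render latency).
        self._ready = queue.Queue(maxsize=blocks_ahead)
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while not self._stop.is_set():
            block = self._render_block()
            # Retry with a timeout so a stop request is noticed even when the queue is full.
            while not self._stop.is_set():
                try:
                    self._ready.put(block, timeout=0.05)
                    break
                except queue.Full:
                    continue

    def next_block(self, timeout_s):
        """Called from the main loop; returns None if the synth could not keep up."""
        try:
            return self._ready.get(timeout=timeout_s)
        except queue.Empty:
            return None

    def close(self):
        self._stop.set()
        self._worker.join()


# Stand-in renderer: one block = 1 ms of stereo 16-bit silence at 48 kHz (48 frames).
FRAMES_PER_BLOCK = 48


def render_silence():
    return bytes(FRAMES_PER_BLOCK * 2 * 2)


synth = PreRenderer(render_silence, blocks_ahead=20)  # roughly 20 ms of pre-render
block = synth.next_block(timeout_s=1.0)
print("underrun" if block is None else f"got {len(block)} bytes")
synth.close()
```

The `None` return is where a "FS couldn't keep up" warning would be logged. A real implementation would also need to hand MIDI events to the worker thread in order, which this sketch leaves out.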

dreamer commented 3 years ago

So just as we suspected in the beginning (point 4 in my original post)… :( Also, notably, 2.2.0 is ABI incompatible with previous versions… we will need to keep an eye on this.

grapeli commented 3 years ago

[fluidsynth]
synth_threads = 1

# ps -T -l -C dosbox
F S   UID     PID    SPID    PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 R  1000   31240   31240    1368 24  89   0 - 195520 -     pts/6    00:00:04 dosbox
1 S  1000   31240   31241    1368  0 107  19 - 195520 -     pts/6    00:00:00 dosbox:disk$0
1 S  1000   31240   31242    1368  0 107  19 - 195520 -     pts/6    00:00:00 dosbox:disk$1
1 S  1000   31240   31243    1368  0 106  19 - 195520 -     pts/6    00:00:00 dosbox:disk$2
1 S  1000   31240   31244    1368  0 107  19 - 195520 -     pts/6    00:00:00 dosbox:disk$3
1 S  1000   31240   31245    1368  0  99  19 - 195520 -     pts/6    00:00:00 SDLHotplugALSA
1 S  1000   31240   31246    1368  1  80   0 - 195520 -     pts/6    00:00:00 SDLAudioP2

[fluidsynth]
synth_threads = 2

# ps -T -l -C dosbox
F S   UID     PID    SPID    PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S  1000   31194   31194    1368 45  80   0 - 214010 -     pts/6    00:01:01 dosbox
1 S  1000   31194   31195    1368  0 106  19 - 214010 -     pts/6    00:00:00 dosbox:disk$0
1 S  1000   31194   31196    1368  0 106  19 - 214010 -     pts/6    00:00:00 dosbox:disk$1
1 S  1000   31194   31197    1368  0 106  19 - 214010 -     pts/6    00:00:00 dosbox:disk$2
1 S  1000   31194   31198    1368  0 106  19 - 214010 -     pts/6    00:00:00 dosbox:disk$3
1 S  1000   31194   31199    1368  0  99  19 - 214010 -     pts/6    00:00:00 SDLHotplugALSA
1 S  1000   31194   31200    1368  1  80   0 - 214010 -     pts/6    00:00:01 SDLAudioP2
1 S  1000   31194   31201    1368  4  -1   - - 214010 -     pts/6    00:00:06 mixer0

In this case, the priority of the mixer thread in htop is -61 (very high, the same as for an external fluidsynth).

kcgen commented 3 years ago

@grapeli, thanks 👍

Assuming we will have other users in this boat, please share your recommended config changes that work better than our current defaults (no rush; once you're done investigating).

If these also work for others without detriment, then I suggest we go with your settings as safe/better values while we address the remaining issues with FluidSynth (such as putting it in parallel thread instead of blocking our main loop).

kcgen commented 3 years ago

Below comment from @grapeli moved from PR https://github.com/dosbox-staging/dosbox-staging/pull/640#issuecomment-706623816 to this FluidSynth discussion thread.

My problem with the built-in fluidsynth is slow access to data on the disk (HDD), which can cause hiccups in the sound generated by the built-in fluidsynth. Changes to the mixer settings do not affect this (that was a complete misinterpretation on my part). The real reason the audio is smooth is that on the second, third, or fifth run the data is cached or completely in memory (tmpfs).

CPU is not a problem, although in HoMM2 there may be a load spike close to this limit. I built an optimized (profiled) version of this branch, with interpolation, chorus and reverb turned off (the sound is clearly lower quality), but the load is also lower.

kcgen commented 3 years ago

Interesting @grapeli . So when you move your SF2 to /dev/shm (or tmpfs), the problem disappears?

In this case @dreamer, we should consider a follow on PR that reads the entire SF2 into memory. Something like: http://www.fluidsynth.org/api/fluidsynth_sfload_mem_8c-example.html

However, this solution is pretty ugly. Ideally FluidSynth would offer an option to read the entire file into memory on load to eliminate poor IO latency from DoS'ing the stream. I will open an issue and see what their developers think.

CPU is not a problem, although in HoMM2 there may be a load spike close to this limit.

Yes - the two are often connected (plus Linux reports blocked IO as CPU user time too :sweat_smile:). This was also happening to me on the Pi (I think we had a discussion about this); there I had to tell Timidity to load the entire SF2 into RAM, and I included a delay before launching DOSBox to account for this pre-load duration. Otherwise I would lose several seconds of MIDI during the Sierra logo startup sequence.

We definitely need to find the equivalent solution with FluidSynth.

I built an optimized (profiled) version of this branch, with interpolation, chorus and reverb turned off (the sound is clearly lower quality), but the load is also lower.

Thanks for these tests. Just to be absolutely clear, can you confirm that you have zero hiccups when the SF2 is loaded from /dev/shm/path/ (RAM-disk), even with chorus, reverb, and 7th order poly all enabled? If so, that would confirm that CPU-load-wise we're good and that IO is exclusively the problem (even on older hardware).

kcgen commented 3 years ago

@grapeli , see: https://github.com/FluidSynth/fluidsynth/issues/685

grapeli commented 3 years ago

Interesting @grapeli . So when you move your SF2 to /dev/shm (or tmpfs), the problem disappears?

I ran another test.

  1. Soundfont on hdd disk and mounted one directory with DOS programs (~2500 files) including HoMM2 from hdd. 01-homm2.flac.txt Approximately 19 audio interruptions.

  2. Soundfont on hdd, HoMM2 mounted as a separate directory from hdd. 02-homm2.flac.txt Three breaks, only in the first 10 seconds.

  3. Soundfont on hdd, HoMM2 mounted as a separate directory from tmpfs. 03-homm2.flac.txt Zero interruptions in audio delivery.

  4. Soundfont (500MB, SGM-v2.01-Nice-Piano-Guit-Bass-v2.4.sf2) on hdd, HoMM2 mounted as a separate directory from tmpfs. Zero interruptions in audio delivery.

Each test is preceded by cleaning the cache:

echo 3 | sudo tee /proc/sys/vm/drop_caches

dreamer commented 3 years ago

OK, so as for now we know:

  1. FluidSynth loads whole soundfont into the memory, processes it and closes the handle. It does that because we keep synth.dynamic-sample-loading to default (disabled). We discussed enabling this feature previously, but this behaviour indicates we should probably keep it as disabled. Or maybe even explicitly set it to false to indicate it's disabled by design.
  2. @grapeli findings are very interesting in this context and warrant further investigation, but first we'll need to address (3)
  3. FS maintainers confirmed we need to create separate synthesis thread - this is the most likely source of issues right now.
  4. WIP: Soft Limiter task to mitigate impact of too loud/too quiet soundfonts.
  5. WIP: Loading SF files from known locations - some prerequisites are being implemented.

@kcgen I'll wait with starting work on (3) until SoftLimiter work will be finished to avoid conflicts.

mawe42 commented 3 years ago

  1. FluidSynth loads whole soundfont into the memory, processes it and closes the handle. It does that because we keep synth.dynamic-sample-loading to default (disabled). We discussed enabling this feature previously, but this behaviour indicates we should probably keep it as disabled. Or maybe even explicitly set it to false to indicate it's disabled by design.

Just one more quick comment regarding the dynamic-sample-loading: it might be worthwhile to give your users the option to choose dynamic loading vs. static loading of the samples. Especially when using many or large Soundfonts, enabling it will help to save possibly hundreds of megabytes of RAM in addition to reducing the load time of the Soundfonts significantly. The big downside is IO in the render thread, of course. But my guess is that you are using FS only for music playback, so you could probably get away with a large buffer and latency in the 100+ ms range. And with that, dynamic sample loading will probably only be problematic on really old hardware or very slow HDDs...
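
For a sense of scale, the buffer implied by a given latency is simple arithmetic. This sketch assumes interleaved 16-bit stereo output and a 44100 Hz sample rate. The rate is an assumption for illustration, not a dosbox default, and `buffer_size` is a hypothetical helper.

```python
def buffer_size(latency_ms, sample_rate_hz=44100, channels=2, bytes_per_sample=2):
    """Return (frames, bytes) needed to hold `latency_ms` of interleaved PCM audio."""
    frames = sample_rate_hz * latency_ms // 1000
    return frames, frames * channels * bytes_per_sample


for ms in (1, 20, 100):
    frames, size = buffer_size(ms)
    print(f"{ms:>3} ms -> {frames} frames, {size} bytes")
```

Even at 100 ms this comes to 4410 frames (17640 bytes), so the cost of a large buffer is the added latency rather than memory.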

Anyway, I hope you get FS integrated into dosbox without problems. In case you need more input, the fluid-dev mailing-list is the best place for general feedback and usage questions.

kcgen commented 3 years ago

@grapeli,

Your tests match what the FluidSynth team explained (that FluidSynth by default reads the entire SF2 into memory), which explains why there is no stuttering in your tests 3 and 4.

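I was able to measure this load-in-full behavior using the ~517 MiB SGM soundfont:

  * Left pane: sudo pmap <dosbox-staging PID> shows a single heap allocation holding the soundfont
  * Right pane: dstat shows the disk reads per second accumulating to ~519 MiB, after which the dosbox shell appears

2020-10-11_09-32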

@dreamer, I agree with your 2nd point above because an IO stall inside dosbox's emulation loop (in this case, reading a HOMM2 data file as shown in @grapeli's tests 1 and 2), will block dosbox from gathering audio from its channel sources; because dosbox employs one big serialized loop.

To expand on it: even if we thread FS's synth call, dosbox's channel callback will still be operating in dosbox's main loop (that is, the callback that copies the buffer produced by synth(..) into the mixer via add_samples_(...)). Likewise, once all the callbacks have fed dosbox's mixer, it combines them into a final mixed audio buffer and writes that out to SDL, which will also be blocked because the mixer is part of the serial loop.

So the mixer, its callbacks (and something to drive them on a timer), and FluidSynth all need to be threaded if we want SDL to keep receiving audio samples (and thus keep hearing MIDI music) while dosbox's primary loop gets blocked on game IO stalls.

This is going to be a large architectural overhaul where all of the emulated sound devices would become threaded. I'm not sure how it will work, given many of those sound devices require the DOS program to be poking and prodding their registered ports and IRQs, which influence generated audio.
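
The effect of a stall in a serialized loop can be shown with a toy model. This sketch assumes the audio device drains 1 ms of audio per millisecond of wall time and that each loop iteration, however long it takes, hands over 1 ms of audio when it finishes. The function and its parameters are illustrative and are not dosbox code.

```python
def underrun_ms(iteration_durations_ms, prebuffer_ms):
    """Total milliseconds of silence heard when one loop both emulates and mixes.

    Each iteration takes `duration` ms of wall time (1 normally, more when the
    emulated game blocks on disk IO) and then hands 1 ms of audio to the device.
    """
    buffered = prebuffer_ms
    silence = 0
    for duration in iteration_durations_ms:
        # The device keeps draining while the loop is busy or stalled.
        if duration > buffered:
            silence += duration - buffered
            buffered = 0
        else:
            buffered -= duration
        buffered += 1  # audio mixed at the end of this iteration
    return silence


normal = [1] * 100
one_stall = [1] * 50 + [30] + [1] * 49  # a single 30 ms IO stall

print(underrun_ms(normal, prebuffer_ms=20))     # 0
print(underrun_ms(one_stall, prebuffer_ms=20))  # 10
print(underrun_ms(one_stall, prebuffer_ms=50))  # 0
```

In this model a larger prebuffer hides the stall only at the cost of latency. Moving the synth alone to another thread would not change the result, because the mixing step still sits inside the stalled loop.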

MasterO2 commented 3 years ago

@grapeli,

Your tests match what the FluidSynth team explained (that FluidSynth by default reads the entire SF2 into memory), which explains why there is no stuttering in your tests 3 and 4.

I was able to measure this load-in-full behavior using the ~517 MiB SGM soundfont:

  * Left pane: sudo pmap <dosbox-staging PID> shows a single heap allocation holding the soundfont
  * Right pane: dstat shows the disk reads per second accumulating to ~519 MiB, after which the dosbox shell appears

2020-10-11_09-32

@dreamer, I agree with your 2nd point above because an IO stall inside dosbox's emulation loop (in this case, reading a HOMM2 data file as shown in @grapeli's tests 1 and 2), will block dosbox from gathering audio from its channel sources; because dosbox employs one big serialized loop.

To expand on it: even if we thread FS's synth call, dosbox's channel callback will still be operating in dosbox's main loop (that is, the callback that copies the buffer produced by synth(..) into the mixer via add_samples_(...)). Likewise, once all the callbacks have fed dosbox's mixer, it combines them into a final mixed audio buffer and writes that out to SDL, which will also be blocked because the mixer is part of the serial loop.

So the mixer, its callbacks (and something to drive them on a timer), and FluidSynth all need to be threaded if we want SDL to keep receiving audio samples (and thus keep hearing MIDI music) while dosbox's primary loop gets blocked on game IO stalls.

This is going to be a large architectural overhaul where all of the emulated sound devices would become threaded. I'm not sure how it will work, given many of those sound devices require the DOS program to be poking and prodding their registered ports and IRQs, which influence generated audio.

@kcgen So that architectural overhaul is for after 0.76?

kcgen commented 3 years ago

@MasterO2 - I'm not sure.

Moving FluidSynth to its own thread could happen in time for 0.76 (@dreamer's second point). But I don't think that will solve the issue.

I'm also very worried that threading dosbox's entire audio subsystem will result in broken audio for devices that involve lock-step port & IRQ control (ie: Sound Blaster, Adlib, and GUS); so we'd need a hybrid approach where only the relatively stand-alone audio sources (such as MIDI and CDDA) are broken out with a separate threaded path out to SDL. I personally don't have the stomach for that amount of work (at least at this point), so I can't vouch for a timeframe.

That said, there are other zero-risk approaches that might help @grapeli :-) Working on that right now.

grapeli commented 3 years ago

@kcgen I more or less understand the cause of the problem.

I will add that I did one more test (four times) with the external fluidsynth for the most demanding first case. The result for four repetitions is 100% correct.

I checked what files are read during the test:

inotifywait -m -e access -e open -r HoMM2/ 2>&1 | tee /tmp/hero2.log

Mostly this one: DATA/HEROES2.AGG (~41.5 MB).

grep -c HEROES2\.AGG /tmp/hero2.log
2896

I don't know how large the chunks are in which it reads this data.

A final note, dosbox-staging with built-in fluidsynth prevents from using external fluidsynth.
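
The grep above counts OPEN and ACCESS lines together. To separate them per file, the log can be tallied with a few lines of Python. This is a sketch that assumes inotifywait's default "<watched dir> <EVENTS> <file>" output format and paths without spaces; the sample lines are made up, and `count_events` is a hypothetical helper. It still cannot tell how large each read was, since inotify events carry no size.

```python
from collections import Counter


def count_events(log_lines):
    """Count inotifywait events per (path, event) from default-format output."""
    counts = Counter()
    for line in log_lines:
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # status messages such as "Watches established."
        directory, events, name = parts
        if not events.replace(",", "").replace("_", "").isupper():
            continue  # not an event line
        for event in events.split(","):
            counts[(directory + name.strip(), event)] += 1
    return counts


# Made-up log excerpt in the assumed format.
sample = [
    "Setting up watches.  Beware: since -r was given, this may take a while!",
    "Watches established.",
    "HoMM2/DATA/ OPEN HEROES2.AGG",
    "HoMM2/DATA/ ACCESS HEROES2.AGG",
    "HoMM2/DATA/ ACCESS HEROES2.AGG",
    "HoMM2/ OPEN,ISDIR DATA",
]
for (path, event), n in count_events(sample).most_common():
    print(f"{n:5d}  {event:<6}  {path}")
```

For the real log, `count_events(open("/tmp/hero2.log"))` would give the same breakdown.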

dreamer commented 3 years ago

@MasterO2 Yes, after 0.76.0; and I would really, really prefer to introduce Rust before we'll start introducing multithreading in all places we need it.

@kcgen Yes, I agree that making the mixer multi-threaded (perhaps even with each channel operating in its own thread) is a long-term goal, but we don't need it to have somewhat-usable FluidSynth support. Right now the mixer callback is blocked for the whole time of FS synthesis; after we move it to a separate thread, it will be blocked only while waiting for FS to finish the job and consuming the buffer (synthesis will be able to start as soon as sysex commands start arriving).

So we'll have usable FS MIDI, probably usable to the same level as emulating midi via GUS ULTRAMID is right now (which also results in radically reducing our emulation speed, but rarely anyone complains about it).

dreamer commented 3 years ago

A final note, dosbox-staging with built-in fluidsynth prevents from using external fluidsynth.

@grapeli What? How? You need to configure sequencer port the same way as it always worked.

kcgen commented 3 years ago

@grapeli,

We certainly don't want you having to copy your games to tmpfs just to play them. Fortunately Linux gives us some knobs to achieve a "full preload on demand" behavior to help hide the impact of high-latency IO.

Here's how I converted my HOMM2 installation and adjusted my system:

  1. Convert CD images (bin/cue) to a directory of files. This allows Linux's filesystem read-ahead setting to operate properly on a per-file basis, whereas before it could only operate on the .bin file without visibility into the ISO9660 files within.
  2. Combine the resulting CDROM directory contents and heroes2 game directory into one. This further collapses roughly ~240 MiB worth of duplicate files and lets the read-ahead work more efficiently (ie: instead of reading the same file from CDROM and then local game directory, it just reads it once).
  3. Defrag the resulting game directory using e4defrag .
  4. Crank up the per-file read-ahead just for the block device storing your games. Let's assume this is sda: echo 49152 | sudo tee /sys/block/sda/queue/read_ahead_kb. This ensures that the entire HEROES2.AGG will be moved into the data cache as soon as its very first byte is read (which is right when you depart the castle).
  5. Crank up the block-device read-ahead, which means that adjacent files during read will also be pulled into data cache (which are now likely to be other HOMM2 game files, given we've defragged the directory) sudo blockdev --setra 49152 /dev/sda

The good news about both the file-system and block read-ahead is that they operate asynchronously beyond the original read request. So when HOMM2 asks for a couple bytes from the AGG file, those bytes are returned immediately as soon as they're read - meanwhile the block and filesystem read-ahead carry on in parallel. So it's the best of both worlds.
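
One detail worth checking when applying steps 4 and 5: as far as I know, /sys/block/<dev>/queue/read_ahead_kb is expressed in KiB, while blockdev --setra takes 512-byte sectors, so the same number means different sizes in the two commands. A small sketch of the arithmetic follows, using the ~41.5 MB size reported earlier for DATA/HEROES2.AGG (interpreted here as 41.5 * 10^6 bytes, which is an assumption). `readahead_bytes` is a hypothetical helper.

```python
SECTOR_BYTES = 512
KIB = 1024


def readahead_bytes(value, unit):
    """Convert a read-ahead setting to bytes; unit is "kb" (sysfs) or "sectors" (blockdev)."""
    if unit == "kb":
        return value * KIB
    if unit == "sectors":
        return value * SECTOR_BYTES
    raise ValueError(f"unknown unit: {unit}")


AGG_BYTES = 41_500_000  # ~41.5 MB, as reported for DATA/HEROES2.AGG

for unit in ("kb", "sectors"):
    size = readahead_bytes(49152, unit)
    verdict = "covers" if size >= AGG_BYTES else "does not cover"
    print(f"49152 {unit}: {size // (KIB * KIB)} MiB, {verdict} the file")
```

If that reading of the units is right, 49152 KiB corresponds to 98304 sectors, and `sudo blockdev --getra /dev/sda` can be used to confirm which value is actually in effect after running both commands.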

Let me know when you get this:

grapeli commented 3 years ago

A final note, dosbox-staging with built-in fluidsynth prevents from using external fluidsynth.

@grapeli What? How? You need to configure sequencer port the same way as it always worked.

[midi]
mididevice = alsa
midiconfig = 128:0

It works. Oops. I set it to default.

@kcgen Today I don't have time for further analysis. I ran all tests on XFS. That's a decent fs; it is not to blame. The hard drive model is to blame - WDC WD10SPZX-24Z.

dreamer commented 3 years ago

For anyone reading @kcgen's previous post: of course, this is only a temporary solution - we know users outside of Linux don't have sophisticated options for configuring read-ahead or tweaking filesystems.

Our goal is to have built-in FS usable on Windows and macOS as well. BTW, for users who want to play HoMM2: this game provides CD audio, so MIDI is not absolutely necessary, but it is a very good stress test.

kcgen commented 3 years ago

@dreamer, we might not be able to solve high latency storage problems unless we're committed to adding our own embedded read-ahead mechanism. That said, we can do the best we can inside the emulator (threading, etc) plus helping users adjust their operating system settings when it makes sense.

That said, NAND storage will pass HDDs in price/GB soon enough - so this problem might vanish with time.

kcgen commented 3 years ago

@kcgen Today I don't have time for further analysis.

No worries.

I ran all tests on XFS.

Great.

realnc commented 3 years ago

When you run fluidsynth in its own thread, you could have it output audio outside the dosbox mixer. Basically what the original fsynth dosbox patch did, except you'd keep its volume controllable by the dosbox mixer. This way, fsynth would behave similar to running it as a stand-alone client: It constantly renders audio and outputs to its own audio device handle, without blocking anything, and plays the MIDI events sent to it by dosbox.

grapeli commented 3 years ago

@kcgen Today I defragmented the entire xfs filesystem. I will try to make even better tuning. I quickly checked under ext4. Not much better or the same (more repetitions required to be sure). I downloaded the test package.

kcgen commented 3 years ago

Roger that, @grapeli. Hopefully the two read-ahead settings make a notable difference.

kcgen commented 3 years ago

When you run fluidsynth in its own thread, you could have it output audio outside the dosbox mixer.

@realnc, my understanding is that sdl only holds a single instance for a given audio device. So, even though the mixer and (hypothetical) threaded-FS could each hold their own pointer to the same audio device, under the hood sdl is only playing audio for one global buffer.

In other words, you can't write two separate buffers of audio (threaded and overlapping in time), and expect sdl to mix them into a single stream.

This limitation drove the original need for Dosbox's mixer; otherwise each audio channel could have independently written its buffer straight into sdl and let it do the mixing.

(I know this first hand.. I tried to do so in one of the first versions of the CDDA patch 😅, which in turn drove me to using sdl_mixer-X, before deciding it was too heavy of a solution and fell back to using the dosbox mixer).

OpenAL on the other hand does allow many simultaneous inbound streams into the same audio device, and it performs the mixing for you. So this FS push might revive our exploration into using OpenAL (or something like it).
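To make the "mixing into a single stream" point concrete, here is a minimal Python sketch of what a software mixer has to do before handing one buffer to the audio device: sum the overlapping samples and clamp them to the sample range. It assumes signed 16-bit samples held in plain lists; the function name is hypothetical and this is not the actual DOSBox mixer code.

```python
S16_MIN, S16_MAX = -32768, 32767


def mix_s16(a, b):
    """Mix two signed 16-bit sample sequences into one.

    The shorter input is treated as padded with silence (zeros), and
    each summed sample is clamped so that it cannot wrap around.
    """
    length = max(len(a), len(b))
    mixed = []
    for i in range(length):
        sample_a = a[i] if i < len(a) else 0
        sample_b = b[i] if i < len(b) else 0
        mixed.append(max(S16_MIN, min(S16_MAX, sample_a + sample_b)))
    return mixed
```

Without a step like this somewhere (in the emulator, in a library such as OpenAL, or in the OS), two independent writers cannot share one output buffer.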

grapeli commented 3 years ago
  1. Crank up the per-file read-ahead just for the block device storing your games. Let's assume this is sda: echo 49152 | sudo tee /sys/block/sda/queue/read_ahead_kb.

  2. Crank up the block-device read-ahead, which means that adjacent files during read will also be pulled into data cache (which are now likely to be other HOMM2 game files, given we've defragged the directory) sudo blockdev --setra 49152 /dev/sda

These two suggestions are quite good. They reduce the number of holes in the sound stream by half (first case), but more importantly the holes now occur only in the first phase, in the initial 20-25 seconds; the rest of the test is OK (before that, there were losses even in the final phase).

kcgen commented 3 years ago

@grapeli, that's good news; those settings should mitigate the drawn-out latency impact of slow HDDs in exchange for "front-loading the latency-pain".

That said - we can do better, knowing that HOMM2's MIDI data is contained inside the data/heroes2.agg file. Save the following as start.sh, with execute permissions, alongside the test package's dosbox.conf. Then launch the game with it.

#!/bin/bash
set -xeu

# Ensure we're running inside the script directory
cd "$(dirname "$0")"

# Find and preload the data files containing latency-sensitive
# MIDI data. The pattern gets both the HOMM2 original campaign
# and expansion pack.
find heroes2/data -iname 'hero*.agg' \
 | xargs -i dd if="{}" of=/dev/null bs=1M

# Launch dosbox
/path/to/dosbox-binary -conf -userconf dosbox.conf 

If this works, I can envision dosbox performing a latency-sensitive preload via a list of files provided in the conf file. It wouldn't actually hold them in memory (which would be wasteful); instead it would read and discard the data, knowing that all modern operating systems will keep a copy of the data in their LRU cache.
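As a rough illustration of that read-and-discard idea, here is a Python sketch. It assumes the list of files has already been obtained somehow (for example from a conf entry, which does not exist today); the function name and chunk size are placeholders, and whether the OS keeps the data cached afterwards depends on available memory.

```python
def preload(paths, chunk_size=1 << 20):
    """Read each file in chunks and throw the data away.

    Nothing is kept in this process; the only goal is to make the
    operating system pull the files into its page cache. Files that
    cannot be opened or read are skipped. Returns the byte count read.
    """
    total = 0
    for path in paths:
        try:
            with open(path, "rb", buffering=0) as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
        except OSError:
            continue  # missing or unreadable file: nothing to preload
    return total
```

Running this in a background thread at startup would be the in-emulator equivalent of the find/dd lines in start.sh.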

realnc commented 3 years ago

my understanding is that sdl only holds a single instance for a given audio device. So, even though the mixer and (hypothetical) threaded-FS could each hold their own pointer to the same audio device, under the hood sdl is only playing audio for one global buffer.

Fluidsynth outputs audio directly through ALSA or PulseAudio (or whatever the equivalent is on other OSes.) SDL is an optional audio driver in fsynth, but it's not the default.

This is how the original fsynth patch does it. This results in two audio streams showing up in the OS mixer when running dosbox. The OS in this case is doing the mixing, just as if two different processes are using the same audio device. Also, it looks kinda weird when you get two OS mixer sliders for the same app. It's not the best solution, but it's easy to do.

Another somewhat easy solution is to have fsynth only render audio in the audio thread. That would be MIXER_CallBack() in hardware/mixer.cpp, which is called by SDL in a different thread. This could potentially improve rendering perf as well, because that way fsynth will render larger chunks each time, maybe getting better CPU cache utilization. The audio thread isn't doing anything other than mixing right now, so it should have enough CPU time to both render and mix.

kcgen commented 3 years ago

Thanks @realnc. Yes, we wanted FluidSynth to use the same audio architecture as the other emulated audio devices in dosbox; but this might require an exception.

Your second option would get my vote, and I certainly agree that larger audio rendering chunks would be much more efficient.

dreamer commented 3 years ago

@realnc

When you run fluidsynth in its own thread, you could have it output audio outside the dosbox mixer. Basically what the original fsynth dosbox patch did, except you'd keep its volume controllable by the dosbox mixer. This way, fsynth would behave similar to running it as a stand-alone client: It constantly renders audio and outputs to its own audio device handle, without blocking anything, and plays the MIDI events sent to it by dosbox.

We know, that's what we want to accomplish.

Ergo, we need to put synthesis into its own thread (and let FS potentially split it into more threads internally, if the user wants to). The FS maintainer explicitly advised that the synthesis thread should use a higher priority and not be responsible for other tasks (and I agree - some soundfonts use ridiculously big samples, and will need a lot of horsepower to process).

Another somewhat easy solution is to have fsynth only render audio in the audio thread. That would be MIXER_CallBack() in hardware/mixer.cpp, which is called by SDL in a different thread. (…)

Treating a single channel differently this way would be a somewhat hacky solution in my opinion… But yeah, this is an option as well. I think putting FS into its own thread will give a bit better results, but the proof is in the pudding as they say :)

There's one more solution - I don't want to use it, because it has multiple other drawbacks, but I'm listing it just for the sake of having a complete picture. We could replace DOSBox mixer with SDL2_mixer library - it has built-in support for MIDI streams via Timidity… But it seems like SDL2_mixer has some issues, and I really prefer FluidSynth 2.x over Timidity (because it's maintained and AFAIK it's the only synth supporting the full set of SF2 features).

realnc commented 3 years ago

We could replace DOSBox mixer with SDL2_mixer

That won't work. SDL_mixer can play MIDI files but not MIDI events. Also, SDL_mixer isn't actually intended for mixing audio. It's for decoding and playing various audio files (like WAV, Vorbis, etc.) The "mixer" in its name is somewhat unfortunate.

grapeli commented 3 years ago

Then launch the game with it.

#!/bin/bash
set -xeu

# Ensure we're running inside the script directory
cd "$(dirname "$0")"

# Find and preload the data files containing latency-sensitive
# MIDI data. The pattern gets both the HOMM2 original campaign
# and expansion pack.
find heroes2/data -iname 'hero*.agg' \
 | xargs -i dd if="{}" of=/dev/null bs=1M

# Launch dosbox
/path/to/dosbox-binary -conf -userconf dosbox.conf 

@kcgen It reduces the number of holes to about 4-6. There are also two files that are opened for writing and modified: heroes2.cfg during the startup sequence, and Autosave.gm1 (106 kB) after opening and each time you click the hourglass (next day).

The 8GB flash drive (ext4) also fails this test (probably due to slow writing).

Regarding the configuration you proposed, I had to slightly modify it. The laptop, in addition to a weak HDD and CPU i5-450M, also has poor intel integrated graphics (Gen5) compatible only with OpenGL 2.1. output = texture prevents scaling.

[sdl]
output = opengl
windowresolution = 1366x768

[cpu]
cycles = 60000

Here's a recording. homm2.flac.txt Likewise, most problems are at the beginning (when the I/O is loaded), then it's clean.

To be sure, I repeated this test in various configurations on tmpfs eight times. It is always 100% correct.

edit: I put autosave.gm1 and heroes2.cfg on tmpfs and made symbolic links. Even with that, it's not 100% correct.

realnc commented 3 years ago

Treating a single channel differently this way would be a somewhat hacky solution in my opinion… But yeah, this is an option as well. I think putting FS into its own thread will give a bit better results, but the proof is in the pudding as they say :)

I completely forgot: fsynth wouldn't be the exception. Munt should be treated the same way. Or any other MIDI synth, like BASSMIDI (if you decide to add that at some point.)

This makes sense even from a philosophical point of view. MIDI synths were always "asynchronous" devices back in the day. If you had an MT-32, a SoundCanvas, a Wave Blaster, or whatever else, the game would just send MIDI events over the MPU-401 port. The sound was rendered and played by those devices on their own. Even if the game went into an infinite loop, freezing completely, they would not stutter, hiccup, underrun, or anything like that. They would always play the sound cleanly, without any sound dropouts. Replicating that 100% async behavior makes sense.
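A minimal sketch of that async arrangement, in Python for brevity: a dedicated thread keeps rendering chunks into a bounded queue, and the mixer side pulls without ever blocking, falling back to silence on an underrun. The class and method names are hypothetical, and render_chunk stands in for whatever the real synth (FluidSynth, Munt, etc.) would provide; this is not how any of those libraries are actually driven.

```python
import queue
import threading


class AsyncSynth:
    """Render audio on a dedicated thread, decoupled from the consumer."""

    def __init__(self, render_chunk, max_chunks=8):
        self._render_chunk = render_chunk  # callable returning one chunk
        self._chunks = queue.Queue(maxsize=max_chunks)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Keep the queue topped up; retry with a short timeout when it is
        # full so that a stop request is noticed promptly.
        while not self._stop.is_set():
            chunk = self._render_chunk()
            while not self._stop.is_set():
                try:
                    self._chunks.put(chunk, timeout=0.05)
                    break
                except queue.Full:
                    continue

    def pull(self, silence):
        """Called from the mixer side: never blocks, returns silence on underrun."""
        try:
            return self._chunks.get_nowait()
        except queue.Empty:
            return silence
```

The bounded queue caps how far the renderer can run ahead, which also bounds the added latency between a MIDI event arriving and it being heard.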

kcgen commented 3 years ago

@grapeli,

Maybe there are other files (besides the big heroes*.agg) that load a bit of functional code prior to playing the MIDI sequences.

Other thoughts: If your system is starved for memory (or if many applications are idle and soaking up memory or swap), then there will be a lot of roll-over eviction where earlier read data is quickly evicted to make way for newer reads (even if we're only talking about 250MB of game data). Memory starved systems also can behave very poorly when it comes to swapping out active application memory to disk; paging it back in will hard-block the application. There are lots of knobs to tune this behavior, but at this point it's no longer related to FluidSynth - so perhaps we could chat over Discord via PM.

Regarding slow CPU and GPU -- given the game plays flawlessly when moved to tmpfs, I think your CPU and GPU are fine. dosbox doesn't need much horsepower to hit 30000 cycles (my $30 raspberry pi plays HOMM2 flawlessly w/ MIDI at 30k cycles); so I think the problem is isolated to your system's memory and IO subsystems.
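For a quick first check of the memory side, something like the following Python sketch could compare the kernel's MemAvailable estimate from /proc/meminfo against the size of the game data (roughly 250 MB here). The helper names are made up, and MemAvailable is only an estimate; it does not guarantee that the files will actually stay cached.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def fits_in_cache(meminfo_text, needed_mib):
    """Return True if MemAvailable (reported in kB) covers needed_mib."""
    available_kib = parse_meminfo(meminfo_text).get("MemAvailable", 0)
    return available_kib >= needed_mib * 1024
```

On Linux it would be called as fits_in_cache(open("/proc/meminfo").read(), 250); a False result suggests the preloaded data is likely to be evicted again quickly.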