Well laid-out plan @dreamer! :bookmark_tabs:
Regarding part of point 4, I assume you mean using separate threads for MIDI synthesis. Fluidsynth already uses other threads internally, meaning that it doesn't burden the dosbox thread.
In general, the patch seems simple enough to me that it can just be merged. Further cleanup can proceed later on as a specific task. Most importantly, it'll allow more users to switch to dosbox-staging, who will then be invested in the project's success.
Fluidsynth already uses other threads internally, meaning that it doesn't burden the dosbox thread.
We need proof to confirm this assertion. It's not hard, just generating flamegraphs and some testing - but that's part of development work needed to push the patch through the finish line.
Also, I am not going to merge the patch in a state where CI does not verify the build using multiple compilers on multiple OSes. And that's the state the patch is in ATM :(.
We need proof to confirm this assertion. It's not hard, just generating flamegraphs and some testing - but that's part of development work needed to push the patch through the finish line.
From the fluidsynth pages:
As soon as the audio driver is created, it will start playing. The audio driver creates a separate thread that uses the synthesizer object to generate the audio.
This is unlike munt, which has no notion of threading.
Most importantly, it'll allow more users to switch to dosbox-staging
Great point @bluddy, I agree. Regarding the flamegraphs @dreamer mentioned, are you able to generate them?
I haven't had time to dig in yet; but if you can build and run fluidsynth (standalone) on a linux box, here are notes on how to generate them: https://github.com/brendangregg/FlameGraph/blob/master/README.md
Without going into details, this is the way I generate flamegraphs (Linux only). Before starting, clone the FlameGraph repo mentioned earlier (for the scripts inside).
1. Build dosbox with `-g -O0 -fno-omit-frame-pointer`.
2. Start the game, then record the dosbox process: `sudo perf record -F 99 -p <PID> -g -- sleep 120`. `-F 99` samples the stack at 99 Hz; empirically this seems like a good middle ground, at least for me (we don't want too high a sampling rate, as it makes the results less realistic!). Play the game or watch the benchmark for the 2 minutes.
3. `sudo perf script | ~/src/FlameGraph/stackcollapse-perf.pl > out.perf-folded`
4. `~/src/FlameGraph/flamegraph.pl out.perf-folded > flamegraph-game-test-description.svg`
Open the resulting SVG in a browser (it has JS that allows easier browsing and filtering of the graph). I need to describe somewhere how to interpret the graph, but overall: there will definitely be a huge plateau of unrecognized stacks (that's dynrec-generated CPU emulation; we're not interested in that) and a tower that can be clearly recognized as the main dosbox "Normal" loop. Narrow towers on top of the normal loop are OK; plateaus on top of the normal loop are bottlenecks. (The thing I'm not sure about is how to be 100% sure the flamegraph does not include child threads, but for CD-DA this doesn't seem to be a problem.)
If flamegraphs with FluidSynth integrated look similar to the graphs taken while a game plays music via CD-DA emulation, that's good. If FluidSynth shows up as plateaus covering ~5% of runtime stacks or more, that's bad and we'll need to investigate further.
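To put a number on that ~5% threshold, the `out.perf-folded` file produced by `stackcollapse-perf.pl` can be scanned directly: each line is a semicolon-separated stack followed by a space and a sample count. The sketch below is a hypothetical helper (not part of dosbox or the FlameGraph scripts) that sums the samples whose stack mentions a substring such as `fluid` and reports their share of all samples. Matching on a plain substring is an assumption; it only works if FluidSynth's symbols are visible in the profile.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Sample totals for one flamegraph query.
struct Share {
    unsigned long long matched = 0; // samples whose stack contains the needle
    unsigned long long total = 0;   // all samples in the file
    double percent() const { return total ? 100.0 * matched / total : 0.0; }
};

// Parses folded stacks, one "frame;frame;...;frame COUNT" entry per line,
// and sums the samples whose stack part mentions `needle` (e.g. "fluid").
Share stack_share(std::istream &in, const std::string &needle)
{
    Share share;
    std::string line;
    while (std::getline(in, line)) {
        const auto space = line.rfind(' ');
        if (space == std::string::npos)
            continue; // not a folded-stack line
        unsigned long long count = 0;
        try {
            count = std::stoull(line.substr(space + 1));
        } catch (const std::logic_error &) {
            continue; // count field is not a number
        }
        share.total += count;
        if (line.substr(0, space).find(needle) != std::string::npos)
            share.matched += count;
    }
    return share;
}
```

Called with an `std::ifstream` opened on `out.perf-folded` and the needle `"fluid"`, a `percent()` of 5 or more would correspond to the "bad" case described above.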
Edit: I guess it would be helpful to list the games that offer music playback via both CD-DA and MIDI, to start preparing test cases. Some candidates might be: HoMM2, Settlers 2, and System Shock.
Instructions are great! Here's a zoom out of what I got (Jones in the Fast Lane, ~2500 cycles, FLAC CD-DA sequences back-to-back)
Wow, this looks completely different from what I get on x86_64 - can you paste the plain SVG somewhere?
Work on this feature started on branch po/fluid-1. The old version of this patch as distributed via ECE has some issues - we are going to use @realnc's implementation instead (link). It seems cleaner, was already converted to FluidSynth 2.1, and seems to be better tested. But we will need to do some tweaks anyway, as certain small design choices in there clash with our future plans.
A recent platform-specific timer adjustment for fluidsynth: https://github.com/joncampbell123/dosbox-x/commit/e00cf22392f9c0eb9bef3e3e8edea4fa433dc609
These are only used when fluidsynth is doing audio output itself; they are audio driver parameters. When rendering audio into a buffer (with `fluid_synth_write_s16()` in this case) and letting the dosbox mixer play the audio without creating a fluidsynth audio driver, these parameters have no effect whatsoever.
http://www.fluidsynth.org/api/index.html#UsingSynth
This is how it's done in dosbox-core. The fluidsynth patch that's been floating around for a while now for vanilla dosbox does not do this, so for that patch these parameters do matter.
Thanks for the comparison @realnc, and good to know these adjustments won't be needed.
The approach of feeding dosbox's mixer with samples is win-win-win: fewer LOC to maintain, uses a single host-agnostic audio interface abstracted by SDL, and less runtime complexity and overhead.
An initial, working version of the FluidSynth integration can be tested on branch po/fluid-3. At this point it is an (almost) direct port from dosbox-core, but using our normal coding conventions, licensing info and SPDX identifier. The code was also moved to the recently created `midi` module, to avoid littering `gui` any more.
Testers: you need to compile it yourself. Our CI does not provide precompiled snapshots with FluidSynth integration (yet). FluidSynth 2.x is available in many distro repositories, but it's still missing from a few notable ones.
Do not ask me for support as of yet - you're on your own. Do not get married to new fluid settings as inherited from dosbox-core - we will change them (not sure exactly how yet).
I tested it on Ubuntu 20.04 and Windows 10 and it seemed to work fine, but code is not good enough quality-wise and we have no user documentation, so it won't be merged to master just yet.
The first part of FluidSynth 2.x support was just merged via #539 :)
But I'm not closing this feature request just yet - we need to polish it a little bit, add more documentation, implement some missing bits, we have 1 small bug… but as of now, dosbox-staging finally has a built-in MIDI synth.
To testers: recreate your config file - there's a new `fluidsynth` section in it. The current set of user-changeable settings is not final.
I am especially interested in learning from testers using a wide range of SoundFonts.
If you're compiling the code yourself, then FluidSynth support should work on any OS. If you're using our pre-compiled snapshot builds, then ATM only Windows builds have the feature enabled (fluidsynth 2.x library is missing from brew repo on macOS and from Ubuntu 18.04 repos, so we cannot provide pre-compiled packages on those OSes yet).
I think this is from the SVN source of DOSBox: https://launchpad.net/~i30817/+archive/ubuntu/dosbox-patched
##
# fluid.driver: Driver to use with Fluidsynth, not needed under Windows. Available drivers depend on what Fluidsynth was compiled with
# Possible values: pulseaudio, alsa, oss, coreaudio, dsound, portaudio, sndman, jack, file, default.
# fluid.soundfont: Soundfont to use with Fluidsynth. One must be specified.
# fluid.samplerate: Sample rate to use with Fluidsynth.
# fluid.gain: Fluidsynth gain.
# fluid.polyphony: Fluidsynth polyphony.
# fluid.cores: Fluidsynth CPU cores to use, default.
# fluid.periods: Fluidsynth periods.
# fluid.periodsize: Fluidsynth period size.
# fluid.reverb: Fluidsynth use reverb.
# Possible values: no, yes.
# fluid.chorus: Fluidsynth use chorus.
# Possible values: no, yes.
# fluid.reverb.roomsize: Fluidsynth reverb room size.
# fluid.reverb.damping: Fluidsynth reverb damping.
# fluid.reverb.width: Fluidsynth reverb width. (.76)
# fluid.reverb.level: Fluidsynth reverb level. (.57)
# fluid.chorus.number: Fluidsynth chorus voices
# fluid.chorus.level: Fluidsynth chorus level.
# fluid.chorus.speed: Fluidsynth chorus speed.
# fluid.chorus.depth: Fluidsynth chorus depth.
# fluid.chorus.type: Fluidsynth chorus type. 0 is sine wave, 1 is triangle wave.
# Possible values: 0, 1.
I use a Roland SC-55 soundfont; I can't remember where I got it from: FluidR3_GM_sc-55.sf2, 108424522 bytes.
I don't have fluidsynth running all the time. I start it from a script, followed by whatever application requires it, and when the application exits, so does fluidsynth. E.g.:
/usr/bin/fluidsynth -a pulseaudio -m alsa_seq -i -l -s -p FluidSynth /usr/share/sounds/sf2/FluidR3_GM.sf2 &
/usr/games/dosbox &&
That said, I don't start fluidsynth with dosbox-staging. I've only done minor testing with dosbox-staging; sounds seem to be fine, but there were a few games whose music was not playing. I will try to figure out which games do/don't play, both by starting fluidsynth and by using the statically built-in fluidsynth.
Edit: my mistake, I'm not using the fluidsynth patch. Disregard my testing.
Some DOS MIDI software useful for testing? MIDI players could be useful for testing MIDI exclusively (no extra game/SFX/etc., and lightweight):
http://dosmid.sourceforge.net/
Also GSPLAY 1.0 (gsplay1.zip, sometimes labelled 1.1). I don't see any public sources, and although many repositories list it as "freeware", I won't post a link here. It is not the same as GSPLAY v2.x free, which is for Windows.
Untested, but others report using: megamid (DOS).
With @grapeli's report of playback gaps on slightly older hardware, I wondered if the integrated FS library really is as threaded as an external midi player is when processing sysex calls entirely out of band relative to dosbox.
The documentation http://www.fluidsynth.org/api/ mentions:
FluidSynth's rendering engine is implemented by using the "Dispatcher Thread Pattern". This means that a certain thread A, which calls one of FluidSynth's rendering functions, namely `fluid_synth_process()`, `fluid_synth_nwrite_float()`, `fluid_synth_write_float()` or `fluid_synth_write_s16()`, automatically becomes the "synthesis thread". The terms "synthesis context" and "synthesis thread" are equivalent.
So.. we have Dosbox's 1 ms loop calling FS's above-mentioned rendering function, which means the synthesis is actually being done in Dosbox's main thread.
That's true for 2.1.5, which is the latest release. The docs mention for 2.2.0:
The sequencer has received a major revisal. For you that means: The sequencer's queue no longer blocks the synthesizer thread, due to being busy arranging its events internally.
So I guess some underlying work, namely this "sequencer queue that arranges events", is now being broken out of the rendering call, so hopefully that further reduces the blocking time when synthesizing the audio.
I suspect that to fully decouple FS from blocking us, we'd need to put the FS synth object inside its own thread and have it render ahead by some number of milliseconds. Then, when Dosbox's 1 ms loop comes around, the samples are already ready to be written into the audio channel. If they're not ready, we block until they are and let the user know that FS couldn't keep up, so they can either add more FS threads and/or increase the pre-render latency (which might be a new conf option specifying some number of milliseconds, or we could derive it from the user's `[mixer] prebuffer` ms setting).
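A minimal sketch of that render-ahead idea, using only the C++ standard library. Everything here is an assumption for illustration: `RenderAhead` is a hypothetical class, and its `render_block` callback stands in for a wrapper around a rendering call such as `fluid_synth_write_s16()`. Under FluidSynth's dispatcher model, the worker thread that makes that call becomes the synthesis thread; it stays up to `lookahead_blocks` blocks ahead, while `take()`, called from the mixer side, only waits when the worker has fallen behind and counts those underruns so the user could be told FS couldn't keep up.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Block = std::vector<int16_t>; // one block of interleaved samples

// Hypothetical render-ahead wrapper: a worker thread keeps up to
// `lookahead_blocks` rendered blocks queued for the mixer.
class RenderAhead {
public:
    RenderAhead(std::function<void(Block &)> render_block, size_t block_samples,
                size_t lookahead_blocks)
        : render_(std::move(render_block)),
          block_samples_(block_samples),
          lookahead_(lookahead_blocks),
          worker_([this] { run(); })
    {}

    ~RenderAhead()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        space_.notify_all();
        worker_.join();
    }

    // Mixer side: hands out the oldest ready block. If none is ready,
    // the worker fell behind, so count an underrun and wait for it.
    Block take()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        if (ready_.empty())
            ++underruns_;
        data_.wait(lock, [this] { return !ready_.empty(); });
        Block block = std::move(ready_.front());
        ready_.pop_front();
        space_.notify_one();
        return block;
    }

    size_t underruns() const { return underruns_.load(); }

private:
    void run()
    {
        for (;;) {
            Block block(block_samples_);
            render_(block); // synthesis runs here, off the emulation thread
            std::unique_lock<std::mutex> lock(mutex_);
            space_.wait(lock, [this] {
                return stopping_ || ready_.size() < lookahead_;
            });
            if (stopping_)
                return;
            ready_.push_back(std::move(block));
            data_.notify_one();
        }
    }

    std::function<void(Block &)> render_;
    const size_t block_samples_;
    const size_t lookahead_;
    std::mutex mutex_;
    std::condition_variable data_;  // signalled when a block becomes ready
    std::condition_variable space_; // signalled when the queue has room
    std::deque<Block> ready_;
    bool stopping_ = false;
    std::atomic<size_t> underruns_{0};
    std::thread worker_; // last member, so it starts after everything above
};
```

The lookahead depth is where a latency setting would plug in: for example, 100 ms at 48 kHz is 4800 frames, i.e. 100 blocks of 48 frames at dosbox's 1 ms tick.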
So just as we suspected in the beginning (point 4 in my original post)… :( Also, notably, 2.2.0 is ABI incompatible with previous versions… we will need to keep an eye on this.
[fluidsynth]
synth_threads = 1
# ps -T -l -C dosbox
F S UID PID SPID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 R 1000 31240 31240 1368 24 89 0 - 195520 - pts/6 00:00:04 dosbox
1 S 1000 31240 31241 1368 0 107 19 - 195520 - pts/6 00:00:00 dosbox:disk$0
1 S 1000 31240 31242 1368 0 107 19 - 195520 - pts/6 00:00:00 dosbox:disk$1
1 S 1000 31240 31243 1368 0 106 19 - 195520 - pts/6 00:00:00 dosbox:disk$2
1 S 1000 31240 31244 1368 0 107 19 - 195520 - pts/6 00:00:00 dosbox:disk$3
1 S 1000 31240 31245 1368 0 99 19 - 195520 - pts/6 00:00:00 SDLHotplugALSA
1 S 1000 31240 31246 1368 1 80 0 - 195520 - pts/6 00:00:00 SDLAudioP2
[fluidsynth]
synth_threads = 2
# ps -T -l -C dosbox
F S UID PID SPID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 31194 31194 1368 45 80 0 - 214010 - pts/6 00:01:01 dosbox
1 S 1000 31194 31195 1368 0 106 19 - 214010 - pts/6 00:00:00 dosbox:disk$0
1 S 1000 31194 31196 1368 0 106 19 - 214010 - pts/6 00:00:00 dosbox:disk$1
1 S 1000 31194 31197 1368 0 106 19 - 214010 - pts/6 00:00:00 dosbox:disk$2
1 S 1000 31194 31198 1368 0 106 19 - 214010 - pts/6 00:00:00 dosbox:disk$3
1 S 1000 31194 31199 1368 0 99 19 - 214010 - pts/6 00:00:00 SDLHotplugALSA
1 S 1000 31194 31200 1368 1 80 0 - 214010 - pts/6 00:00:01 SDLAudioP2
1 S 1000 31194 31201 1368 4 -1 - - 214010 - pts/6 00:00:06 mixer0
In this case, the priority of the mixer thread in htop is -61 (very high; an external fluidsynth gets the same).
@grapeli, thanks 👍
Assuming we will have other users in this boat, please share your recommended config changes that work better than our current defaults (no rush; once you're done investigating).
If these also work for others without detriment, then I suggest we go with your settings as safe/better values while we address the remaining issues with FluidSynth (such as putting it in a parallel thread instead of blocking our main loop).
Below comment from @grapeli moved from PR https://github.com/dosbox-staging/dosbox-staging/pull/640#issuecomment-706623816 to this FluidSynth discussion thread.
My problem with the built-in fluidsynth is too-slow access to data on the disk (HDD), which can cause hiccups in the sound generated by the built-in fluidsynth. Changes to the mixer settings do not affect this (that was my complete misinterpretation). The real reason the audio is smooth the second, third or fifth time is that the data is then cached or entirely in memory (tmpfs).
CPU is not a problem, although in HoMM2 there may be a load spike close to this limit. I built an optimized (profiled) version of this branch, with interpolation, chorus and reverb turned off (the sound is clearly lower quality), but the load is also lower.
Interesting @grapeli . So when you move your SF2 to /dev/shm (or tmpfs), the problem disappears?
In this case @dreamer, we should consider a follow-on PR that reads the entire SF2 into memory. Something like: http://www.fluidsynth.org/api/fluidsynth_sfload_mem_8c-example.html
However, this solution is pretty ugly. Ideally FluidSynth would offer an option to read the entire file into memory on load, to keep poor IO latency from DoS'ing the stream. I will open an issue and see what their developers think.
CPU is not a problem, although in HoMM2 there may be a load spike close to this limit.
Yes - the two are often connected (plus Linux reports blocked IO as CPU user time too :sweat_smile:). This was also happening to me on the Pi (I think we had a discussion about this); there I had to tell Timidity to load the entire SF2 into RAM, and I included a delay before launching DOSBox to account for this pre-load duration. Otherwise I would lose several seconds of MIDI during the Sierra logo startup sequence.
We definitely need to find the equivalent solution with FluidSynth.
I built an optimized (profiled) version of this branch, with interpolation, chorus and reverb turned off (the sound is clearly lower quality), but the load is also lower.
Thanks for these tests. Just to be absolutely clear, can you confirm that you have zero hiccups when the SF2 is loaded from /dev/shm/path/ (RAM disk), even with chorus, reverb, and 7th-order poly all enabled? If so, that would confirm that CPU-load-wise we're good and that IO is exclusively the problem (even on older hardware).
@grapeli , see: https://github.com/FluidSynth/fluidsynth/issues/685
Interesting @grapeli . So when you move your SF2 to /dev/shm (or tmpfs), the problem disappears?
I ran another test.
Soundfont on hdd disk and mounted one directory with DOS programs (~2500 files) including HoMM2 from hdd. 01-homm2.flac.txt Approximately 19 audio interruptions.
Soundfont on hdd, HoMM2 mounted as a separate directory from hdd. 02-homm2.flac.txt Three breaks, only in the first 10 seconds.
Soundfont on hdd, HoMM2 mounted as a separate directory from tmpfs. 03-homm2.flac.txt Zero interruptions in audio delivery.
Soundfont (500MB, SGM-v2.01-Nice-Piano-Guit-Bass-v2.4.sf2) on hdd, HoMM2 mounted as a separate directory from tmpfs. Zero interruptions in audio delivery.
Each test is preceded by cleaning the cache.
echo 3 | sudo tee /proc/sys/vm/drop_caches
OK, so as for now we know:
- FluidSynth loads the whole soundfont into memory, processes it and closes the handle. It does that because we keep `synth.dynamic-sample-loading` at its default (disabled). We discussed enabling this feature previously, but this behaviour indicates we should probably keep it disabled, or maybe even explicitly set it to false to indicate it's disabled by design.

@kcgen I'll wait with starting work on (3) until the SoftLimiter work is finished, to avoid conflicts.
- FluidSynth loads the whole soundfont into memory, processes it and closes the handle. It does that because we keep `synth.dynamic-sample-loading` at its default (disabled). We discussed enabling this feature previously, but this behaviour indicates we should probably keep it disabled, or maybe even explicitly set it to false to indicate it's disabled by design.
Just one more quick comment regarding the dynamic-sample-loading: it might be worthwhile to give your users the option to choose dynamic loading vs. static loading of the samples. Especially when using many or large Soundfonts, enabling it will help to save possibly hundreds of megabytes of RAM in addition to reducing the load time of the Soundfonts significantly. The big downside is IO in the render thread, of course. But my guess is that you are using FS only for music playback, so you could probably get away with a large buffer and latency in the 100+ ms range. And with that, dynamic sample loading will probably only be problematic on really old hardware or very slow HDDs...
Anyway, I hope you get FS integrated into dosbox without problems. In case you need more input, the fluid-dev mailing-list is the best place for general feedback and usage questions.
@grapeli,
Your tests match what the FluidSynth team explained (that FluidSynth by default reads the entire SF2 into memory), which explains why there is no stuttering in your tests 3 and 4.
I was able to measure this load-in-full behavior using the ~517 MiB SGM soundfont:
* **Left pane**: `sudo pmap <dosbox-staging PID>` shows a single heap allocation holding the soundfont
* **Right pane**: `dstat` shows the disk reads per second accumulating to ~519 MiB, after which the dosbox shell appears

@dreamer, I agree with your 2nd point above, because an IO stall inside dosbox's emulation loop (in this case, reading a HOMM2 data file as shown in @grapeli's tests 1 and 2) will block dosbox from gathering audio from its channel sources, since dosbox employs one big serialized loop.
To expand on it: even if we thread FS's synth call, dosbox's channel callback will still be operating in dosbox's main loop (that is, the callback that copies the buffer produced by `synth(..)` into the mixer via `add_samples_(...)`). Likewise, once all the callbacks have fed dosbox's mixer, it combines them into a final mixed audio buffer and writes that out to SDL, which will also be blocked because the mixer is part of the serial loop.
So the mixer, its callbacks (and something to drive them on a timer), and FluidSynth all need to be threaded if we want SDL to keep receiving audio samples (and thus keep playing MIDI music) while dosbox's primary loop gets blocked on game IO stalls.
This is going to be a large architectural overhaul where all of the emulated sound devices would become threaded. I'm not sure how it will work, given that many of those sound devices require the DOS program to be poking and prodding their registered ports and IRQs, which influence the generated audio.
@kcgen So that architectural overhaul is for after 0.76?
@MasterO2 - I'm not sure.
Moving FluidSynth to its own thread could happen in time for 0.76 (@dreamer's second point). But I don't think that will solve the issue.
I'm also very worried that threading dosbox's entire audio subsystem will result in broken audio for devices that involve lock-step port & IRQ control (i.e., Sound Blaster, Adlib, and GUS). So we'd need a hybrid approach where only the relatively stand-alone audio sources (such as MIDI and CDDA) are broken out with a separate threaded path out to SDL. I personally don't have the stomach for that amount of work (at least at this point), so I can't vouch for a timeframe.
That said, there are other zero-risk approaches that might help @grapeli :-) Working on that right now.
@kcgen I more or less understand the cause of the problem.
I will add that I did one more test (four times) with the external fluidsynth for the most demanding first case. The result for four repetitions is 100% correct.
I checked what files are read during the test.
inotifywait -m -e access -e open -r HoMM2/ 2>&1 | tee /tmp/hero2.log
Mostly this one: DATA/HEROES2.AGG (~41.5 MB).
grep -c HEROES2\.AGG /tmp/hero2.log
2896
I don't know in what size chunks it reads this data.
A final note, dosbox-staging with built-in fluidsynth prevents from using external fluidsynth.
@MasterO2 Yes, after 0.76.0; and I would really, really prefer to introduce Rust before we'll start introducing multithreading in all places we need it.
@kcgen Yes, I agree that making the mixer multi-threaded (perhaps even with each channel operating in its own thread) is a long-term goal, but we don't need it to have somewhat-usable FluidSynth support. Right now the mixer callback is blocked for the whole duration of FS synthesis; after we move synthesis to a separate thread, the callback will only block while waiting for FS to finish the job, and then consume the buffer (synthesis will be able to start as soon as sysex commands start arriving).
So we'll have usable FS MIDI, probably usable to the same level as emulating midi via GUS ULTRAMID is right now (which also results in radically reducing our emulation speed, but rarely anyone complains about it).
A final note, dosbox-staging with built-in fluidsynth prevents from using external fluidsynth.
@grapeli What? How? You need to configure sequencer port the same way as it always worked.
@grapeli,
We certainly don't want you having to copy your games to tmpfs just to play them. Fortunately Linux gives us some knobs to achieve a "full preload on demand" behavior to help hide the impact of high-latency IO.
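Here's how I converted my HOMM2 installation and adjusted my system:
- … `.bin` file without visibility into the ISO9660 files within.
- … `heroes2` game directory into one. This further collapses ~240 MiB worth of duplicate files and lets the read-ahead work more efficiently (i.e., instead of reading the same file from the CD-ROM and then from the local game directory, it just reads it once).
- Defragment the game directory: `e4defrag .`
- Crank up the per-file read-ahead just for the block device storing your games. Let's assume this is sda: `echo 49152 | sudo tee /sys/block/sda/queue/read_ahead_kb`. This ensures that the entire `HEROES2.AGG` will be moved into the data cache on the very first byte that's read (which is right when you depart the castle).
- Crank up the block-device read-ahead, which means that adjacent files will also be pulled into the data cache during reads (these are now likely to be other HOMM2 game files, given we've defragged the directory): `sudo blockdev --setra 49152 /dev/sda`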
The good news about both the file-system and block read-ahead is that they operate asynchronously beyond the original read request. So when HOMM2 asks for a couple bytes from the AGG file, those bytes are returned immediately as soon as they're read - meanwhile the block and filesystem read-ahead carry on in parallel. So it's the best of both worlds.
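The two read-ahead commands take their numbers in different units: `read_ahead_kb` is in KiB, while `blockdev --setra` counts 512-byte sectors (per the `blockdev(8)` man page). Worked out for the value used here (first line `read_ahead_kb`, second line `blockdev --setra`) and compared with the ~41.5 MB `HEROES2.AGG`:

```math
\begin{aligned}
49152\ \text{KiB} &= 48\ \text{MiB} \approx 50.3\ \text{MB}\\
49152 \times 512\ \text{B} &= 25\,165\,824\ \text{B} = 24\ \text{MiB} \approx 25.2\ \text{MB}
\end{aligned}
```

So 49152 KiB covers the whole AGG file in one go, while 49152 sectors does not. If both commands end up writing the same per-device read-ahead value (which `blockdev --getra` reporting twice the `read_ahead_kb` number suggests), the one run last wins, and `blockdev --setra 98304` would be the equivalent of the 48 MiB setting.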
Let me know when you get this:
- `cd homm2d`
- … `dosbox.conf` and updated the soundfont path to your own
- … `homm2d` to ensure all local `dosbox.conf` settings are applied.
- The `[autoexec]` section will take care of launching the game.

A final note, dosbox-staging with built-in fluidsynth prevents from using external fluidsynth.
@grapeli What? How? You need to configure sequencer port the same way as it always worked.
[midi]
mididevice = alsa
midiconfig = 128:0
It works. Oops. I set it to default.
@kcgen Today I don't have time for further analysis. I ran all tests on XFS. That's a decent fs; it is not to blame. The hard drive model is to blame: WDC WD10SPZX-24Z.
For anyone reading @kcgen's previous post: of course, this is only a temporary solution - we know users outside of Linux don't have sophisticated options for configuring read-ahead or tweaking filesystems.
Our goal is to have built-in FS usable on Windows and macOS as well. BTW, for users who want to play HoMM2: this game provides CD audio, so MIDI is not absolutely necessary, but it is a very good stress test.
@dreamer, we might not be able to solve high latency storage problems unless we're committed to adding our own embedded read-ahead mechanism. That said, we can do the best we can inside the emulator (threading, etc) plus helping users adjust their operating system settings when it makes sense.
That said, NAND storage will pass HDDs in price/GB soon enough - so this problem might vanish with time.
@kcgen Today I don't have time for further analysis.
No worries.
I ran all tests on XFS.
Great.
find . -type f | sudo xargs xfs_fsr -v
When you run fluidsynth in its own thread, you could have it output audio outside the dosbox mixer. Basically what the original fsynth dosbox patch did, except you'd keep its volume controllable by the dosbox mixer. This way, fsynth would behave similar to running it as a stand-alone client: It constantly renders audio and outputs to its own audio device handle, without blocking anything, and plays the MIDI events sent to it by dosbox.
@kcgen Today I defragmented the entire xfs filesystem. I will try to make even better tuning. I quickly checked under ext4. Not much better or the same (more repetitions required to be sure). I downloaded the test package.
Roger that, @grapeli. Hopefully the two read-ahead settings make a notable difference.
When you run fluidsynth in its own thread, you could have it output audio outside the dosbox mixer.
@realnc, my understanding is that sdl only holds a single instance for a given audio device. So, even though the mixer and a (hypothetical) threaded FS could each hold their own pointer to the same audio device, under the hood sdl is only playing audio for one global buffer.
In other words, you can't write two separate buffers of audio (threaded and overlapping in time), and expect sdl to mix them into a single stream.
This limitation drove the original need for Dosbox's mixer; otherwise each audio channel could have independently written its buffer straight into sdl and let it do the mixing.
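To illustrate what that single global buffer implies: every source has to be summed into one buffer before it reaches SDL. A minimal sketch, assuming interleaved int16 channels of equal length; `Channel` and `mix_s16` are hypothetical names, and dosbox's real mixer differs in detail. Summing in a wider type and clamping at the end keeps loud overlaps from wrapping around into noise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Channel {
    const std::vector<int16_t> *samples; // interleaved, at least n samples
    float gain;                          // per-channel volume, 1.0 = unity
};

// Sums all channels into the one buffer a single SDL audio device plays,
// clamping to the int16 range instead of letting the sum overflow.
std::vector<int16_t> mix_s16(const std::vector<Channel> &channels, size_t n)
{
    std::vector<int16_t> out(n);
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (const auto &ch : channels)
            acc += ch.gain * (*ch.samples)[i];
        acc = std::max(-32768.0f, std::min(acc, 32767.0f));
        out[i] = static_cast<int16_t>(acc);
    }
    return out;
}
```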
(I know this first hand.. I tried to do so in one of the first versions of the CDDA patch 😅, which in turn drove me to using sdl_mixer-X, before deciding it was too heavy of a solution and fell back to using the dosbox mixer).
OpenAL on the other hand does allow many simultaneous inbound streams into the same audio device, and it performs the mixing for you. So this FS push might revive our exploration into using OpenAL (or something like it).
Crank up the per-file read-ahead just for the block device storing your games. Let's assume this is sda:
echo 49152 | sudo tee /sys/block/sda/queue/read_ahead_kb
Crank up the block-device read-ahead, which means that adjacent files during read will also be pulled into data cache (which are now likely to be other HOMM2 game files, given we've defragged the directory):
sudo blockdev --setra 49152 /dev/sda
These two suggestions are quite good. They halve the number of holes in the sound stream (first case), but more importantly the remaining holes occur only in the initial 20-25 seconds; the rest of the test is OK (before, there were losses even in the final phase).
@grapeli , that's good news; those settings should mitigate the drawn-out latency impact of slow HDD's in exchange for "front-loading the latency-pain".
That said - we can do better, knowing that HOMM2's MIDI data is contained inside the `data/heroes2.agg` file.
Save the following as `start.sh` with execute permissions alongside the test package's `dosbox.conf`, then launch the game with it.
#!/bin/bash
set -xeu
# Ensure we're running inside the script directory
cd "$(dirname "$0")"
# Find and preload the data files containing latency-sensitive
# MIDI data. The pattern gets both the HOMM2 original campaign
# and expansion pack.
find heroes2/data -iname 'hero*.agg' -print0 \
    | xargs -0 -I{} dd if="{}" of=/dev/null bs=1M
# Launch dosbox: load the user config first, then layer the local dosbox.conf on top
/path/to/dosbox-binary -userconf -conf dosbox.conf
If this works, I can envision dosbox performing latency-sensitive preloading via a list of files provided in the conf file. It wouldn't actually hold them in memory (which would be wasteful); instead it would read and discard the data, knowing that all modern operating systems will keep a copy of the data in their LRU cache.
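A minimal sketch of that read-and-discard preload in C++, mirroring the `dd ... of=/dev/null bs=1M` loop in `start.sh`. It assumes a hypothetical list of paths coming from the conf file (no such setting exists today), and `preload_files` is an illustrative name. Each file is read front to back in 1 MiB chunks into one reused buffer and the data is thrown away; the useful side effect is the OS keeping those pages in its cache.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads every listed file front to back and discards the data, so the OS
// page cache holds it before latency-sensitive playback starts.
// Returns the total number of bytes read.
uint64_t preload_files(const std::vector<std::string> &paths)
{
    std::vector<char> buffer(1 << 20); // 1 MiB chunks, like `dd bs=1M`
    const auto chunk = static_cast<std::streamsize>(buffer.size());
    uint64_t total = 0;
    for (const auto &path : paths) {
        std::ifstream file(path, std::ios::binary);
        if (!file)
            continue; // missing or unreadable: skip rather than fail startup
        for (;;) {
            file.read(buffer.data(), chunk);
            const std::streamsize got = file.gcount();
            total += static_cast<uint64_t>(got);
            if (got < chunk)
                break; // reached end of file (or a read error)
        }
    }
    return total;
}
```

Reusing one buffer keeps dosbox's own memory use flat regardless of file size; the cached copy lives in the OS, which can evict it under memory pressure.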
my understanding is that sdl only holds a single instance for a given audio device. So, even though the mixer and (hypothetical) threaded-FS could each hold their own pointer to the same audio device, under the hood sdl is only playing audio for one global buffer.
Fluidsynth outputs audio directly through ALSA or PulseAudio (or whatever the equivalent is on other OSes.) SDL is an optional audio driver in fsynth, but it's not the default.
This is how the original fsynth patch does it. This results in two audio streams showing up in the OS mixer when running dosbox. The OS in this case is doing the mixing, just as if two different processes are using the same audio device. Also, it looks kinda weird when you get two OS mixer sliders for the same app. It's not the best solution, but it's easy to do.
Another somewhat easy solution is to have fsynth only render audio in the audio thread. That would be `MIXER_CallBack()` in `hardware/mixer.cpp`, which is called by SDL in a different thread. This could potentially also improve rendering perf, because that way fsynth will render larger chunks each time, maybe getting better CPU cache utilization. The audio thread isn't doing anything other than mixing right now, so it should have enough CPU time to both render and mix.
Thanks @realnc. Yes, we wanted FluidSynth to use the same audio architecture as the other emulated audio devices in dosbox; but this might require an exception.
Your second option would get my vote, and I certainly agree that larger audio rendering chunks would be much more efficient.
@realnc
When you run fluidsynth in its own thread, you could have it output audio outside the dosbox mixer. This is basically what the original fsynth dosbox patch did, except you'd keep its volume controllable by the dosbox mixer. This way, fsynth would behave similarly to running it as a stand-alone client: it constantly renders audio and outputs to its own audio device handle, without blocking anything, and plays the MIDI events sent to it by dosbox.
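If fsynth outputs to its own audio device but keeps its volume controllable by the dosbox mixer, the two threads need to share one gain value safely. A small C++ sketch, assuming a hypothetical `SharedVolume` class: the mixer side calls `set()`, and the synth's output thread calls `apply()` on each block.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical shared volume: written by the dosbox mixer thread, read by the
// synth's own output thread. std::atomic<float> avoids a data race without
// taking a lock on the audio path.
class SharedVolume {
public:
    void set(float gain) { gain_.store(gain, std::memory_order_relaxed); }
    float get() const { return gain_.load(std::memory_order_relaxed); }

    // Scale one block of samples. The gain is read once, so a concurrent
    // change never splits a block between two volume levels.
    void apply(float *samples, std::size_t count) const
    {
        const float g = get();
        for (std::size_t i = 0; i < count; ++i)
            samples[i] *= g;
    }

private:
    std::atomic<float> gain_{1.0f};
};
```

A real implementation would probably also ramp between old and new gain to avoid audible clicks.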
We know, that's what we want to accomplish.
Ergo, we need to put synthesis into its own thread (and let FS potentially split it into more threads internally, if the user wants to). The FS maintainer explicitly advised that the synthesis thread should use a higher priority and not be responsible for other tasks (and I agree: some soundfonts use ridiculously big samples and will need a lot of horsepower to process).
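A rough C++ sketch of the dedicated-thread approach, assuming a hypothetical `SynthThread` wrapper: the emulation thread only queues MIDI messages via `send()`, and a worker thread handles them (where real code would call into FluidSynth, e.g. `fluid_synth_noteon()`). Audio rendering itself is not shown; with FluidSynth's own audio driver it already runs on the driver's thread. Raising the worker's priority, as the FS maintainer advised, is platform-specific (for example `pthread_setschedparam` on POSIX) and is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper: one worker thread owns all synth work; the emulation
// thread only enqueues messages and never synthesizes anything itself.
class SynthThread {
public:
    using Handler = std::function<void(const std::vector<std::uint8_t> &msg)>;

    explicit SynthThread(Handler handle_msg)
        : handle_msg_(std::move(handle_msg)), worker_([this] { run(); }) {}

    ~SynthThread()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join(); // the worker drains queued messages before exiting
    }

    // Called from the emulation thread, e.g. on an MPU-401 data write.
    void send(std::vector<std::uint8_t> msg)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            while (!queue_.empty()) {
                auto msg = std::move(queue_.front());
                queue_.pop_front();
                lock.unlock(); // handle the message without blocking send()
                handle_msg_(msg);
                lock.lock();
            }
            if (stopping_)
                return;
        }
    }

    Handler handle_msg_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::vector<std::uint8_t>> queue_;
    bool stopping_ = false;
    std::thread worker_; // declared last so it starts after the members it uses
};
```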
Another somewhat easy solution is to have fsynth only render audio in the audio thread. That would be MIXER_CallBack() in hardware/mixer.cpp, which is called by SDL in a different thread. (…)
Treating a single channel differently this way would be a somewhat hacky solution in my opinion… But yeah, this is an option as well. I think putting FS into its own thread will give slightly better results, but the proof is in the pudding as they say :)
There's one more solution - I don't want to use it, because it has multiple other drawbacks, but I'm listing it for the sake of having a complete picture. We could replace the DOSBox mixer with the SDL2_mixer library - it has built-in support for MIDI streams via Timidity… But it seems like SDL2_mixer has some issues, and I really prefer FluidSynth 2.x over Timidity (because it's maintained and AFAIK it's the only synth supporting the full set of SF2 features).
We could replace DOSBox mixer with SDL2_mixer
That won't work. SDL_mixer can play MIDI files but not MIDI events. Also, SDL_mixer isn't actually intended for mixing audio. It's for decoding and playing various audio files (like WAV, Vorbis, etc.) The "mixer" in its name is somewhat unfortunate.
Then launch the game with it.
@kcgen It reduces to about 4-6. There are also two files in the startup sequence that are opened for writing and modified: heroes2.cfg, and autosave.gm1 (106 kB), which is written after opening and each time you click the hourglass (next day).
The 8GB flash drive (ext4) also fails this test (probably due to slow writing).
Regarding the configuration you proposed, I had to modify it slightly. In addition to a weak HDD and an i5-450M CPU, the laptop has weak Intel integrated graphics (Gen5) that only support OpenGL 2.1.
The output = texture setting prevents scaling.
[sdl]
output = opengl
windowresolution = 1366x768
[cpu]
cycles = 60000
Here's a recording: homm2.flac.txt. Likewise, most problems occur at the beginning (during the initial I/O load), and after that it's clean.
To be sure, I repeated this test eight times in various configurations on tmpfs. It was always 100% correct.
Edit: I put autosave.gm1 and heroes2.cfg on tmpfs and made symbolic links to them. Even with that, it's not 100% correct.
Treating a single channel differently this way would be a somewhat hacky solution in my opinion… But yeah, this is an option as well. I think putting FS into its own thread will give slightly better results, but the proof is in the pudding as they say :)
I completely forgot: fsynth wouldn't be the exception. Munt should be treated the same way, as should any other MIDI synth, like BASSMIDI (if you decide to add that at some point).
This makes sense even from a philosophical point of view. MIDI synths were always "asynchronous" devices back in the day. If you had an MT-32, a SoundCanvas, a Wave Blaster, or whatever else, the game would just send MIDI events over the MPU-401 port. The sound was rendered and played by those devices on their own. Even if the game went into an infinite loop, freezing completely, they would not stutter, hiccup, underrun, or anything like that. They would always play the sound cleanly, without any sound dropouts. Replicating that 100% async behavior makes sense.
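That async behavior maps naturally onto a single-producer/single-consumer queue between the emulated MPU-401 (producer) and the synth thread (consumer). A minimal C++ sketch, with the hypothetical name `MidiRing` and an illustrative capacity: neither side ever waits for the other, so a frozen game cannot stall the synth, and a busy synth cannot stall the game.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Lock-free single-producer/single-consumer ring buffer for raw MIDI bytes.
// Only the emulation thread calls push(); only the synth thread calls pop().
template <std::size_t Capacity>
class MidiRing {
    static_assert(Capacity >= 2, "need room for at least one byte");

public:
    bool push(std::uint8_t byte)
    {
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false; // full: one slot stays empty to tell full from empty
        buf_[head] = byte;
        head_.store(next, std::memory_order_release); // publish the byte
        return true;
    }

    std::optional<std::uint8_t> pop()
    {
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt; // empty
        const std::uint8_t byte = buf_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return byte;
    }

private:
    std::array<std::uint8_t, Capacity> buf_{};
    std::atomic<std::size_t> head_{0}; // written only by the producer
    std::atomic<std::size_t> tail_{0}; // written only by the consumer
};
```

Rejecting bytes when the ring is full is just one possible policy, chosen here to keep the emulation thread non-blocking; the capacity would need to be sized generously (e.g. for SysEx dumps).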
@grapeli,
Maybe there are other files (besides the big heroes*.agg) that load a bit of functional code prior to playing the MIDI sequences.
.agg files over to tmpfs?
Other thoughts: If your system is starved for memory (or if many idle applications are soaking up memory or swap), then there will be a lot of roll-over eviction, where earlier read data is quickly evicted to make way for newer reads (even if we're only talking about 250 MB of game data). Memory-starved systems can also behave very poorly when it comes to swapping active application memory out to disk; paging it back in will hard-block the application. There are lots of knobs to tune this behavior, but at this point it's no longer related to FluidSynth, so perhaps we could chat over Discord via PM.
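To check whether preloaded data actually stays in the page cache or gets evicted under memory pressure, a Linux-only C++ sketch can ask the kernel via `mmap()` and `mincore()`. `cached_fraction` is a hypothetical helper; the `fincore` tool from util-linux reports the same information from the shell.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#include <fcntl.h>    // open
#include <sys/mman.h> // mmap, mincore, munmap
#include <sys/stat.h> // fstat
#include <unistd.h>   // close, sysconf, write

// Linux-only sketch: fraction (0.0 to 1.0) of a file's pages currently
// resident in the page cache. Returns -1.0 on error and 0.0 for an empty
// file (nothing to cache).
double cached_fraction(const std::string &path)
{
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0)
        return -1.0;
    struct stat st {};
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1.0;
    }
    const auto size = static_cast<std::size_t>(st.st_size);
    if (size == 0) {
        close(fd);
        return 0.0;
    }
    // Mapping does not read the file; mincore() only inspects residency.
    void *addr = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED)
        return -1.0;

    const auto page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t pages = (size + page - 1) / page;
    std::vector<unsigned char> vec(pages);
    double result = -1.0;
    if (mincore(addr, size, vec.data()) == 0) {
        std::size_t resident = 0;
        for (const unsigned char v : vec)
            resident += v & 1u; // bit 0 set means the page is in memory
        result = static_cast<double>(resident) / static_cast<double>(pages);
    }
    munmap(addr, size);
    return result;
}
```

Checking the game's data files right after the preload and again when the stutter happens would show whether eviction is the culprit.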
Regarding the slow CPU and GPU: given that the game plays flawlessly when moved to tmpfs, I think your CPU and GPU are fine. dosbox doesn't need much horsepower to hit 30000 cycles (my $30 Raspberry Pi plays HOMM2 flawlessly with MIDI at 30k cycles), so I think the problem is isolated to your system's memory and I/O subsystems.
The patch integrating FluidSynth was developed a long time ago (first revision 9 years ago, last revision 3 years ago - has it really been waiting that long for inclusion upstream?). It is currently being distributed via ECE.
In the past, I was very hesitant to integrate the FluidSynth patch for a number of reasons:
What changed?
Over the last 1-2 months (?), point (5) stopped being a problem: fluidsynth 2.x is now available in most repositories, including brew and vcpkg. That takes out reason (5), and that's huge.
Points (4) and (2) are just a matter of putting in some work - not a big deal. I still want to address (1), but after looking at the patch in detail, I think this work can largely happen in parallel. (3) can be assessed only during testing.
Of course, integration of Fluid needs to be optional - the lib is neither old enough nor tested enough to have propagated to all repositories, so some users will want to build without it. But the patch already kind of handles that.
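One common way to keep such a dependency optional is a build-time macro plus a runtime fallback. A C++ sketch under stated assumptions: `C_FLUIDSYNTH` is a hypothetical macro the build system would define only when FluidSynth 2.x is found, and the device names are illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical macro: the build would pass -DC_FLUIDSYNTH=1 only when
// FluidSynth 2.x was found; otherwise the feature compiles out.
#ifndef C_FLUIDSYNTH
#define C_FLUIDSYNTH 0
#endif

// MIDI devices compiled into this binary (names are illustrative).
std::vector<std::string> available_midi_devices()
{
    std::vector<std::string> devices{"none"};
#if C_FLUIDSYNTH
    devices.push_back("fluidsynth");
#endif
    return devices;
}

// Honour the user's choice if it was compiled in; otherwise fall back to
// "none" so a conf file written for a FluidSynth build still loads.
std::string select_midi_device(const std::string &requested)
{
    for (const auto &name : available_midi_devices())
        if (name == requested)
            return requested;
    return "none";
}
```

Falling back instead of failing lets users share one conf file between builds with and without the library; logging a warning at the fallback point would be a sensible addition.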
When?
The required testing effort would push the 0.75.0 release too far away, so I won't merge any new MIDI-related changes before the 0.75.0 release (here are the tasks planned ATM for 0.75.0).
But if someone has free cycles and wants to push this work forward: the patch is already imported on a side branch - cherry-pick f786b9fdc52df0e9b39ee27d640fb37ba7c369e6 and start working on fixes for the problems I listed in (2) and (4) ;) If nobody steps up, I'll start working on this sometime after 0.75.0.