audetto / AppleWin

Apple II emulator for Linux
GNU General Public License v2.0

Request that audio latency is set to 0ms on Linux platforms #64

Closed: xandark closed this 1 year ago

xandark commented 2 years ago

I notice that qapple and sa2 have a default audio latency of 200ms, which makes playing games not very satisfying. I've been watching this project for a while, and only recently did I try changing the audio latency to 0 in qapple; that finally works for me.

On the other hand, it's not obvious from the command line args to sa2 how to change the latency to 0ms.

Could both of these be set to 0 by default, at least on Linux platforms?

audetto commented 2 years ago

Try this branch: https://github.com/audetto/AppleWin/tree/audio2 and see if the buffer size in the audio settings helps.

I think there is something wrong in the whole emulation, but the adaptive nature of AppleWin audio generation makes it extremely hard to reason about.

Going in and out of enhanced speed has audio effects too.

If the slider does not work, try as well to change the value in source code: https://github.com/audetto/AppleWin/blob/d9fc4faac456b8d775179695165555ab0be108a3/source/frontends/sdl/sdirectsound.cpp#L48

Also try to see which of these two lines works better, or whether both should stay.

https://github.com/audetto/AppleWin/blob/d9fc4faac456b8d775179695165555ab0be108a3/source/frontends/sdl/main.cpp#L145-L151

audetto commented 2 years ago

I rebased it. See if these controls help at all:


Trigger and target: each time the SDL buffer drops below trigger ms, it is topped back up to target ms.
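A minimal sketch of the trigger/target rule, assuming the frontend knows how many ms of audio are currently queued in SDL. `refillAmountMs` and its parameters are hypothetical names chosen for illustration, not the code on the audio2 branch:

```cpp
#include <cassert>

// Hypothetical trigger/target rule: if the audio queued in SDL (in ms) has
// dropped below triggerMs, return how many ms of new audio to generate so the
// queue is topped back up to targetMs; otherwise return 0 and do nothing.
double refillAmountMs(double queuedMs, double triggerMs, double targetMs)
{
    assert(triggerMs >= 0.0 && targetMs >= triggerMs);
    if (queuedMs >= triggerMs)
    {
        return 0.0;  // enough audio queued, leave the queue alone
    }
    return targetMs - queuedMs;  // top up to the target level
}
```

For instance, with trigger = 100 and target = 200, a queue holding 40 ms would be topped up with 160 ms of audio, while a queue holding 150 ms would be left alone.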

xandark commented 2 years ago

Okay, I've managed to build the audio2 branch and test sa2, which is pushing my git skills.

I have played with the sliders for trigger and target size, and I don't notice much of a difference, unless I set trigger very low or to zero. Then the audio begins to repeat, which makes sense.

When I set trigger to 400ms and target to 400ms, for example, I don't think I'm hearing much latency difference. I've played with different values for about 10 minutes, and it's hard for me to say, yes, I found the winning combo.

I'm using Aztec as my test. In the original Windows-based AppleWin, when I press W to walk the character and S to stop, the sound of the footsteps starts and stops right when you'd expect it. With this Linux port, however, the walking sound starts and stops about 0.5 to 1.0 seconds after the walking animation does.

In the Dear ImGui interface, which is really great btw, I see the new controls. Below them, there are columns which say Direct, SDL, and Total. I see that there is a normalized value for each of those columns for row Channel 1. What do the values mean? I'm trying to understand what I should be trying to optimize toward. Should I be looking for a low, stable Total value?

audetto commented 2 years ago

Direct, SDL and Total are in ms.

They are the amount of audio held in 1) the AppleWin buffer, 2) the SDL queue, and 3) the total of the two. The total should be the latency.
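With SDL's queue API the queued amount is reported in bytes (`SDL_GetQueuedAudioSize()`), so displaying it in ms needs a conversion like the one below. The helper and the `AudioLatency` struct are hypothetical illustrations of the Direct / SDL / Total columns, assuming interleaved integer samples:

```cpp
#include <cassert>
#include <cstddef>

// Convert an amount of buffered audio from bytes to milliseconds.
// sampleRate: frames per second (e.g. 44100)
// channels: interleaved channels per frame (1 = mono, 2 = stereo)
// bytesPerSample: size of one sample of one channel (2 for 16-bit audio)
double bufferedMs(std::size_t bytes, int sampleRate, int channels, int bytesPerSample)
{
    assert(sampleRate > 0 && channels > 0 && bytesPerSample > 0);
    const double bytesPerFrame = static_cast<double>(channels) * bytesPerSample;
    const double frames = static_cast<double>(bytes) / bytesPerFrame;
    return 1000.0 * frames / sampleRate;
}

// Hypothetical aggregate mirroring the Direct / SDL / Total columns.
struct AudioLatency
{
    double directMs;  // audio waiting in the AppleWin buffer
    double sdlMs;     // audio waiting in the SDL queue

    // Total audio buffered end to end, which approximates the latency.
    double totalMs() const { return directMs + sdlMs; }
};
```

At 44100 Hz, stereo, 16-bit, 17640 queued bytes correspond to 100 ms.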

The problem is that they are not stable and do not behave in a predictable way. AppleWin tries to autodetect how much audio is enough and when to produce more or less data, and I think this goes horribly wrong.

webspacecreations commented 2 years ago

Take a look at the audio-related comments in this maintained repository (not the same as apple2 on Github): git clone http://shamusworld.gotdns.org/git/apple2

In this codebase, the audio is running as a separate thread and the author includes a number of comments related to synchronization that suggests there may be something thread- or timing-related going on in SDL. I'm not 100% sure it's fully working in that codebase, but if there's a problem, it's not perceptible like it is in AppleWin.

If you're compiling on Pi, you may need to change the audio setting to desired.format = AUDIO_S16SYS; in audio.cpp ~line 72 for audio to work. Hope this provides some insight... the codebase is relatively small & readable.

audetto commented 2 years ago

I can try to use the callback and see if it helps.

But I've compiled it and run it on my Ubuntu desktop: as soon as it starts, any existing audio is negatively affected. This does not happen with my code.

webspacecreations commented 2 years ago

"Existing audio" ... meaning audio playing from other software? That may be due to some of the initial audio configuration parameters. Let me know if you've made the above desired.format change and what specific problems you're having, and I'll see if there's a simple fix.

I don't think it's an example of perfect audio fidelity, rather better synchronization between the audio channel and the emulator (I've seen audio problems of one kind or another in most A2 emulators). The AW codebase is pretty large and not something I'm in regularly enough to be able to point to direct relevancy, but between the code and what the author documented I hoped it might at least provide a seed of inspiration.

audetto commented 2 years ago

AUDIO_S16SYS works better; it is probably not a Pi-specific fix.

sh95014 commented 2 years ago

@audetto I recently switched the macOS port to use CoreAudio, and I get a noticeable (though not measured) latency improvement. The main difference is that CoreAudio pulls audio data on a real-time thread whenever it needs it (at which point I feed it whatever is in the DirectSoundGenerator), as opposed to having it pushed with the SDL_QueueAudio call. As far as I can tell, your SDL-based sdirectsound is not substantially slower at calling SDL_QueueAudio than the CoreAudio callback version, so the delay might be due to what happens inside SDL_QueueAudio.
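The pull model can be sketched as below. This is a hypothetical class, not sh95014's DirectSoundRenderProc: the emulator pushes samples, and a static function with the same parameters as SDL's `SDL_AudioCallback` (userdata, byte stream, length in bytes) drains them on the audio thread, padding with silence when not enough audio is ready. The mutex guards the shared queue because push and pull run on different threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical pull-model audio source with 16-bit mono samples.
class PullSource
{
public:
    // Emulator thread: append freshly generated samples.
    void push(const std::int16_t* samples, std::size_t count)
    {
        std::lock_guard<std::mutex> guard(myMutex);
        myQueue.insert(myQueue.end(), samples, samples + count);
    }

    // Same shape as SDL_AudioCallback, so it could be registered through
    // SDL_AudioSpec::callback with `this` as userdata. Runs on the audio thread.
    static void callback(void* userdata, std::uint8_t* stream, int len)
    {
        assert(len >= 0);
        auto* self = static_cast<PullSource*>(userdata);
        self->fill(reinterpret_cast<std::int16_t*>(stream),
                   static_cast<std::size_t>(len) / sizeof(std::int16_t));
    }

    // Number of callbacks that found too little audio.
    std::size_t underruns() const
    {
        std::lock_guard<std::mutex> guard(myMutex);
        return myUnderruns;
    }

private:
    void fill(std::int16_t* out, std::size_t wanted)
    {
        // The lock keeps push() and fill() from touching the queue at once.
        std::lock_guard<std::mutex> guard(myMutex);
        const std::size_t available = std::min(wanted, myQueue.size());
        std::copy_n(myQueue.begin(), available, out);
        myQueue.erase(myQueue.begin(),
                      myQueue.begin() + static_cast<std::ptrdiff_t>(available));
        // Pad with silence when the emulator has not produced enough.
        std::fill(out + available, out + wanted, std::int16_t{0});
        if (available < wanted)
        {
            ++myUnderruns;
        }
    }

    mutable std::mutex myMutex;
    std::deque<std::int16_t> myQueue;
    std::size_t myUnderruns = 0;
};
```

The difference from the push model is that the device, not the emulator, decides how much audio is taken and when.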

I used the Mockingboard DEMO-Dual Sound Generators.dsk simple piano demo to test audio latency.

audetto commented 2 years ago

I am starting to think that this is the problem. When I push to SDL I have no real idea how much to push (as opposed to Qt). This interacts badly with AppleWin, which will try to compensate by going faster or slower if I get it wrong.

Did you need to lock? Can you share the link to the audio callback?

sh95014 commented 2 years ago

Did you need to lock?

Uhhh… 😳 yes, I would need to lock, probably among Lock, Unlock, and Read in dsound. Is that something you want to do upstream or should I just do it for macOS? (Nice catch, thanks!)

The code is in https://github.com/sh95014/AppleWin/blob/master/source/frontends/sdl/sdirectsound.cpp, specifically DirectSoundRenderProc.

sh95014 commented 2 years ago

This should do it: lock.diff.zip

Let me know if you want to incorporate in your tree or whether I should apply it only to mine. Thanks!

audetto commented 2 years ago

Thank you. I will probably incorporate, but this will have to wait.

I will only be back at the end of August...

redenvelope2000 commented 2 years ago

Sorry, this has nothing to do with the Linux port. I've been working on your libretro core. I found the audio sync issue easier to handle there, because once we manage to generate a regular number of samples, say 735 per frame, the audio is synced internally by the RetroArch emulation code. The point is that the AW speaker code generates samples in a throttling manner, which must be disabled. After that the code works very well. I also added floppy drive sounds and got the mockingboard working on it.

audetto commented 2 years ago

The point is that the AW speaker code generates samples in a throttling manner, which must be disabled. After that the code works very well.

Yes, that makes everything more complicated.

I also added floppy drive sounds and got the mockingboard working on it.

If you upstream to AppleWin, it will be available everywhere.

webspacecreations commented 1 year ago

Resolving the audio latency issues for AppleWin on Linux would be a major improvement. Just checking where this is currently at... have improvements been upstreamed to AppleWin? It sounds like @redenvelope2000 has some major enhancements, hope they get incorporated.

redenvelope2000 commented 1 year ago

What I added is a regulator between myBuffer->Read() and mixBuffer() such that, no matter how many samples the AW speaker code generated in the last frame, they are resized to 735 samples, so exactly 735 samples are always sent to mixBuffer(). I wanted to change the AW speaker code to have the same effect, but so far the working code is still in the frontends folder. While debugging I saw the speaker code generate "a little more than expected" samples, which convinced me that without a regulator the R/W pointers of the audio buffer would eventually overrun each other. It is a mystery to me how the Windows build handles that.
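One way such a regulator could work is sketched below, assuming linear interpolation; the comment does not say how the samples are resized, and `regulate` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stretch or squeeze one frame of speaker samples to exactly outCount
// samples with linear interpolation (assumed technique).
std::vector<std::int16_t> regulate(const std::vector<std::int16_t>& in,
                                   std::size_t outCount = 735)  // 44100 Hz / 60 FPS
{
    std::vector<std::int16_t> out(outCount, 0);
    if (in.empty() || outCount == 0)
    {
        return out;  // nothing to resample: emit silence
    }
    // Spacing, in input samples, between consecutive output samples, chosen
    // so that the first and last input samples are both preserved.
    const double step = outCount > 1
        ? static_cast<double>(in.size() - 1) / static_cast<double>(outCount - 1)
        : 0.0;
    for (std::size_t i = 0; i < outCount; ++i)
    {
        const double pos = static_cast<double>(i) * step;
        const std::size_t j = std::min(static_cast<std::size_t>(pos), in.size() - 1);
        const double frac = pos - static_cast<double>(j);
        const double a = in[j];
        const double b = in[std::min(j + 1, in.size() - 1)];
        out[i] = static_cast<std::int16_t>(std::lround(a + frac * (b - a)));
    }
    return out;
}
```

Squeezing 740 samples into 735 shifts the pitch by roughly 0.7% (740/735 ≈ 1.0068) for that frame.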

redenvelope2000 commented 1 year ago

Answer to myself.

The audio R/W pointers in the Windows build do not overwrite each other. The W pointer of the audio buffer is initialized to the 3/8 position in the audio buffer, which gives a 0.1392 s initial latency. The Spkr_SubmitWaveBuffer() code tries hard to keep the distance between the R/W pointers between 1/4 and 1/2 of the buffer, that is, between 0.09 s and 0.18 s, by adjusting the frequency of the simulated 6502 CPU using g_nCpuCyclesFeedback. This global variable can be set to ±20 × g_fClksPerSpkrSample, so that the CPU generates 20 samples more or fewer than usual to correct the error. I modified the Windows build to run in 1/60 s frames to observe the behavior. For normal frames, 739 or 740 samples were produced; remember we need 735 per frame? That's 4 or 5 more than needed. The correction occurred around 13 times in one whole second, so 20 × 13 = 260 samples were removed from the usual audio generation. Equivalently, 260/60 = 4.333 samples were removed from each frame. The conclusion is that over-generation of samples does happen and the Windows build can correct it.
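The band-keeping behaviour described above can be sketched as a simple controller. This is a simplified model, not the actual Spkr_SubmitWaveBuffer() code: `distance` is assumed to be the write-minus-read distance in samples, the 20-sample step comes from the comment, and the sign convention (negative means fewer cycles, hence fewer samples) is an assumption:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical band controller: keep the W-R distance of a circular buffer
// of bufferSize samples between 1/4 and 1/2 of the buffer.
// Returns a correction in samples for the next period.
int sampleCorrection(std::size_t distance, std::size_t bufferSize, int step = 20)
{
    assert(bufferSize > 0);
    if (distance > bufferSize / 2)
    {
        return -step;  // too much audio queued: generate fewer samples
    }
    if (distance < bufferSize / 4)
    {
        return +step;  // too little audio queued: generate more samples
    }
    return 0;          // inside the band: no correction
}

// Express the correction in CPU cycles, in the spirit of setting
// g_nCpuCyclesFeedback to a multiple of g_fClksPerSpkrSample.
double cycleFeedback(int correctionSamples, double clksPerSpkrSample)
{
    return correctionSamples * clksPerSpkrSample;
}
```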

The error actually comes from the setting of g_fClksPerSpkrSample in SetClksPerSpkrSample(). It is 23, an integer, but it really should be 23.14. The comment in SetClksPerSpkrSample() says that this CPU-clocks-per-audio-sample value was rounded for better sound. However, the rounding also introduces the error: 0.14/23.14 × 44100/60 = 4.446. That is, about 4.446 extra audio samples have been produced per video frame in every AppleWin build. Because this is well within the correction capability, nobody has complained so far. Although our SDL build does not use g_nCpuCyclesFeedback, I don't think simply adding it back can fix it, because the correction mechanism of Spkr_SubmitWaveBuffer() depends on lpDSBvoice->GetCurrentPosition() returning the precise number of consumed samples, which SDL_QueueAudio() does not provide.
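The size of the rounding error can be checked with the thread's own numbers (23 vs 23.14 clocks per sample, 44100 Hz, 60 frames per second). This is only an illustrative calculation; `samplesPerFrame` is a hypothetical helper:

```cpp
#include <cassert>

constexpr double kSampleRate = 44100.0;    // speaker sample rate
constexpr double kFramesPerSecond = 60.0;  // one video frame = 1/60 s

// Samples emitted during one 1/60 s frame when the frame really lasts
// 735 * exactClksPerSample CPU cycles, but the speaker code emits one
// sample every clksPerSample cycles.
double samplesPerFrame(double clksPerSample, double exactClksPerSample)
{
    assert(clksPerSample > 0.0 && exactClksPerSample > 0.0);
    const double targetSamples = kSampleRate / kFramesPerSecond;       // 735
    const double cyclesPerFrame = targetSamples * exactClksPerSample;  // ~17008 for 23.14
    return cyclesPerFrame / clksPerSample;
}
```

With 23 clocks per sample this gives about 739.5 samples per frame, consistent with the 739 or 740 samples observed in the Windows build.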

Modifying SetClksPerSpkrSample() to accurately initialize g_fClksPerSpkrSample without rounding should give some improvement. It could be even better to implement a customized ring buffer by using SDLAudioCallback() as specified in https://davidgow.net/handmadepenguin/ch8.html. This fits the working model of the Windows direct sound code much better and makes the g_nCpuCyclesFeedback meaningful.
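A ring buffer along those lines might look like the sketch below. It is hypothetical and not taken from the linked tutorial: `RingBuffer` keeps monotonic write and read counters, so the number of consumed samples is always known exactly, much like the play cursor that DirectSound reports through `GetCurrentPosition()`. A callback would call `read()`, and the emulator would call `write()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical fixed-capacity ring buffer of 16-bit samples.
class RingBuffer
{
public:
    explicit RingBuffer(std::size_t capacity) : myData(capacity, 0)
    {
        assert(capacity > 0);
    }

    // Emulator side: store up to count samples; returns how many fitted.
    std::size_t write(const std::int16_t* src, std::size_t count)
    {
        std::lock_guard<std::mutex> guard(myMutex);
        std::size_t written = 0;
        while (written < count && myWritten - myRead < myData.size())
        {
            myData[myWritten % myData.size()] = src[written];
            ++written;
            ++myWritten;
        }
        return written;
    }

    // Callback side: produce exactly count samples, silence on underrun.
    void read(std::int16_t* dst, std::size_t count)
    {
        std::lock_guard<std::mutex> guard(myMutex);
        for (std::size_t i = 0; i < count; ++i)
        {
            if (myRead < myWritten)
            {
                dst[i] = myData[myRead % myData.size()];
                ++myRead;
            }
            else
            {
                dst[i] = 0;
            }
        }
    }

    // W-R distance: audio queued but not yet played, in samples.
    std::size_t queued() const
    {
        std::lock_guard<std::mutex> guard(myMutex);
        return static_cast<std::size_t>(myWritten - myRead);
    }

    // Total samples handed to the audio device since start.
    std::uint64_t consumed() const
    {
        std::lock_guard<std::mutex> guard(myMutex);
        return myRead;
    }

private:
    mutable std::mutex myMutex;
    std::vector<std::int16_t> myData;
    std::uint64_t myWritten = 0;  // monotonic write counter
    std::uint64_t myRead = 0;     // monotonic read counter, always <= myWritten
};
```

Because `queued()` is exact, it could feed a band controller directly instead of an estimate of what SDL has consumed.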

webspacecreations commented 1 year ago

@redenvelope2000 this looks like a highly detailed and actionable analysis. I definitely understand why @audetto would want this considered by the core project as the audio latency issue is pretty severe in SDL. Maybe a first effort would be submitting this analysis as inline comments within the core AppleWin repository for review by @tomcw @sicklittlemonkey and others. At the very least it seems like the underlying audio subsystem could use some demystifying / documentation... and maybe a project advocate for resolving this to facilitate better cross-platform support.

audetto commented 1 year ago

I have done some work on the audio generation and results look promising.

https://github.com/audetto/AppleWin/tree/audio_callback

Major changes are

  1. use a callback for SDL audio
  2. add option --sdl-audio-buffer to set the SDL buffer size (in ms, default = 46)
  3. apply g_nCpuCyclesFeedback when running in --fixed-speed
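Assuming the --sdl-audio-buffer value is converted into an SDL buffer size in sample frames and rounded up to a power of two (an assumption about the branch, not something stated here), 46 ms at 44100 Hz would become 2048 frames, about 46.4 ms. The hypothetical helpers below show the conversion:

```cpp
#include <cassert>
#include <cstdint>

// Convert a buffer length in ms to sample frames at sampleRate, rounded up
// to the next power of two (assumed rounding).
std::uint32_t bufferFramesForMs(double ms, std::uint32_t sampleRate)
{
    assert(ms > 0.0 && sampleRate > 0);
    const double exact = ms * sampleRate / 1000.0;  // e.g. 46 ms -> 2028.6 frames
    assert(exact <= 2147483648.0);                  // keep the doubling below 2^32
    std::uint32_t frames = 1;
    while (frames < exact)
    {
        frames *= 2;
    }
    return frames;
}

// Back-conversion: the latency the rounded buffer actually adds.
double framesToMs(std::uint32_t frames, std::uint32_t sampleRate)
{
    return 1000.0 * frames / sampleRate;
}
```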

In the settings tab there is the current value of the AW audio buffer (more reliable than before) and the number of underruns.

@sh95014 I would really like to have your opinion on these changes, and on the best way to apply g_nCpuCyclesFeedback, which is currently applied once per frame:

https://github.com/audetto/AppleWin/blob/0ed406d81eaafa3e0c6ead4b2a5b9b65e1c4dd84/source/frontends/common2/speed.cpp#L24

but I wonder if it should go here instead

https://github.com/audetto/AppleWin/blob/0ed406d81eaafa3e0c6ead4b2a5b9b65e1c4dd84/source/frontends/sdl/sdlframe.cpp#L584

Moreover, the maximum adjustment allowed by default is 200 samples. AppleWin runs in chunks of 44 samples (1 ms), so 200 is a lot of freedom. Here I run in chunks of 735 samples (60 FPS), so there is less freedom, but it is still almost 30% of the speed. I am not sure how to test all of this.
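To put the default limit in numbers: with 735-sample frames at 23 clocks per sample, a clamped adjustment of ±200 samples lets the per-frame cycle budget swing by about 27% (200/735). The sketch is hypothetical; `frameCycles` is an illustrative name, and it does not settle whether the adjustment belongs in speed.cpp or sdlframe.cpp:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-frame cycle budget for the emulated 6502.
// feedbackSamples: correction requested by the audio side, in samples
//                  (positive = produce more audio, negative = produce less),
//                  clamped to +/- maxAdjustSamples before being applied.
std::int64_t frameCycles(int feedbackSamples,
                         int clksPerSample = 23,
                         int samplesPerFrame = 735,   // 44100 Hz / 60 FPS
                         int maxAdjustSamples = 200)  // default limit quoted above
{
    assert(clksPerSample > 0 && maxAdjustSamples >= 0 && samplesPerFrame > maxAdjustSamples);
    const int clamped = std::max(-maxAdjustSamples, std::min(feedbackSamples, maxAdjustSamples));
    return static_cast<std::int64_t>(samplesPerFrame + clamped) * clksPerSample;
}
```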

sh95014 commented 1 year ago

(@audetto, sorry for the slow response, was on vacation.)

Unfortunately I don't have anything all that intelligent to add, but your audio_callback branch looks like what I'd expect. I don't fully understand g_nCpuCyclesFeedback, but it would seem like you'd want it in common2 rather than in sdl specifically? (It doesn't really matter to macOS in the short term because, despite its name, I'm still using sdlframe.cpp anyway.)

audetto commented 1 year ago

The error actually comes from the setting of g_fClksPerSpkrSample in SetClksPerSpkrSample(). It is 23, an integral number but it really should be 23.14. It is said in the comment of SetClksPerSpkrSample() that this CPU clocks per audio sample value was

You are right. This is a symptom of the problem and I think the solution is to do like the Windows version. Moreover, AppleWin (Windows) does not run at NTSC or PAL speed, but effectively at 23 * 44100 Hz, because as you say, the feedback is constantly slowing down the emulator. It took me ages to unravel this.

In Linux I was trying to hit the exact speed, but this is bad for audio. I will soon push a final fix to handle the feedback to https://github.com/audetto/AppleWin/tree/audio_callback.

I think that if one targets 23 * 44100 directly, the need for correction is largely removed, although it is still useful to compensate for smaller / random issues.
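Putting numbers on the 23 × 44100 target: it corresponds to an emulated CPU rate of 1,014,300 Hz, roughly 0.6% below the 23.14 × 44100 ≈ 1,020,474 Hz implied by the unrounded ratio, and it yields exactly 16905 cycles and 735 samples per 60 FPS frame. A small hypothetical helper to check those figures:

```cpp
#include <cassert>

// Emulated CPU frequency implied by a fixed number of CPU clocks per
// speaker sample at the given sample rate.
constexpr double cpuHzFor(double clksPerSample, double sampleRate = 44100.0)
{
    return clksPerSample * sampleRate;
}

// Fractional slowdown of running at targetHz instead of nominalHz.
constexpr double slowdown(double targetHz, double nominalHz)
{
    return 1.0 - targetHz / nominalHz;
}
```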

audetto commented 1 year ago

Look at https://github.com/audetto/AppleWin/pull/87

You can try to further reduce the SDL audio buffer

--audio-buffer 46 is the default.

xandark commented 1 year ago

Okay, very good. I've recompiled the latest code and I find that the SDL-based sa2 has much lower audio latency when playing Aztec. However, qapple has noticeable audio latency when I stop the game character from walking. I also notice that qapple spams a lot of debug output to the console:

apple.audio: Restarting the AudioGenerator
apple.audio: AudioOutput: size = 11025 f, period = 882 f
apple.audio: Written some silence: frames = 8820 , duration = 200 ms
apple.audio: Restarting the AudioGenerator
apple.audio: AudioOutput: size = 11025 f, period = 882 f
apple.audio: Written some silence: frames = 8820 , duration = 200 ms
apple.audio: Stopping with silence: frames = 5692 , duration = 129 ms
apple.audio: Stopping with silence: frames = 10093 , duration = 228 ms

Could this be disabled?