FluidSynth / fluidsynth

Software synthesizer based on the SoundFont 2 specifications
https://www.fluidsynth.org
GNU Lesser General Public License v2.1

Unwanted timing quantization when using large buffers #1120

Closed pedrolcl closed 2 years ago

pedrolcl commented 2 years ago

FluidSynth version

2.2.x, probably older versions too.

Describe the bug

When using large audio buffers, the start time of many notes is quantized. This severely affects the musical rhythm and can be perceived as degraded quality. It has been observed with several audio drivers, but PulseAudio is especially impacted because it often requires larger buffers.

Expected behavior

When using large buffers, high latency is expected, but quantization should not happen.

Steps to reproduce

Start fluidsynth on a Linux terminal, using large buffers (>4K):

fluidsynth -p fluid -m alsa_seq -a pulseaudio -z 4800 ~/.local/share/soundfonts/GeneralUser.sf2

From another terminal instance, use aplaymidi to play MIDI files via the ALSA Sequencer:

aplaymidi -p fluid schubert_avemaria.mid TwistAndShout.mid

Additional context

This is somewhat related to ticket #1, but instead of the duration, the starting time is impacted when using large audio buffers.

Sometimes large buffers are wanted despite the latency, because latency doesn't matter in every situation. Or perhaps they can't be avoided, as is often the case when using PulseAudio. In that case, the PulseAudio driver can easily be modified to work around this issue. See: pedrolcl/fluidsynth@adffd1c

The number 128 is arbitrary. The minimum buffer for the synthesizer is 64 frames, and I am not advocating for using it as a fixed constant.

The PulseAudio driver is a particular case where the problem can be easily solved because PulseAudio supports it out of the box, but other drivers may be harder to fix (and I've not tried any other).

Here are two MIDI files where the rhythm is negatively impacted by this issue:

TwistAndShout.mid from this page.

schubert_avemaria.mid: https://github.com/pedrolcl/dmidiplayer/raw/master/examples/schubert_avemaria.mid

derselbst commented 2 years ago

Pls set -o audio.pulseaudio.adjust-latency=0 and try again. If that doesn't fix it, pls try a different audio driver.

pedrolcl commented 2 years ago

Pls set -o audio.pulseaudio.adjust-latency=0 and try again. If that doesn't fix it, pls try a different audio driver.

Perhaps I've not been clear on this point: this has nothing to do with latency, and all drivers are affected when using large audio buffers.

derselbst commented 2 years ago

Ok, after giving it a listen I hear what you mean by "quantized".

The problem is that most (all?) audio drivers request as many samples from the synth as the provided period size. If there are no rvoice events queued, the synth will render as many samples as requested. If an event arrives shortly afterwards, it will sound untimed, because a lot of audio has already been rendered and the event needs to wait until the next rendering call to take effect.
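
To illustrate what I mean, here is a simplified sketch of a pull-style driver loop (not the code of any actual FluidSynth driver; backend_write() and the buffers are placeholders):

```c
#include <fluidsynth.h>

/* placeholder for the blocking call that hands audio to the backend */
extern void backend_write(const float *left, const float *right, int nframes);

/* Simplified sketch: the whole period is rendered in a single call, so an
 * event that arrives right after this call cannot be heard before the next
 * period boundary. */
static void driver_loop(fluid_synth_t *synth, int period_size, volatile int *running)
{
    static float left[8192], right[8192];   /* assumes period_size <= 8192 */

    while (*running)
    {
        /* render e.g. 4800 frames at once */
        fluid_synth_write_float(synth, period_size, left, 0, 1, right, 0, 1);
        /* block until the backend has accepted the full period */
        backend_write(left, right, period_size);
    }
}
```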

One could instruct all audio drivers to render only fluid_synth_get_internal_bufsize() samples each call. But this would increase CPU usage significantly and would defeat the benefit of having a bigger period-size. Because if you render every 64 samples individually, you could also send them to the audio driver right away. I.e. having a larger period size would be pointless in this case, wouldn't it?

So, I understand that the current behavior is undesirable from a user perspective. I would vote for documenting this behavior and letting the user decide how often fluidsynth can re-evaluate its internal event processing.

pedrolcl commented 2 years ago

Ok, after giving it a listen I hear what you mean by "quantized".

Quantization is the MIDI terminology for the process of moving MIDI notes that were recorded out of time to a rhythmically correct position. I'm using it to mean exactly the opposite, because the musician may play strictly on time, but the synthesizer moves some notes to wrong positions in the timeline, without any musical justification. This is a chronic illness of FluidSynth, which was once called the "drunken drummer" syndrome. Do you prefer that expression?

The problem is that most (all?) audio drivers request as many samples from the synth as the provided period size.

There are some drivers that must do that, because it is mandated by the backend. Other drivers simply mimic that model blindly, without trying to leverage other available options. The latter is the case of our PulseAudio driver.

if you render every 64 samples individually, you could also send them to the audio driver right away. I.e. having a larger period size would be pointless in this case, wouldn't it?

Indeed, if you are trying to save battery/power/CPU cycles, you don't want to render small buffers. On the other hand, you may be forced by the audio server (PulseAudio) to accept large buffers, and thus high latency, but you may still render your audio in small fragments, and the server accepts them. I don't see the problem with that. My problem is when the music sounds wrong.

I would vote for documenting this behavior and let the user decide about how often fluidsynth can re-evaluate it's internal event processing.

I don't understand your proposal. Do you mean another option, say "synth.rendering-size", whose value could be changed by the user, for instance to 128, and this value would be used instead of the constant here?

https://github.com/pedrolcl/fluidsynth/blob/adffd1cf714c012688b664940032e4a6b4ff77ed/src/drivers/fluid_pulse.c#L139
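
Roughly, the idea behind that workaround looks like this (a simplified sketch, not the literal commit; the function name, the chunk size and the interleaving details are only for illustration):

```c
#include <fluidsynth.h>
#include <pulse/simple.h>

/* Sketch: instead of rendering and writing the whole period in one go, the
 * driver loop renders and writes small sub-chunks, so pending MIDI events
 * are picked up every `chunk` frames instead of once per period. */
static void pulse_write_chunked(fluid_synth_t *synth, pa_simple *pa,
                                float *buf, int period_size)
{
    const int chunk = 128;                   /* sub-chunk size in frames */
    int err = 0;

    for (int done = 0; done < period_size; done += chunk)
    {
        int n = period_size - done < chunk ? period_size - done : chunk;

        /* interleaved stereo: left at offset 0, right at offset 1, stride 2 */
        fluid_synth_write_float(synth, n, buf, 0, 2, buf, 1, 2);
        pa_simple_write(pa, buf, (size_t)n * 2 * sizeof(float), &err);
    }
}
```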

And what about other drivers?

pedrolcl commented 2 years ago

The problem is that most (all?) audio drivers request as many samples from the synth as the provided period size. If there are no rvoice events queued, the synth will render as many samples as requested. If an event arrives shortly afterwards, it will sound untimed, because a lot of audio has already been rendered and the event needs to wait until the next rendering call to take effect.

Some drivers (jack, oboe) do not take the "audio.periods" and "audio.period-size" settings into account at all, because the backend imposes a buffer size that the driver must comply with.

One could instruct all audio drivers to render only fluid_synth_get_internal_bufsize() samples each call. But this would increase CPU usage significantly and would defeat the benefit of having a bigger period-size. Because if you render every 64 samples individually, you could also send them to the audio driver right away. I.e. having a larger period size would be pointless in this case, wouldn't it?

If the driver calls fluid_synth_write_float() or fluid_synth_write_s16() or similar blocking functions with large buffers, the synth actually renders fragments of 64 frames in a loop until the buffers are filled. If the driver calls the synth render functions with small buffers, it will need to issue a greater number of calls, but the CPU usage in both scenarios would be nearly the same overall.

Where is the problem then? In the mechanism for delivering the rendered buffers to the audio backends. In the PulseAudio driver it is pa_simple_write(), which is a blocking function call. PulseAudio has another, asynchronous API, which may be worth exploring. Of course, calling write with small buffers will block for less time each cycle.

Another example is the ALSA audio driver. The driver uses snd_pcm_writei(), which may block. Unfortunately, the driver unconditionally selects blocking mode. With this driver, if the user configures a small buffer size with a large number of periods (high latency anyway), the time quantization may be avoided. This trick is not available in the PulseAudio driver, because it does not use the "audio.periods" setting at all.
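
For example, something like this keeps a comparable overall latency but with short periods (the exact values that the device accepts will vary; this is only an illustration of the trick):

fluidsynth -p fluid -m alsa_seq -a alsa -c 16 -z 256 ~/.local/share/soundfonts/GeneralUser.sf2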

derselbst commented 2 years ago

This is a chronic illness of FluidSynth, which was once called the "drunken drummer" syndrome. Do you prefer that expression?

It's ok. I just hadn't heard this expression in the MIDI context before.

Do you mean another option, say "synth.rendering-size", whose value could be changed by the user, for instance to 128, and this value would be used instead of the constant here?

No, I mean that people could use audio.period-size or -z to adjust that.

If the driver calls fluid_synth_write_float() or fluid_synth_write_s16() or similar blocking functions with large buffers, the synth will need to render fragments of 64 frames in a loop until the buffers are filled. If the driver calls the synth render functions with small buffers, it will need to issue a greater number of calls, but the CPU usage in both scenarios would be nearly the same overall.

We cannot change the synth to always render 64 sample chunks because that would significantly increase CPU usage. This is particularly not desired when rendering audio for the MIDI player or the sequencer.

That's why the 64-sample chunk rendering would need to be implemented at driver level, for each and every driver. But even then I'm afraid one would have some kind of drunken drummer. For instance, assume we have the maximum latency of 8192 samples (and a single period buffer), and we fill up that buffer with individual 64-sample rendering calls before handing it over to the audio driver. Given that latency, the audio driver API would make us block for about 185 ms. If the rendering of those 8192 / 64 calls takes significantly less time than the latency itself, e.g. only 1 ms, we would have 1 ms of time for any rvoice events to arrive. When handing the buffer to the driver we would block for 185 ms, during which a lot more events could arrive, but we wouldn't have a chance to react to them. At the next rendering call, we would have untimed events again.

The only fix for that I see is having a smart wrapper around the synth's rendering functions that 1) renders any arbitrary buffer size in 64-sample chunks and 2) measures the time of each rendering call and potentially sleep()s when rendering was done too quickly. In this case, the drunken drummer would only appear if the OS decided to sleep() a little longer than we requested.
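
Something along these lines, just to illustrate the idea (a sketch only, not a proposal for the actual implementation; the names and the fixed chunk size are made up):

```c
#include <fluidsynth.h>
#include <time.h>

/* Sketch of the "smart wrapper": render a large buffer in 64-sample chunks
 * and pace the rendering in real time, so that rvoice events get a chance
 * to arrive roughly where they belong in the timeline. */
static void render_paced(fluid_synth_t *synth, float *left, float *right,
                         int nframes, double sample_rate)
{
    const int chunk = 64;
    const long chunk_ns = (long)(1e9 * chunk / sample_rate);  /* chunk duration */

    for (int done = 0; done < nframes; done += chunk)
    {
        struct timespec start, end;
        int n = (nframes - done < chunk) ? (nframes - done) : chunk;

        clock_gettime(CLOCK_MONOTONIC, &start);
        fluid_synth_write_float(synth, n, left, done, 1, right, done, 1);
        clock_gettime(CLOCK_MONOTONIC, &end);

        long elapsed_ns = (end.tv_sec - start.tv_sec) * 1000000000L
                        + (end.tv_nsec - start.tv_nsec);

        if (elapsed_ns < chunk_ns)
        {
            /* rendering was faster than real time: wait out the rest of the
             * chunk so that later events are not rendered too early */
            struct timespec pause = { 0, chunk_ns - elapsed_ns };
            nanosleep(&pause, NULL);
        }
    }
}
```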

But yes, technically this approach would result in pretty much the same behavior as having a small period-size and a large number of periods, as you mentioned. Therefore I'm not sure it would be worth the implementation effort. One could rather just issue a warning if audio.period-size is big and advise using more audio.periods instead. And audio drivers would have to honor the latter ofc.

pedrolcl commented 2 years ago

We cannot change the synth to always render 64 sample chunks because that would significantly increase CPU usage. This is particularly not desired when rendering audio for the MIDI player or the sequencer.

I think that you misunderstood me. I've now corrected the wording in my answer.

I was not proposing any change on the synth. I was talking about fluid_synth_render_blocks() that already loops to process a given number of blocks of 64 frames.

derselbst commented 2 years ago

I was not proposing any change on the synth. I was talking about fluid_synth_render_blocks() that already loops to process a given number of blocks of 64 frames.

No rendering is happening in this loop. It only determines the number of blocks that can be rendered and it breaks out early if an rvoice event arrived. That's what I meant in my previous comment; the drivers need to be changed to only render 64 samples, so that blockcount becomes 1, and then the smart rendering wrapper would need to wait to give any events from the MIDI driver the chance to arrive, before rendering another block.

I think I would simply change new_fluid_midi_driver() to issue a warning if it detects that audio.period-size causes a latency of >10 ms or so. And then advise people to use more audio.periods instead.

jjceresa commented 2 years ago

I think I would simply change new_fluid_midi_driver() to issue a warning if it detects that audio.period-size causes a latency of >10 ms or so. And then advise people to use more audio.periods instead.

You probably wanted to say: "new_fluid_audio_driver()" and "less" audio.periods.

Also, perhaps it is worth documenting that in all of the FluidSynth audio rendering APIs, the MIDI event onset time precision is (unfortunately) highly dependent on how frequently those API functions are called?

pedrolcl commented 2 years ago

No rendering is happening in this loop. It only determines the number of blocks that can be rendered and it breaks out early if an rvoice event arrived.

We disagree here, maybe because each of us is using different terminology (again!)? Or because we are talking about different loops.

First of all, fluid_synth_render_blocks() is one of the topmost functions that I would call "the crux of the matter" about "synthesis", which I use as an equivalent of "rendering", although I agree that rendering encompasses some more processes. I've chosen this function because it is called directly or indirectly from all the public synthesis functions: fluid_synth_write_s16(), fluid_synth_write_float(), fluid_synth_process() and even the currently deprecated fluid_synth_nwrite_float(). On the other hand, inside fluid_synth_render_blocks(), the next rendering function called a few lines after your break statement is fluid_rvoice_mixer_render(), which in turn calls fluid_render_loop_multithread() or fluid_render_loop_singlethread(), which are the loops that I was talking about. All of them have a blockcount parameter.

Second, the early loop that you highlighted is hardly relevant in this scenario (rendering MIDI events in real time, received from a MIDI driver). I've checked that the break statement is rarely triggered, so the blockcount parameter almost always keeps its original value of (period_size + FLUID_BUFSIZE - 1) / FLUID_BUFSIZE.
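
For example, with -z 4800 as in my steps to reproduce, that works out to blockcount = (4800 + 64 - 1) / 64 = 75 blocks per call, so the synth commits to roughly 4800 frames of audio (about 109 ms at the default 44100 Hz sample rate) before it looks at newly arrived events again.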

pedrolcl commented 2 years ago

I think I would simply change new_fluid_midi_driver() to issue a warning if it detects that audio.period-size causes a latency of >10 ms or so. And then advise people to use more audio.periods instead.

You probably mean new_fluid_audio_driver() as JJ said.

latency > 10 ms is a totally arbitrary value. Some research has found that latency below 50 ms may be tolerable.

I agree that a revision of the audio drivers is in order. They should include both options (buffer size and number of buffers). I also agree that the buffer size should be as low as possible when the number of buffers can be raised. The overall latency is the product of both numbers, but that is not what matters here; only the contribution of the period size is important for the timing artifacts (timing quantization) that we are talking about.
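
As a rough example: latency ≈ audio.periods × audio.period-size / synth.sample-rate, so 2 periods of 4800 frames at the default 44100 Hz give 2 × 4800 / 44100 ≈ 218 ms of latency, while the granularity of the note-on quantization is set by the 4800-frame period size alone (≈ 109 ms).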

The following table outlines the options available on each driver to configure the buffer size and the number of buffers.

| Driver | audio.periods | audio.period-size |
|--------|---------------|-------------------|
| ALSA | Yes | Yes |
| aufile | No | Yes |
| Coreaudio | Yes | Yes |
| dart | Yes | Yes |
| dsound | Yes | Yes |
| jack | No | No |
| oboe | No | No |
| opensles | No | Yes |
| oss | Yes | Yes |
| pipewire | No | Yes |
| portaudio | No | Yes |
| pulse | No | Yes |
| sdl2 | No | Yes |
| sndmgr | Yes | Yes |
| wasapi | Yes, in exclusive mode | Yes, in exclusive mode |
| waveout | Yes | Yes |

Some drivers (Jack and Oboe) don't use these options at all. The Jack server has a parameter to define the period size, and this value is used directly by the FluidSynth Jack driver. When this value is big enough, the same timing artifacts can be observed. The relevance of this one is low, because Jack users are typically obsessed with low latency anyway.

pedrolcl commented 2 years ago

You probably wanted to say: "new_fluid_audio_driver()"

Yes, this is about the audio drivers, not about the midi drivers.

and "less" audio.periods.

No. This problem is not about low latency. Sometimes high latency is needed, but without introducing other problems like the timing quantization of this report. The ALSA driver shows this problem when audio.periods is low but audio.period-size is high. A workaround is to raise (more) audio.periods and lower (less) audio.period-size, so the overall latency stays the same, but the synthesis happens in more frequent cycles of shorter duration. This may not be possible with other audio drivers.

derselbst commented 2 years ago

You probably mean new_fluid_audio_driver() as JJ said.

No, I mean new_fluid_midi_driver(), because the problem we are talking about here is only relevant if people are using MIDI drivers.

and "less" audio.periods.

No. This problem is not about low latency. Sometimes high latency is needed, but without introducing other problems like the timing quantization of this report. The ALSA driver shows this problem when audio.periods is low but audio.period-size is high. A workaround is to raise (more) audio.periods and lower (less) audio.period-size, so the overall latency stays the same, but the synthesis happens in more frequent cycles of shorter duration. This may not be possible with other audio drivers.

Exactly. And I don't mind if this isn't possible with all drivers, as you probably wouldn't use e.g. jack if you require high latency.

latency > 10 ms is a totally arbitrary value. Some research has found that latency below 50 ms may be tolerable.

The default audio.period-size on Windows is 512. Given the default srate of 44100 Hz, this yields 11.6 ms of latency. A warning should not be issued in case of default values. If you say 50 ms is still fine, I'm ok with that.

First of all, fluid_synth_render_blocks() is one of the topmost functions that I would call "the crux of the matter"

I am not talking about fluid_rvoice_mixer_render(). As you correctly said, this is where the rendering or synthesis is happening.

Second, the early loop that you highlighted is hardly relevant in this scenario

This loop is very relevant. Usually, the events are queued somewhere in the MIDI player or sequencer, and are dispatched by the fluid_sample_timer_process() call: the loop runs, dispatches all events, the events trigger fluid_synth_noteon() and friends, they enqueue rvoice events, fluid_rvoice_eventhandler_dispatch_count() becomes true, and the loop breaks.

When using the MIDI driver however, we don't know when any events will arrive, we don't know when any call to fluid_synth_*() will be made and when rvoice events will be enqueued. The loop will just run through very quickly, assuming that there are no events for the next 4800 samples and telling the rvoice mixer to render that many blocks. The break in this loop is rarely triggered precisely because the events haven't arrived yet...

so the blockcount parameter almost always keeps the original value

...and that's the problem.
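
To spell out the flow I am describing (a paraphrase, not the literal code; the exact names and arguments are from memory):

```c
/* Inside fluid_synth_render_blocks(), the flow is roughly this: */
int blockcount = (period_size + FLUID_BUFSIZE - 1) / FLUID_BUFSIZE;
int i;

for (i = 0; i < blockcount; i++)
{
    /* player/sequencer callbacks dispatch their queued events here */
    fluid_sample_timer_process(synth);

    /* if an event was enqueued, stop counting and render fewer blocks */
    if (fluid_rvoice_eventhandler_dispatch_count(synth->eventhandler) != 0)
        break;
}

/* fluid_rvoice_mixer_render() is then asked to render i blocks. With a MIDI
 * driver, events arrive asynchronously, so the break above almost never
 * triggers and i stays at blockcount. */
```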

jjceresa commented 2 years ago

No, I mean new_fluid_midi_driver(), because the problem we are talking about here is only relevant if people are using MIDI drivers.

The problem we are talking about here (MIDI event onset time inaccuracy) is also relevant if an application calls the synth MIDI API directly (i.e. fluid_synth_noteon(), fluid_synth_noteoff(), ...) without relying on any MIDI driver. Isn't it?

pedrolcl commented 2 years ago

You probably mean new_fluid_audio_driver() as JJ said.

No, I mean new_fluid_midi_driver(), because the problem we are talking about here is only relevant if people are using MIDI drivers.

The steps to reproduce this bug are using the MIDI drivers because it is one easy use case where the problem manifests itself, but there is another affected scenario: applications that use the shared library, without the MIDI drivers, only the public MIDI synth functions:

https://github.com/FluidSynth/fluidsynth/blob/b274aaefba5b7fc06ed8dfb51c9efb67707f3fbf/include/fluidsynth/synth.h#L70-L104

My applications that use FluidSynth this way: VMPK, dmidiplayer, and all others based on Drumstick::RT.
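
For reference, the affected pattern in those applications boils down to something like this (a minimal sketch, not the actual code of any of them):

```c
#include <fluidsynth.h>

/* Minimal sketch: the application feeds MIDI to the synth directly and pulls
 * audio with the rendering API; no FluidSynth MIDI driver is involved. */
int main(void)
{
    fluid_settings_t *settings = new_fluid_settings();
    fluid_synth_t *synth = new_fluid_synth(settings);
    static float left[4800], right[4800];

    fluid_synth_sfload(synth, "GeneralUser.sf2", 1);

    fluid_synth_noteon(synth, 0, 60, 100);   /* event produced by the app */
    /* events only take effect at the granularity of these calls: with 4800
     * frames per call, onsets are quantized to steps of one call's length
     * (4800 frames, about 109 ms at 44100 Hz) */
    fluid_synth_write_float(synth, 4800, left, 0, 1, right, 0, 1);
    fluid_synth_noteoff(synth, 0, 60);

    delete_fluid_synth(synth);
    delete_fluid_settings(settings);
    return 0;
}
```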

This loop is very relevant. Usually, the events are queued somewhere in the MIDI player or sequencer, and are dispatched by the fluid_sample_timer_process() call: the loop runs, dispatches all events, the events trigger fluid_synth_noteon() and friends, they enqueue rvoice events, fluid_rvoice_eventhandler_dispatch_count() becomes true, and the loop breaks.

When using the MIDI driver however, we don't know when any events will arrive, we don't know when any call to fluid_synth_*() will be made and when rvoice events will be enqueued. The loop will just run through very quickly, assuming that there are no events for the next 4800 samples and telling the rvoice mixer to render that many blocks. The break in this loop is rarely triggered precisely because the events haven't arrived yet...

so the blockcount parameter almost always keeps the original value

...and that's the problem.

And maybe that is also a clue for a possible general solution, I hope.

pedrolcl commented 2 years ago

No, I mean new_fluid_midi_driver(), because the problem we are talking about here is only relevant if people are using MIDI drivers.

The problem we are talking about here (MIDI event onset time inaccuracy) is also relevant if an application calls the synth MIDI API directly (i.e. fluid_synth_noteon(), fluid_synth_noteoff(), ...) without relying on any MIDI driver. Isn't it?

Yes.

derselbst commented 2 years ago

And maybe that is also a clue for a possible general solution, I hope.

If there are no events to come for the next 4800 samples, why should rendering be done in small 64-sample chunks? Doing so would be a pessimization for people who use the fast file renderer or the MIDI sequencer.

I would simply address this issue by updating documentation here and there.

Admittedly, it seemed unlikely to me that people are using the synth API directly and yet still using audio drivers. I still think it's better to only issue a warning in new_fluid_midi_driver(), to avoid annoying people who use the command line without a MIDI driver. We could also put the warning into new_fluid_audio_driver(), I don't mind too much.

Besides that, I would update the docs of audio.period-size and add a hint to the synth's rendering functions.

derselbst commented 2 years ago

PR is ready.

derselbst commented 2 years ago

Marking as won't fix and closing. This behavior has existed since version 1.1.3. I see no practical solution for it. The behavior is now documented. A workaround is available, which I consider simple and reasonable for an end user.