FluidSynth / fluidsynth

Software synthesizer based on the SoundFont 2 specifications
https://www.fluidsynth.org
GNU Lesser General Public License v2.1

Imprecise Quantization for Audio File Rendering #1134

Closed · prof-spock closed this issue 1 year ago

prof-spock commented 1 year ago

Dear all,

I am sorry: it's me again. You might remember that I am hunting for a bit-exact reproduction of Fluidsynth's output in an audio plugin within a digital audio workstation (see also issues #618 and #1074).

This works in principle: I have been able to implement a simple JUCE wrapper around the fluidsynth library with primitive settings input (a multi-line text field), and its renderings in the DAW are close to the external renderings by Fluidsynth.

But there seems to be a catch: in my opinion the rendering logic in Fluidsynth for audio files is incorrect.

In Fluidsynth the output to the audio file is rasterized by audio.period-size, which must be at least 64 samples (according to the documentation and fluid_adriver.c), and hence MIDI events are shifted onto that raster.
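
As far as I can tell, the fast-rendering loop used by the command-line program corresponds to the documented libfluidsynth file-renderer usage, roughly like the sketch below (file names and settings values are only examples). Each call renders exactly one audio.period-size block, which is where that raster comes from:

    // Sketch of the documented fast file-rendering loop (example values only).
    fluid_settings_t* settings = new_fluid_settings();
    fluid_settings_setstr(settings, "audio.file.name", "output.wav");
    fluid_settings_setint(settings, "synth.lock-memory", 0);
    fluid_settings_setint(settings, "audio.period-size", 64);   // minimum allowed

    fluid_synth_t* synth = new_fluid_synth(settings);
    fluid_synth_sfload(synth, "soundfont.sf2", 1);

    fluid_player_t* player = new_fluid_player(synth);
    fluid_player_add(player, "song.mid");
    fluid_player_play(player);

    fluid_file_renderer_t* renderer = new_fluid_file_renderer(synth);

    // each call synthesizes exactly one period and writes it to the file
    while (fluid_player_get_status(player) == FLUID_PLAYER_PLAYING) {
        if (fluid_file_renderer_process_block(renderer) != FLUID_OK)
            break;
    }

    delete_fluid_file_renderer(renderer);
    delete_fluid_player(player);
    delete_fluid_synth(synth);
    delete_fluid_settings(settings);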

A much better approach would be either to allow a minimum period size of one sample or to adapt the buffer sizes dynamically when doing a fast rendering to file.

My algorithm in the JUCE wrapper uses the following logic:

    sampleRate = <<target audio file sample rate, e.g. 44100Hz>>
    bpmRate = 120;
    channelCount = 2;
    <<read MIDI file header into "midiTicksPerQuarterNote">>
    midiTimeToSecondsFactor = 60.0 / (bpmRate * midiTicksPerQuarterNote);
    <<collect events in MIDI file into a map "midiEntryList" from
      MIDI time to a list of synchronous MIDI events>>

    previousTimeInSamples = 0;
    sampleBuffer = new SampleBuffer(channelCount, 0);

    FOR EACH entry IN midiEntryList DO
        synchronousEventList = entry.eventList;
        eventTimeInSeconds = entry.midiEventTime * midiTimeToSecondsFactor;
        eventTimeInSamples = round(eventTimeInSeconds * sampleRate);
        offsetTimeInSamples = eventTimeInSamples - previousTimeInSamples;

        // fill local buffer with samples for offset time and append to
        // sample buffer
        thisBuffer = new SampleBuffer(channelCount, offsetTimeInSamples);
        fluid_synth_process(synthesizer, thisBuffer, offsetTimeInSamples);
        sampleBuffer += thisBuffer;
        thisBuffer.destroy();

        FOR EACH event IN synchronousEventList DO
            <<handle event in "synthesizer";
              if tempo meta event, update "midiTimeToSecondsFactor">>
        END;

        previousTimeInSamples = eventTimeInSamples;
    END;
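
For completeness, here is a C++ sketch of the same inner loop against the real libfluidsynth API. It is only a sketch: the MidiEntry container and the event callbacks are placeholders of mine, MIDI parsing and tempo handling are elided, and fluid_synth_write_float() is used instead of fluid_synth_process() to keep the stereo buffer handling short.

    // Sketch only: render the gap between two adjacent MIDI event times with a
    // single call of dynamic length, then apply the events of that entry.
    #include <fluidsynth.h>
    #include <cmath>
    #include <functional>
    #include <vector>

    struct MidiEntry {                             // placeholder container type
        long timeInTicks;                          // absolute MIDI time
        std::vector<std::function<void(fluid_synth_t*)>> events;
    };

    void renderEntries(fluid_synth_t* synth,
                       const std::vector<MidiEntry>& entryList,
                       double secondsPerTick, double sampleRate,
                       std::vector<float>& left, std::vector<float>& right)
    {
        long previousTimeInSamples = 0;

        for (const MidiEntry& entry : entryList) {
            const double eventTimeInSeconds = entry.timeInTicks * secondsPerTick;
            const long eventTimeInSamples =
                std::lround(eventTimeInSeconds * sampleRate);
            const long offset = eventTimeInSamples - previousTimeInSamples;

            if (offset > 0) {
                // synthesize exactly "offset" stereo frames and append them
                const size_t oldSize = left.size();
                left.resize(oldSize + (size_t) offset);
                right.resize(oldSize + (size_t) offset);
                fluid_synth_write_float(synth, (int) offset,
                                        left.data() + oldSize, 0, 1,
                                        right.data() + oldSize, 0, 1);
            }

            for (const auto& applyEvent : entry.events)
                applyEvent(synth);                 // e.g. fluid_synth_noteon(...)

            previousTimeInSamples = eventTimeInSamples;
        }
        // tempo meta events (updating secondsPerTick) are elided in this sketch
    }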

If you were really paranoid, you could do some sort of "oversampling" to handle event occurrences starting or ending at positions that do not coincide with the sample raster, but that is completely over the top in my opinion. At the very least, however, fluidsynth should rasterize on sample boundaries when rendering to file.

I also reimplemented the above algorithm in a Python command-line wrapper for libfluidsynth. For well-behaved soundfonts (e.g. without LFOs), a spectral analysis shows that this produces output identical within a -200 dB margin to that of my Fluidsynth plugin in a DAW (which uses the same algorithm). A test of that is shown in the attached screenshot.

[Screenshot: fluidsynth_plugin_test]

Emulating that kind of event quantization on a period raster in the DAW plugin instead would be very tedious, because one would have to guess the external audio.period-size and also juggle samples around when DAW playback does not start at such a period-raster position.

So my bug report/feature request is: either allow an audio.period-size of 1 for file rendering (which might impede performance significantly, because the synthesizer then renders one sample at a time) or change the algorithm to use dynamic period lengths (as shown above) to minimize the number of fluid_synth_process calls.

Is this unreasonable, or would it be too tedious on your side?

Best regards, Prof. Spock

derselbst commented 1 year ago

So my bug report/feature request is: either allow an audio.period-size of 1 for file rendering (which might impede performance significantly, because the synthesizer then renders one sample at a time) or change the algorithm to use dynamic period lengths (as shown above) to minimize the number of fluid_synth_process calls.

Fluidsynth is a realtime synth. In a realtime scenario, you don't know when events will arrive. Therefore, we cannot use "dynamic period lengths", especially because no audio driver out there supports something like this.

A period size of 1 would impede performance. Pls. don't get me wrong, but I still do consider your use-case to be a corner case. And I don't think it's worth imposing such a significant change on everyone. Fluidsynth is open source, so I encourage you to change FLUID_BUFSIZE to 1, recompile it and see what happens (I'm not even sure if it's supported by the sample interpolation).

the rendering logic in Fluidsynth for audio files is incorrect.

There is no such thing as "rendering logic for audio files". Fluidsynth simply uses the one and only real-time rendering logic and then writes the generated samples into a file.

Anyway, it is not clear to me how you're doing it. Are you using that JUCE wrapper you've provided, or are you using fluidsynth's file driver? And if you are using that JUCE wrapper, why are you worried about audio.period-size? This setting is used by the audio drivers. But your wrapper is calling fluid_synth_process() directly, so there is no audio driver involved. Yet you're right that 64 samples (= FLUID_BUFSIZE) is the smallest unit for which fluidsynth can make state changes. So if you're experiencing poor quantization, I would expect that offsetTimeInSamples is usually < 64 samples. This results in an uncertainty of about 1.4 ms (64 samples at 44100 Hz), which is inaudible (even for Vulcans), but admittedly not suited for a "bit-exact" reproduction.

Note that your wrapper is basically just a reinvention of fluidsynth's sequencer.
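
For reference, scheduling an event through the sequencer looks roughly like this (a minimal sketch; synth is assumed to exist already, and the note parameters are arbitrary). Note that the sequencer's default time scale is 1000 ticks per second, i.e. millisecond resolution:

    // Sketch: schedule a note-on at an absolute sequencer time of 500 ticks
    // (with the default time scale of 1000 ticks/second this means 500 ms).
    fluid_sequencer_t* sequencer = new_fluid_sequencer2(0 /* no system timer */);
    fluid_seq_id_t synthDestination =
        fluid_sequencer_register_fluidsynth(sequencer, synth);

    fluid_event_t* event = new_fluid_event();
    fluid_event_set_dest(event, synthDestination);
    fluid_event_noteon(event, /* channel */ 0, /* key */ 60, /* velocity */ 100);
    fluid_sequencer_send_at(sequencer, event, /* time */ 500, /* absolute */ 1);

    delete_fluid_event(event);
    delete_fluid_sequencer(sequencer);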

Also, pls. be advised to read the warning here in case you haven't.

prof-spock commented 1 year ago

Hello derselbst,

thanks for your quick response!

Pls. don't get me wrong, but I still do consider your use-case to be a corner case.

You are absolutely correct; if anything, that is an understatement: my use-case is quite academic.

But when trying to mimic fluidsynth's behaviour exactly in a DAW, one has to make sure that there is no artificial raster in the file rendering of the command-line version, because such a raster would be hard to emulate in the DAW.

Fluidsynth is a realtime synth. In a realtime scenario, you don't know when events will arrive. Therefore, we cannot use "dynamic period lengths", especially because no audio driver out there supports something like this.

Granted, I have no idea whether people mainly use fluidsynth as a real-time synth or as an offline MIDI-file-to-audio-file converter. But when they do the latter, real-time constraints no longer apply and one can work with those "dynamic period lengths".

My naive idea was that the fluid_filerenderer does exactly that, but it obviously does not.

Anyway, it is not clear to me how you're doing it. Are you using that JUCE wrapper you've provided, or are you using fluidsynth's file driver?

I am doing neither. The Python command-line program was just an emulation to ensure that the plugin implementation (with JUCE and the logic described above) has no apparent logical flaws. I did this because there was a measurable difference to the genuine command-line fluidsynth; I had already seen that when testing the FluidsynthVST plugin from another author some time ago.

What I then did was reimplement a simple MIDI file reader followed by a scheduler calling fluid_synth_process (as described in the code snippet above) in a Python program. So this program is just a poor man's fluidsynth reimplementation for converting MIDI to WAV.

And if you are using that JUCE wrapper, why are you worried about audio.period-size? This setting is used by the audio drivers.

Absolutely. This setting only applies to the genuine command-line fluidsynth. But when I want to use that original command-line program, I am affected by it, because it shifts the MIDI event positions somewhat.

Yet you're right that 64 samples (= FLUID_BUFSIZE) is the smallest unit for which fluidsynth can make state changes. So if you're experiencing poor quantization, I would expect that offsetTimeInSamples is usually < 64 samples. This results in an uncertainty of about 1.4 ms (64 samples at 44100 Hz), which is inaudible (even for Vulcans), but admittedly not suited for a "bit-exact" reproduction.

Your analysis is absolutely correct in principle, but offsetTimeInSamples is just the distance in samples between two adjacent, non-synchronous MIDI events, so there do not have to be such small time differences at all.

In contrast, in the standard fluidsynth implementation event times are always quantized to multiples of audio.period-size. So the quantization error is, as you correctly estimated, quite small, but it becomes significant when being picky and comparing different rendering scenarios (e.g. the DAW output vs. the command-line output) with tools such as a spectrum analyzer.

Okay, we agree that setting the period-size to one is not advisable for performance reasons. But is it possible to change the internal logic of the file renderer to the proposed dynamic buffer size? This would at least ensure that MIDI events are placed at the correct sample positions.

If that is too complicated and not worthwhile to implement, I am fine with my poor man's file renderer using the fluid_synth_process routine. But I could play around with the fluidsynth source and see whether those changes can be kept local.

Note that your wrapper is basically just a reinvention of fluidsynth's sequencer.

Definitely; I would have preferred to use that sequencer instead of reimplementing it, but I need the precision.

Also, pls. be advised to read the warning here in case you haven't.

I have read it, but as far as I can tell it does not apply to my case. The processing in the poor man's fluidsynth (and my Fluidsynth DAW plugin) is strictly sequential.

Best regards and thanks especially for your patience! Prof. Spock

prof-spock commented 1 year ago

Hello derselbst,

Because a change to the standard fluidsynth implementation is quite tedious, I have decided to implement my own libfluidsynth command-line client with the above logic. The Python implementation works but is too slow, so I am porting it to C++. I'll keep you posted if you are interested.

Best regards, Prof. Spock

derselbst commented 1 year ago

But is it possible to change the internal logic of the file renderer to the proposed dynamic buffer size?

To some extent this is already done: even when you request 8192 samples for one rendering call, if an event occurs within the next 64 samples, fluidsynth will only render those 64 samples and then process the pending events. But this only works when events are enqueued from within the synth context, i.e. via fluidsynth's sequencer or MIDI player, or when you render manually and enqueue events from the same thread where the rendering happens. Also, this "dynamic" length is still limited to multiples of 64 samples. Supporting an arbitrary value would be too much effort.
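
To put a number on that remaining granularity (an illustration with example values, not the internal code):

    // Worst-case displacement of an event when positions can only fall on
    // multiples of the internal block size (FLUID_BUFSIZE = 64 samples):
    const int    blockSize  = 64;         // FLUID_BUFSIZE
    const double sampleRate = 44100.0;    // example rate
    const double worstCaseErrorSeconds = (blockSize - 1) / sampleRate;
    // ~= 0.0014 s, i.e. the ~1.4 ms mentioned earlier in this thread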

I'll keep you posted if you are interested.

Sure, I would be interested if you can make it work.