Add Processing Time Offset Parameter

prof-spock commented 4 years ago

Hello derselbst,

I have a somewhat esoteric request: how complicated is it to add an API (and possibly command line) parameter for shifting the processing time of fluidsynth? The use case is as follows:

I have some midi file M1 that is processed via fluidsynth giving wave file W1.
Then I cut that midi file at time t and process its remainder M2 via fluidsynth giving wave file W2.
Now I split W1 at time t giving W3.
My hope is that (apart from some transients) W2 and W3 are identical.

But, of course, they are not. I assume that some free-running oscillators for modulators are not in sync, because they start at time 0 in W1 and W2, but at time t in W3. If there were some magical parameter offsetTime, one could process M2 with offsetTime=t and have the oscillators run for time t before processing M2.

Okay, everybody in his right mind says: why don't you just W3 for a split audio? But this does not work for me, because I use fluidsynth in two ways:

I process M1 via a command-line giving W1.
Additionally I have midi file M1 as a track T1 in a DAW with fluidsynthVST as the plugin.
On another track T2 I have wave file W1.
I try to compare output of T2 with the realtime output of T1 (by subtracting them).

This works quite well when starting the rendering at 0:00 (apart from some fixed VST processing offset that can be compensated), but fails for above reason when rendering starts somewhere in the middle. But this could work in principle, when the VST can tell the libfluidsynth library to "start" at the current DAW time. I assume that syncing oscillators to some given time might be complicated, but then the remaining processing should be straightforward. Is this reasoning plausible? Or do I miss something important?

Best regards, Prof. Spock

derselbst commented 4 years ago

Well, adding a time offset as you suggest is complicated enough (due to several LFOs as you correctly noted). But even with such an offset I'm afraid that it wouldn't fix your problem. There are a couple more time depending on parameters that influence the rendered waveform. The

state of the IIR filter,
state of the ADSR envelopes,
the pitch and volume of each voice (which we linearly interpolate between 64 frames of rendered audio), and
the delay lines in reverb and chorus

are coming to my mind ATM. So, I'm think it's practically impossible to get all of them in sync.

Instead of "cutting" M1, have you considered to simply kill all noteon events in M1 up until time t giving you silence before t and W2 (with somewhat the same oscillator states etc.) after t?

Have you considered to use cross-correlation rather than subtraction?

Have you switched off reverb and chorus?

prof-spock commented 4 years ago

Hello derselbst,

first of all thanks for your quick and very competent response!

I want to make clear that I do not strive for a complete cancellation of the signals! I just want to be able to emulate the results of an external rendering line with fluidsynth (and sox) within a DAW with appropriate plugins (one of them being fluidsynthVST). So I would be absolutely fine with a residue of say -40dB between the signals, because I do not hear differences in the signals that are below that threshold and typically are masked (by the interesting audio!).

Let me discuss a few points in your comment:

[Besides LFO phases the parameters affected by time are] • state of the IIR filter,

Granted, but I think that the differences between an IIR filter "in action" and some filter just starting should be below the mentioned threshold after some time. Of course, this will depend on the feedback coefficients.

• state of the ADSR envelopes,

You're right for an arbitrary split point, but when assuming that the cut is done where no note is active (e.g. when previewing some region within a song), only the release parts of the envelopes could spill over. Again after some setup time this should be below the threshold.

My naive assumption is that the envelopes are triggered by NOTEON and NOTEOFF events or do I miss something here?

• the pitch and volume of each voice (which we linearly interpolate between 64 frames of rendered audio), and

I do not get that assuming that no portamento or CC07 is used for some note crossing the cut line. Even then there might be small differences, but they should be below the threshold.

• the delay lines in reverb and chorus Have you switched off reverb and chorus?

The MIDI files are automatically transformed for the DAW not to use reverb at all (by setting CC91 to 0 in the beginning of the file and deleting them elsewhere). The command line fluidsynth has a -R 0 parameter. But I missed chorus so far, good advice, the same probably applies to delay.

I just was assuming that my preferred instruments from Fluid_R3 don't use that, but I have to check.

Instead of "cutting" M1, have you considered to simply kill all noteon events in M1 up until time t giving you silence before t and W2 (with somewhat the same oscillator states etc.) after t?

A good idea in principle, but it does not apply to the DAW situation, where you just start playing at t and comparing M2 rendered by fluidsynthVST/libfluidsynth to W1 cut at t.

Here is my test setup using S.C.Collins midi test file and the associated diagnostic soundfont. I have also written a small plugin compensating the amplitude and offset difference of fluidsynthVST: Amplitude and Latency Compensation

Have you considered to use cross-correlation rather than subtraction?

No, I didn't, thanks for the hint: I was looking for such a VST, but didn't know the technical term. I will be playing around with it and also try to ascertain the latency of fluidsynthVST (which seems to be 64 samples).

Okay, let me play around with your ideas; I'll keep you informed.

Best regards, Prof. Spock

mawe42 commented 4 years ago

Just a quick thought, maybe too basic, but I thought I'd write anyway: does your midi file maybe contain not just note on/off messages, but also cc commands that affect the channel setup? For example things like channel expression, pitch bend range, maybe even sysex commands to alter envelopes? In that case you would obviously get very different results if you omit those messages and start somewhere further down in the midi stream.

prof-spock commented 4 years ago

Hello mawe42,

for the diagnostic test shown above the midi file from S.C.Collins does not contain anything except note data (and initial volume and pan setup). But to be honest the soundfont is a bit pathological, because Christian used it for testing the soundfont spec conformity of several plugins. When using the above setup (with external fluidsynth and fluidsynthVST), everything cancels out fine when starting with t=0, but, of course, for arbitrary t, there is no cancellation of the tracks.

Best regards, Prof. Spock

prof-spock commented 4 years ago

Dear all,

first of all sorry for still keeping this issue alive! I concede that introducing such a time offset parameter might be hopeless in the general case, because of the free-running oscillators and possibly other circumstances. Now if we cannot handle a soundfont with a complex oscillator logic in such a way, why not use/create a soundfont with just simple sample playback and envelopes? In my opinion its playback should be completely time-independent: the sample playback and the envelope start at the time of the noteon, the release at the noteoff, so there is no surprising magic here. I tried this with a naive midi file and a stripped down piano soundfont with fluidsynth on the command line and fluidsynthVST (using libfluidsynth) in Reaper. As always the fluidsynth wave file and the live rendered audio from fluidsynthVST completely cancel out when starting at t=0 (with a processing(?) offset of -64 samples for the VSTi). Unfortunately when starting at an arbitrary position this cancellation does not work. So libfluidsynth is in a different state although it is just rendering samples plus envelopes. And to be honest I do not understand where my above reasoning fails...

Best regards, Prof. Spock

mawe42 commented 4 years ago

Can you try if the cancellation works if you use a starting point that is a multiple of 64 frames from the start of your reference?

prof-spock commented 4 years ago

Hello mawe42, great analysis! The cancellation works if the starting point is at k·64s/44100 (k∈ℕ). This seems to be the VST block size? Best regards, Prof. Spock

mawe42 commented 4 years ago

Great! I got the idea from something that @derselbst wrote in his first reply:

the pitch and volume of each voice (which we linearly interpolate between 64 frames of rendered audio)

So my guess is: without significant changes to the way we handle those aspects of the sound, you're out of luck (for arbitrary offsets).

prof-spock commented 4 years ago

Hello mawe42,

you wrote:

without significant changes to the way we handle those aspects of the sound, you're out of luck (for arbitrary offsets)

I am fine with that, you do not have to add complexity to your software for esoteric use cases.

What I can do instead in the DAW is introduce a macro to rasterize loop positions and the play cursor start onto the 64 sample raster. This is easy to do. And I doubt I will be able to hear that the chorus starts 32 samples early or late ... 😉

Best regards, Prof. Spock

prof-spock commented 4 years ago

Dear all,

I have written the macro and it works perfectly! So I just have to make sure that no free-running modulation occurs in the soundfont and I'm fine. Thanks to you both for restoring my faith in digital audio processing! 👍

I assume we can close this issue: such a parameter would be desirable from a scientific standpoint, but an okay workaround exists for the issue...

Best regards, Prof. Spock

derselbst commented 4 years ago

I have written the macro and it works perfectly!

Glad to hear that!

And to my surprise, it seems that your "😉" was the expression of an emotion... fascinating.

FluidSynth / fluidsynth

Add Processing Time Offset Parameter #618