Option to normalize volume in command line with `-F` option

spessasus commented 1 month ago

Is your feature request related to a problem?

Currently, the only way to prevent clipping during fast rendering (via -F) is to reduce the gain value until it stops clipping, which is not ideal. It would be nice to have fluidsynth do that for you.

Describe the solution you'd like

Here's my idea:

Since fluidsynth renders in float internally, even if the gain is above 1, it's still preserved in the memory. Adding an option for normalizing volume ( could be -N for example) would:

Render the file into memory (two float arrays)
Go through them and find the max and min value
Abs of whichever is greater would be the divider
Divide all samples by that divider and write out

That way, for example on a MIDI that's too loud:

max sample is 2 (or -2, whichever is further from 0)
the effective gain is = 1 / 2 = 0.5
therefore the maximum value is 2 / 2 = 1 which is within the dynamic range

Another example, on a very quiet MIDI. For example exporting any midi fron onlinesequencer.net will be just that because it forces all notes to have a velocity of 64:

max sample is 0.4
the effective gain is 1 / 0.4 = 2.5
therefore the maximum value is 0.4 / 0.4 = 1 which is within the dynamic range

Advantages

Every MIDI uses full dynamic range of the output format
Loud MIDIs (like black MIDIs) do not clip
Quiet MIDIs have roughly the same volume as loud ones
Quiet MIDIs preserve more info when converting from double to short (instead of range spanning -13100 to around 13100, it spans the full advantage of short's range: -32768 to 32767)
I don't have to play around with -o synth.gain anymore or run the output through Audacity

Describe alternatives you've considered

This could also be a function within the libfluidsynth, but not sure about that one.

Additional context

My implementation

What do you think?

derselbst commented 1 month ago

-F relies on the file driver which is a realtime driver. Your solution wouldn't work in a realtime scenario. #1302 would be more solid solution.

spessasus commented 1 month ago

Well, can't it render to memory, normalize the audio and only then fire up the file driver?

And also, the suggestion you've referenced is dynamic compression. I don't want that, I just want automatic gain adjustment for the entire song.

derselbst commented 1 month ago

Well, can't it render to memory, normalize the audio and only then fire up the file driver?

How? Chunks are rendered with 64 samples. Amplifying every 64 samples surely wouldn't sound right.

I don't want that, I just want automatic gain adjustment for the entire song.

If that's what you want, it's a simple postprocessing amplification step, which I don't really see in scope of fluidsynth.

spessasus commented 1 month ago

How? Chunks are rendered with 64 samples. Amplifying every 64 samples surely wouldn't sound right.

Wouldn't that just be this?

load the MIDI
initialize outLeft and outRight as float[]. Size is midi.duration * synth.sampleRate * sizeof(float)
fluid_synth_write_float() the entire file into them
Normalize and write out to a file

That's it, isn't it? Or am I missing something?

spessasus commented 1 month ago

If that's what you want, it's a simple postprocessing amplification step, which I don't really see in scope of fluidsynth.

I don't think you understand why this should be in fluidsynth. The processing needs to be done before writing out a wave file. Since fluid renders using float, we lose information when converting to short. This is especially the case with low gain levels: the dynamic range may span as little as 1/5th of the full -1 to 1 range. This results in wav's dynamic range being around -6k to 6k which is way less that the stock 32k range. Amplifying the output before writing a file will use the full dynamic range. Applying the postprocessing after the render will stretch out the limited 6k dynamic range.

I hope this wall of text made at least some sense of my idea to you...

ghost commented 1 month ago

This issue and the solution are not about adjusting the sound in real time. Fluidsynth currently supports outputting sounds as float if libsndfile is enabled.

for example:

fluidsynth -O float -F a.wav a.sf2 a.mid

Sounds output this way do not lose relative dynamic range and do not seem to clip even above 0dbFS. I always output the sound rendered by Fluidsynth as float to a pipe or WAV file, then adjust the sound with other software.

Thanks.

derselbst commented 3 weeks ago

initialize outLeft and outRight as float[]. Size is midi.duration synth.sampleRate sizeof(float) fluid_synth_write_float() the entire file into them

You're making two assumptions here:

This will only work when the playback duration is known, i.e. only for MIDI files, and not for real-time performance (which the file driver is capable of).
The user has enough memory to keep the entire rendered audio in memory - this is not how fluidsynth works. It has a low memory footprint, it doesn't do that.

Because of this, I consider your request to be not feasible and out of scope for fluidsynth.

As stardusteyes said, you're free to produce a wav or raw file containing floats (or doubles) and adjust the amplitude afterward without worrying to lose dynamic range.

FluidSynth / fluidsynth