Degraded audio fidelity during pitch bends in default XSynth

BlackMIDIDevs / xsynth

The fastest Black MIDI synthesizer, playing 8,000 voices or more in realtime. Uses aggressive SIMD and multithreading, and supports a subset of the sfz and sf2 formats.

GNU Lesser General Public License v3.0


Degraded audio fidelity during pitch bends in default XSynth #77

Open Basiliotornado opened 1 month ago

Basiliotornado commented 1 month ago

https://github.com/user-attachments/assets/73932d9f-dc1c-417a-a6c2-2dcf67028f0e

None of the synth settings seem to help or change anything. The audio was recorded with the Linux binary from the releases page, but it still does this with a freshly compiled one.

arduano commented 1 month ago

I'm not too sure what you mean, maybe @MyBlackMIDIScore understands this better?

Also, just to make sure, does this happen on all soundfonts you tested?

MyBlackMIDIScore commented 1 month ago

@Basiliotornado Hi, thanks for reporting this. Would you be able to provide the MIDI and soundfont used?

Basiliotornado commented 1 month ago

https://drive.google.com/file/d/1VjYpc6K0s2epxz9wqRsaQXCFU3QPi3aN/view?usp=sharing

I suspect it's something to do with the pitch shifter, based on which notes create the distortion, and it doesn't sound like it's happening with Amethyst Imperial Grand, which seems to have one sample for each key.

I should also mention that the soundfont is natively sf2; I converted it to sfz with Polyphone.

arduano commented 1 month ago

Ah yep, depending on the scale of this, that's a known issue. XSynth doesn't do any interpolation when sampling, just nearest neighbor, because of the absolutely massive performance impact we've observed interpolation causes.

Instead, it expands the samples per key when loading the soundfont, so each key gets its own sample, resampled to 4x the output sample rate.
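To illustrate why the per-key expansion works at nominal pitch but not under a bend, here is a minimal Rust sketch. It assumes, as described above, that each key's sample is stored at 4x the output rate, so an unbent note advances exactly 4 stored frames per output frame. The helper `playback_step` is hypothetical and not part of XSynth's API.

```rust
/// Oversampling factor of the pre-expanded per-key sample (4x the output rate).
const OVERSAMPLE: f64 = 4.0;

/// Hypothetical helper: how many stored frames to advance per output frame
/// for a pitch bend given in semitones (0.0 = no bend).
fn playback_step(bend_semitones: f64) -> f64 {
    OVERSAMPLE * 2f64.powf(bend_semitones / 12.0)
}

fn main() {
    for bend in [0.0, 0.5, 1.0, 12.0] {
        let step = playback_step(bend);
        // A fractional step means read positions land between stored frames,
        // which is where the choice of interpolation starts to matter.
        let fractional = step.fract() != 0.0;
        println!("bend {bend:>4} st -> step {step:.4} (fractional: {fractional})");
    }
}
```

With no bend (or a whole-octave bend) the step is an integer and every read hits a stored frame exactly; any other bend yields a fractional step.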

The only thing that breaks here is pitch bends, which would produce a very faint hiss due to nearest-neighbor sampling. I'm not sure we can have a proper fix for this, but I'll wait for @MyBlackMIDIScore's input.
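A minimal sketch of where the hiss comes from: a nearest-neighbor sampler snaps a fractional read position to the closest stored frame, and that rounding error behaves like added noise. The test signal (a sine with a 64-frame period), the step values, and the function names are illustrative assumptions, not XSynth's actual code.

```rust
use std::f64::consts::PI;

/// Nearest-neighbor read: round the fractional position to the closest frame.
fn sample_nearest(buf: &[f64], pos: f64) -> f64 {
    let idx = pos.round() as usize;
    buf[idx.min(buf.len() - 1)]
}

/// Largest deviation from the ideal sine when reading `buf`
/// at positions `i * step` for `i` in `0..reads`.
fn max_error(buf: &[f64], period: f64, step: f64, reads: usize) -> f64 {
    (0..reads)
        .map(|i| {
            let pos = i as f64 * step;
            let ideal = (2.0 * PI * pos / period).sin();
            (sample_nearest(buf, pos) - ideal).abs()
        })
        .fold(0.0, f64::max)
}

fn main() {
    // Assumed test signal: a sine stored with a 64-frame period.
    let period = 64.0;
    let buf: Vec<f64> = (0..4096).map(|n| (2.0 * PI * n as f64 / period).sin()).collect();

    // Integer step (no bend): every read lands exactly on a stored frame.
    println!("step 4.0    max error {:.6}", max_error(&buf, period, 4.0, 900));
    // Fractional step (roughly +1 semitone): reads fall between frames.
    println!("step 4.2379 max error {:.6}", max_error(&buf, period, 4.2379, 900));
}
```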

MyBlackMIDIScore commented 1 month ago

I suspect it's something to do with the pitch shifter

Yeah, I just did some testing and it turns out this is what is causing those artifacts. By default XSynth does nearest neighbor interpolation, which is fast but lowers the quality of the audio in cases like this. It does provide an option for linear interpolation though which fixes this issue, but it can be slower in some cases hence why it is not enabled in Wasabi.
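For comparison, a minimal sketch of linear interpolation: it blends the two frames around the read position by the fractional part. That removes most of the rounding error, at the cost of a second memory read and a multiply-add per output frame. The function name is hypothetical; this is not the code behind `Interpolator::Linear`.

```rust
/// Linear interpolation between the two stored frames surrounding `pos`.
fn sample_linear(buf: &[f64], pos: f64) -> f64 {
    let i = pos.floor() as usize;
    let frac = pos - i as f64;
    // Clamp both indices so reads at the very end stay in bounds.
    let a = buf[i.min(buf.len() - 1)];
    let b = buf[(i + 1).min(buf.len() - 1)];
    a + (b - a) * frac
}

fn main() {
    let buf = [0.0, 1.0, 4.0, 9.0];
    // Halfway between frames 1 and 2: (1.0 + 4.0) / 2
    println!("{}", sample_linear(&buf, 1.5)); // prints 2.5
    // A quarter of the way from frame 2 to frame 3: 4.0 + 5.0 * 0.25
    println!("{}", sample_linear(&buf, 2.25)); // prints 5.25
}
```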

In the future I do plan to add more XSynth settings to Wasabi, including the interpolation algorithm. Right now, the only way to use linear interpolation in Wasabi is to add this line:

interpolator: xsynth_core::soundfont::Interpolator::Linear,

in src/audio_playback/xsynth.rs after line 105 and build the program yourself. If you have any more questions, let me know.

arduano commented 1 month ago

jinx

arduano commented 1 month ago

I wouldn't necessarily close this issue, as it's a real issue with XSynth. I'll rename the title, though.

And although we have Interpolator::Linear, I don't know how well it's actually hooked up to the voice picker, or whether linear is even enough in this context for perfect sound quality, or whether we need even smarter interpolation for people who want quality over performance.
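As one illustration of the "smarter interpolation" mentioned above, here is a sketch of 4-point Catmull-Rom (cubic Hermite) interpolation, a common higher-quality choice in samplers. This is a swapped-in technique for illustration only; nothing in this thread says XSynth implements it, and it needs four reads per output frame instead of two.

```rust
/// 4-point Catmull-Rom interpolation at fractional position `pos`.
/// Neighbor indices are clamped so reads near the buffer ends stay in bounds.
fn sample_cubic(buf: &[f64], pos: f64) -> f64 {
    let last = buf.len() as isize - 1;
    let i = pos.floor() as isize;
    let t = pos - i as f64;
    // Fetch frame i + k, clamped to [0, last].
    let at = |k: isize| buf[(i + k).clamp(0, last) as usize];
    let (p0, p1, p2, p3) = (at(-1), at(0), at(1), at(2));
    0.5 * (2.0 * p1
        + (p2 - p0) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
        + (3.0 * (p1 - p2) + p3 - p0) * t * t * t)
}

fn main() {
    // Frames sampled from f(x) = x^2.
    let buf = [0.0, 1.0, 4.0, 9.0, 16.0];
    // Catmull-Rom reproduces quadratics exactly: 1.5^2 = 2.25,
    // where linear interpolation would give 2.5.
    println!("{}", sample_cubic(&buf, 1.5)); // prints 2.25
}
```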

MyBlackMIDIScore commented 1 month ago

In this case (at least in the tests I did), linear interpolation fixes the issue. A "fix" for this would be to use linear by default, but that would have quite a performance hit. So I guess what needs to be worked on is the speed of the linear interpolator?

As for having more interpolation algorithms, this could help in some niche cases, but I think the priority should be to figure out the performance hit of linear interpolation first.

Btw, I'll move this issue to XSynth because it's not related to Wasabi.

Basiliotornado commented 1 month ago

Added the linear interpolation, sounds a ton better!

Comparing performance on different versions of Septette for the Dead Princess, though, the lerp build starts stuttering near the end of the 63 million version, while the build with no interpolation holds up through the entirety of the 92 million version (my frame rate drops below 60 fps first). So I can see why this is the default. For Wasabi, though, I feel it should be an option somewhere in the synth settings. I can go make a feature request there.