cardonabits / haxo-rs

Software for the haxophone
MIT License

Stereo summing woes #29

Open Ekolide opened 5 months ago

Ekolide commented 5 months ago

As I understand it, the Haxophone has a mono output through a MAX98357A I2S audio amplifier. From its specifications, I understand that it offers different modes for different outputs. Page 12 of adafruit-max98357-i2s-class-d-mono-amp.pdf (available online) specifies four modes of operation: Off (less than 0.16V), Stereo average (between 0.16V and 0.77V), Right (between 0.77V and 1.4V), and Left (higher than 1.4V).
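To make those thresholds concrete, here is a minimal Python sketch that maps a voltage to the mode listed above. The function name `sd_mode_channel` is made up for illustration, and how the exact boundary values (0.16V, 0.77V, 1.4V) are treated is my assumption; the datasheet is the authority there.

```python
def sd_mode_channel(voltage: float) -> str:
    """Map a mode-select voltage to the output mode quoted in this post.

    The thresholds are the ones cited above. Treating each boundary
    value as belonging to the higher range is an assumption made for
    this sketch, not something taken from the datasheet.
    """
    if voltage < 0.16:
        return "off"
    if voltage < 0.77:
        return "stereo average"
    if voltage < 1.4:
        return "right"
    return "left"


# Print the mode for a few sample voltages.
for v in (0.0, 0.5, 1.0, 3.3):
    print(f"{v:.2f} V -> {sd_mode_channel(v)}")
```

A pin driven to a full logic-high level falls in the last range, which is the Left-only case.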

I looked into this because I first assumed the Haxophone used a stereo output. When I created a Soundfont that didn't give me the expected results, I wanted to pin down what was going on.

Attached is stereo.sf2. It contains a number of programs (original work) meant to debug and pin down this issue.

1. MONO - bass. Presenting the first sound, with a bass sort of character.
2. MONO - sci-fi. Presenting the second sound, a sort of sci-fi-esque sound effect.
3. MONO - bass + effect. Both sounds simultaneously.
4. LEFT - bass. Bass sound only in Left channel.
5. RIGHT - sci-fi. Sci-fi sound effect only in Right channel.
6. LEFT - bass. RIGHT - sci-fi. Both sounds simultaneously, one in each channel.
7. LEFT - sci-fi. RIGHT - bass. Both sounds simultaneously, one in each channel (flipped).
8. A plucked melodic sound, to indicate the end of the Soundfont.
9. As per #26, program 9 will play the associated program on the default Soundfont, Celesta.

Using this Soundfont on my computer, everything works as expected. Using it on my RPi Zero with the provided image, a few things do not work as expected.

  1. Program 3 does not play both sounds simultaneously. My device seems to pick one of the two sounds at random each time I "scroll" past it using the instrument change interface on the Haxophone.
  2. Program 5 is silent.
  3. Programs 6 and 7 are mostly silent. They exhibit a similar behavior to program 3: occasionally a voice will play, but I have to "scroll" past it many, many times. The sound that plays is the one corresponding to the Left channel: the bass sound on program 6, and the sci-fi sound on program 7.

From these findings I suspect that the Left channel is the only one active. There also seems to be another underlying issue: FluidSynth not properly playing back programs that contain multiple instruments. I do not know much about the Soundfont spec and I have not tested this, but could it be a polyphony issue? It might require one voice per sample to play multiple samples simultaneously. Raising polyphony to 2 could be a further experiment.

If Left is the only active channel, then it's not all bad. Maybe we could disable processing of the right channel in its entirety to save CPU cycles? That could potentially free up headroom for decreasing latency or increasing processing power in other areas. If using only Left is an intentional design choice, it should be documented, and the Right output of FluidSynth could be routed so that it doesn't unexpectedly go quiet. In my opinion, though, the optimal case is Stereo average output from the audio amplifier.

Let me know if there are any questions or need of further clarification.

stereo.sf2.zip

jcard0na commented 5 months ago

Hi @Ekolide,

Your findings are correct: the haxophone is designed to use only one channel (Left). The schematic shows that there is no resistor between SD_MODE and the GPIO that drives it.

image

Regarding your suggestion to disable right-channel processing in fluidsynth, I have no clue how to do that, or whether it would free up a lot of processing. My understanding of how I2S works (the protocol used to send audio frames from the RPi to the audio codec) is that both channels are sent regardless of whether the codec discards them or not.

And yes, this is probably yet-another-issue-that-should-be-documented :wink:

Ekolide commented 5 months ago

Hi!

Thanks for the clear answer. The suggestion to disable the right channel is maybe a dream. I haven't come across anyone trying to, or even wanting to, do something like that with FluidSynth or ALSA, even less so to increase performance.

Though, could ALSA possibly route the Right channel to Left, as a sort of summing in software? You could mute the right channel with amixer or alsamixer, but I don't know if that actually leads to any performance gains or if it's just another thing. I found this old forum post from a user with a similar problem, and they write of a solution that worked for them using the user config file for ALSA.
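To illustrate what I mean by summing in software, here is a minimal Python sketch of the arithmetic only. It assumes audio frames are `(left, right)` pairs of floats; the helper names `sum_to_mono` and `left_only` are made up for this example. It is not an ALSA configuration and says nothing about latency or CPU cost.

```python
def sum_to_mono(frames):
    """Average left and right of each frame and write the mix to both slots.

    Halving the sum keeps the result within the original sample range.
    """
    mixed = []
    for left, right in frames:
        mix = (left + right) / 2
        mixed.append((mix, mix))
    return mixed


def left_only(frames):
    """Model an amplifier that plays only the Left channel of each frame."""
    return [left for left, _right in frames]


# A right-only signal, like program 5 in the attached Soundfont.
right_only_program = [(0.0, 0.5), (0.0, -0.25)]

# Without summing, the Left-only output carries nothing but zeros.
print(left_only(right_only_program))

# After summing, the right-channel signal reaches the Left output at half level.
print(left_only(sum_to_mono(right_only_program)))
```

The same reasoning would explain programs 6 and 7: without summing, only whichever sound sits in the Left channel can ever be heard.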

I hastily tried their solution using a .asoundrc file in the $HOME directory. I had no success, but I also didn't look into the actual ALSA soundcard names and such beyond a few blind attempts based on what's defined in alsa.rs.

More documentation on the asoundrc file can be found here. If this is successful, it would make sense to be placed as a system-wide configuration.

Ekolide commented 5 months ago

Small update, but no solution.

I played around with the solution presented on that forum thread. Today I tried using the configuration presented in this alsa-lib documentation. No success. It might just be a matter of setting the device name correctly; so far I have only tried some generic ALSA hw identifiers.

jcard0na commented 5 months ago

Hi @Ekolide,

I am sorry I cannot be of much help regarding alsa. Everything I've achieved with it was a result of a lot of time and frustration. In particular, it took a lot of experimentation to get to the very low latency we enjoy today. Pretty much any tweak I've tried will break that. For instance, you can see my notes here regarding real-time mixing: https://github.com/cardonabits/haxo-rs/issues/24 I'm completely stuck on that front. I think what you are trying to do with stereo fonts should be fixed in a future hardware revision. My intuition tells me that any fix in software will be kludgy and hard to maintain... but I've been wrong before!