aduros / wasm4

Build retro games using WebAssembly for a fantasy console.
https://wasm4.org
ISC License

Audio Frequency Proposal #333

Open marler8997 opened 2 years ago

marler8997 commented 2 years ago

Currently the WASM4 audio API controls the frequency of its channels using a 16-bit integer. However, the effect audio frequencies have on the human ear follows a "logarithmic" frequency scale rather than a linear one. Distributing the frequency values linearly gives the lower frequencies exponentially less granularity than the higher ones. For example, frequencies 1 Hz and 2 Hz are a single "octave" apart, but so are 32,000 Hz and 64,000 Hz, and the latter pair has roughly 32,000 whole-number frequencies available between them.

Some Context about Audio frequencies

The human ear associates a doubling of the frequency with the sound of an "octave". To humans, two frequencies an "octave" apart sound like the "same note" harmonically but at lower/higher registers. The octave is a special interval because you can think of it like a loop. As you increase the frequency, once you've hit an octave the same notes repeat themselves in a higher register.

This is where the logarithmic scale comes from. You have to double the frequency every time to get to the next octave. So if you start at frequency 100 Hz, the next octave up is 200 Hz, then 400 Hz, and so on. Put another way, adding 50 Hz to a 100 Hz tone has a greater effect than adding it to a 200 Hz tone; you would have to add 100 Hz to get the same effect at that frequency.

For example, the note "c0" in an "equal temperament" A440 tuning is defined as the frequency 16.35 Hz. The next octave above that, "c1", is at 32.70 Hz, and the next octave, "c2", is at 65.41 Hz (octaves are exact doublings; any small discrepancy in these figures is just rounding). The interval between c0 and c1 sounds the same as the interval between c1 and c2, even though the frequency gap of the second pair is twice that of the first. Adding roughly 16 Hz to c0 brings us to the next octave, c1, but adding 16 Hz to c1 only gets us about halfway there, bringing us to g1 (an interval called a "fifth").
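A minimal sketch of the arithmetic above, assuming the standard A440 equal-temperament value for c0 (~16.3516 Hz); `noteAboveC0` is an illustrative helper, not part of any API:

```javascript
// Each semitone multiplies the frequency by 2^(1/12), so 12 semitones
// (one octave) exactly double it.
const SEMITONE = Math.pow(2, 1 / 12);
const C0 = 16.3516; // c0 in A440 equal temperament, in Hz

function noteAboveC0(semitones) {
  return C0 * Math.pow(SEMITONE, semitones);
}

console.log(noteAboveC0(12).toFixed(2)); // c1: 32.70 Hz (exactly double c0)
console.log(noteAboveC0(24).toFixed(2)); // c2: 65.41 Hz
console.log(noteAboveC0(19).toFixed(2)); // g1: 49.00 Hz (a "fifth" above c1)
```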

The proposal

I propose that to set the frequency of each channel, we leverage the standard established by the MIDI protocol.

https://www.cs.cmu.edu/~music/cmsip/readings/MIDI%20tutorial%20for%20programmers.html

Assuming a standard A440 equal-temperament tuning, the frequency of any MIDI note can be calculated with:

const twelfth_root_of_2 = Math.pow(2, 1/12);
function midiNoteToFrequency(pitch) {
    // MIDI pitch 69 is A4 = 440 Hz; each step is one semitone.
    return 440 * Math.pow(twelfth_root_of_2, pitch - 69);
}

The MIDI protocol is able to represent notes from about 8 Hz up to just over 12,000 Hz using only 7 bits. A 7-bit value is referred to as the "pitch" of a note. The MIDI protocol distributes the 128 possible 7-bit pitches into frequencies that sound evenly spaced to the human ear, at the intervals used by the standard 12-tone musical scale (see table below).
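The endpoints of that 7-bit range can be checked with the formula above (repeated here so the sketch is self-contained):

```javascript
// MIDI pitch 69 is A4 = 440 Hz; each step is one semitone.
const twelfth_root_of_2 = Math.pow(2, 1 / 12);

function midiNoteToFrequency(pitch) {
  return 440 * Math.pow(twelfth_root_of_2, pitch - 69);
}

console.log(midiNoteToFrequency(0));   // ≈ 8.18 Hz (C-1, lowest MIDI note)
console.log(midiNoteToFrequency(127)); // ≈ 12543.85 Hz (G9, highest MIDI note)
```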

We can take this standard and make a small modification for WASM4. WASM4 already uses 16 bits for the frequency; we can use the most significant byte for the MIDI "pitch" and the least significant byte for the frequencies in between "pitches". This requires only a slight modification to our MIDI-based function: divide the tone argument by 256:

const twelfth_root_of_2 = Math.pow(2, 1/12);
function wasmArgToFrequency(tone_arg) {
    // High byte: MIDI pitch; low byte: fraction of a semitone.
    return 440 * Math.pow(twelfth_root_of_2, (tone_arg / 256) - 69);
}

Here's a table of some of the values:

| Tone Arg | Frequency | Music Note Name |
| --- | --- | --- |
| 0 | 8.17579 Hz | C (-1) |
| 1 | 8.17764 Hz | Slightly sharp C (-1) |
| ... | ... | ... |
| 1 << 8 | 8.66195 Hz | C# (-1) |
| ... | ... | ... |
| 2 << 8 | 9.17702 Hz | D (-1) |
| ... | ... | ... |
| 3 << 8 | 9.72271 Hz | D# (-1) |
| ... | ... | ... |
| 69 << 8 | 440 Hz | A (4) |
| 70 << 8 | 466.1637 Hz | A# (4) |

So value 0 would be the lowest pitch, a C at ~8.18 Hz, and 256 would be the next "musical half step" up, a C# at ~8.66 Hz. The values between 0 and 256 are the frequencies in between. Going up the scale by half-steps is then a matter of incrementing the tone argument by 256.
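The half-step property can be verified directly with the proposed conversion function (restated here so the sketch runs on its own):

```javascript
// Proposed decoding: high byte = MIDI pitch, low byte = 1/256th-of-a-semitone
// fraction. Adding 256 to the tone argument multiplies the frequency by
// exactly 2^(1/12), i.e. one half-step.
const twelfth_root_of_2 = Math.pow(2, 1 / 12);

function wasmArgToFrequency(tone_arg) {
  return 440 * Math.pow(twelfth_root_of_2, tone_arg / 256 - 69);
}

const a4 = wasmArgToFrequency(69 << 8);      // 440 Hz
const aSharp4 = wasmArgToFrequency(70 << 8); // ≈ 466.16 Hz
console.log(aSharp4 / a4);                   // ≈ 1.05946 = 2^(1/12)
```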

Note that this mechanism works almost exactly the same as it does today; the only real difference is that the frequencies are now evenly distributed based on "harmonics" rather than arbitrary "whole number frequencies".

Concern: This will break existing WASM4 applications

It's quite easy to support both, here's the code to do so:

    tone (frequency, duration, volume, flags) {
        // The frequency parameter packs two 16-bit values
        // (the second is the end frequency of a slide).
        var freq1 = frequency & 0xffff;
        var freq2 = (frequency >> 16) & 0xffff;

        // Only convert for carts that opted into the new
        // MIDI-style interpretation.
        if (cart_wants_new_audio_frequency_mechanism) {
            freq1 = wasmArgToFrequency(freq1);
            freq2 = wasmArgToFrequency(freq2);
        }
        ...
    }

If we were to support both, I would expect new games to use the new mechanism exclusively, with the old mechanism kept around only for backwards compatibility. The new mechanism should be able to produce any sounds the original mechanism could, with an easier interface. One idea for enabling the new mechanism is to add a new bit to the system register. Another would be to add a mechanism for supporting version updates in the cart itself: the cart could specify which version of WASM4 it was coded for, and the emulator would read this version to know which features to use.

Concern: This decreases the range from 0–65,535 Hz to 8–12,000 Hz. Is the frequency range large enough?

8 Hz is at the lowest edge of human hearing (and of speakers' ability to reproduce), and 12,000 Hz is near the other extreme, where not all speakers can play and many humans (especially older ones) can't hear. If we do want another octave (up to 24,000 Hz), we can use 8 bits instead of 7, but that's probably not necessary since most of that octave is inaudible to most people and unreproducible by most speakers.

I've included the MIDI pitch values/frequencies for reference. Note that this assumes an A440 equal temperament, which is pretty much what everything uses:

JerwuQu commented 2 years ago

I am personally both for and against this. I saw the issue with the fidelity of low notes too while developing my music sequencer, but was leaning more toward the idea that the frequency parameter should be a float instead of an integer, which would also solve this issue without limiting audio to the 12-tone scale.

On the topic of the highest frequency and what humans can hear: the note being played isn't actually the variable that matters, the sample rate and instrument are. Even a C1 note (32.70 Hz) played as a square wave will have harmonics up around 20 kHz if the sample rate is high enough, so G9 as the highest note is definitely enough.
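A rough sketch of that point, assuming an ideal square wave (which contains only odd harmonics at f, 3f, 5f, ...):

```javascript
// Count the odd harmonics of a C1 square wave that fall below 20 kHz.
const C1 = 32.70; // Hz

const harmonics = [];
for (let n = 1; n * C1 <= 20000; n += 2) {
  harmonics.push(n * C1);
}

console.log(harmonics.length);                // 306 partials below 20 kHz
console.log(harmonics[harmonics.length - 1]); // ≈ 19979.7 Hz (the 611th harmonic)
```

So even the proposal's lowest pitches can excite the whole audible spectrum; the note ceiling is not what limits high-frequency content.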

marler8997 commented 2 years ago

which would also solve this issue without limiting audio to the 12-tone scale.

Yeah that's why I also proposed we pull in MIDI's pitch bend mechanism.

tone(60 << 8, ..); // example of playing a normal c4
tone((60 << 8) | pitch_bend, ...); // example of playing a c4 with some pitch_bend value in the low byte
JerwuQu commented 2 years ago

Ah, ignore that

rohlem commented 2 years ago

Just wanted to note that we could hide backwards-incompatible changes behind an opt-in system flag bit to retain old behaviour by default.

Since I'm already leaving a comment though, it might be interesting to know: the Game Boy sound chip exposes frequency by giving control over the (EDIT:) divisor: Frequency = 131072/(2048 - x) Hz. That would be another way to give a higher density of low frequencies than high ones, though it's not quite a log scale. Arguably, a logarithmic scale is more useful for traditional music; I was just intrigued by the simplistic and quirky nature of that approach too.
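The divisor formula above can be sketched directly (x is the 11-bit register value, 0..2047, as described in the comment):

```javascript
// Game Boy-style divisor encoding: smaller divisors give higher
// frequencies, so low frequencies get denser coverage than high ones.
function gameboyFrequency(x) {
  return 131072 / (2048 - x);
}

console.log(gameboyFrequency(0));    // 64 Hz (lowest)
console.log(gameboyFrequency(1024)); // 128 Hz (halfway through the register range)
console.log(gameboyFrequency(2047)); // 131072 Hz (highest)
```

Note how half the register range only covers one octave (64–128 Hz), while the top few values leap by tens of kilohertz per step.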

aduros commented 2 years ago

Thanks for the feedback, I agree this is an issue. Hopefully we can think of a solution that's intuitive, flexible, and doesn't break existing carts.

JerwuQu commented 10 months ago

Took a look at this again. I like how the PR is so incredibly simple.

We also have plenty of bits left in tone's flags parameter, which would enable more granular, per-call usage without requiring a system flag.
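A sketch of that flags-bit idea (the flag name and bit position here are hypothetical, not part of WASM4's API):

```javascript
// Hypothetical flag bit: opts a single tone() call into the MIDI-style
// frequency interpretation, leaving existing carts untouched.
const TONE_MIDI_FREQ = 0x80;

const twelfth_root_of_2 = Math.pow(2, 1 / 12);

function decodeFrequency(freq, flags) {
  if (flags & TONE_MIDI_FREQ) {
    // New behaviour: high byte = MIDI pitch, low byte = semitone fraction.
    return 440 * Math.pow(twelfth_root_of_2, freq / 256 - 69);
  }
  return freq; // old behaviour: raw Hz
}

console.log(decodeFrequency(440, 0));                  // 440 (raw Hz)
console.log(decodeFrequency(69 << 8, TONE_MIDI_FREQ)); // 440 (A4 via MIDI pitch)
```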