docs: Add initial proposal for V2 recording & playback API. (WIP)

microbit-carlos commented 1 year ago

Docs preview:

This initial proposal has been discussed in:

https://github.com/microbit-foundation/micropython-microbit-v2/issues/49

But we have some open questions that will likely result in a rework of some of this.

Initial proposal

The initial proposal in this PR was to create a new AudioBuffer class to contain the audio data and sampling rate. The AudioBuffer.rate property could then be used by microphone.record() and audio.play() to configure recording and playback rates. This was done to avoid introducing a new parameter to audio.play() to configure the sampling rate, when it could only work with a single type of sound input (as it might not be possible to change the rate of the SoundExpressions or AudioFrames).

Disadvantages

However, changing the rate in a buffer type to change the playback rate in real-time is a bit awkward:

my_recording = audio.AudioBuffer(duration=5000, rate=5500)
microphone.record_into(my_recording)
audio.play(my_recording, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    my_recording.rate = scale(x, (-1000, 1000), (2250, 11000))
    sleep(50)
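
For readers following along off-device, here is a minimal plain-Python stand-in for the scale() helper used in these snippets. It is only a sketch of a linear mapping: it assumes the (value, from range, to range) argument order shown above, and makes no claim about how the real helper rounds or clamps its result.

```python
def scale(value, from_, to):
    """Linearly map value from the range from_ onto the range to.

    Stand-in for illustration only; rounding and clamping are not modelled.
    """
    from_min, from_max = from_
    to_min, to_max = to
    # Position of value inside the input range, as a fraction (0.0 to 1.0
    # when value lies within the range).
    fraction = (value - from_min) / (from_max - from_min)
    return to_min + fraction * (to_max - to_min)


# A level board (x = 0) lands in the middle of the rate range.
level_rate = scale(0, (-1000, 1000), (2250, 11000))
print(level_rate)
```

With this mapping, tilting fully left gives the lowest rate (2250) and tilting fully right gives the highest (11000).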

An alternative we considered was to have the playback sampling rate modified via the audio module itself:

audio.play(my_recording, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    audio.set_rate(scale(x, (-1000, 1000), (2250, 11000)))
    sleep(50)

However, this would have to set the same rate for everything played via the audio module, and Sound Expressions have a different default rate (44K) than recordings (11K). So audio.set_rate(22000) would slow down Sound Expressions and speed up recordings.
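
The arithmetic behind that mismatch can be sketched as follows. The helper name speed_factor is hypothetical, 44K is taken as 44_100 (the figure quoted for SoundExpressions later in this thread), and 11K is assumed to mean 11_000.

```python
SOUND_EXPRESSION_RATE = 44_100  # default quoted for SoundExpressions
RECORDING_RATE = 11_000         # "11K" recording default (assumed value)


def speed_factor(playback_rate, native_rate):
    """Return how much faster (>1) or slower (<1) a sound plays."""
    return playback_rate / native_rate


# One shared rate of 22000 affects the two sources in opposite ways.
expression_speed = speed_factor(22_000, SOUND_EXPRESSION_RATE)
recording_speed = speed_factor(22_000, RECORDING_RATE)
print(expression_speed, recording_speed)
```

Under these assumptions the Sound Expression plays at roughly half speed while the recording plays at double speed.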

Alternatively, if we wanted to change the playback rate via the audio module, we could set a ratio instead. Something equivalent to audio.set_speed(100%) (with different semantics). But a disadvantage is that it removes some of the math/physics learning opportunity of directly relating the sampling rate value to the effect it has on playback speed.
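
To illustrate why a ratio sidesteps the mixed-rate problem, here is a rough sketch of how a speed percentage could translate into a per-source sampling rate. The function rate_for_speed is hypothetical and not part of any proposed API; the default rates are the ones quoted in this thread (11K assumed to mean 11_000).

```python
def rate_for_speed(native_rate, speed_percent):
    """Effective sampling rate for a source played at speed_percent."""
    return native_rate * speed_percent // 100


# The same percentage maps to a different rate for each kind of source,
# so each one keeps its own default at 100%.
for native in (11_000, 44_100):
    print(native, rate_for_speed(native, 50), rate_for_speed(native, 200))
```

The trade-off mentioned above is visible here: the user only ever sees 50 or 200, never the resulting sampling rate.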

Alternative proposal: bytearray as the buffer type

In this case a byte array would be returned by microphone.record() and used with microphone.record_into().

As this data type does not include info about the rate, we depend on audio.play() gaining an extra argument, which might not work with other sound types like Sound Expressions and AudioFrames.

However, we still have the issue of updating the playback rate in real time during playback, which means we might have to use an approach similar to the previously mentioned audio.set_speed(100%):

sound_in_byte_array = microphone.record(duration=3000, rate=5500)
audio.play(sound_in_byte_array, rate=5500, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    audio.set_speed(scale(x, (-1000, 1000), (50, 200)))
    sleep(50)

DURATION_SECONDS = 3
SAMPLE_RATE = 5500
recording = bytearray(DURATION_SECONDS * SAMPLE_RATE)
microphone.record_into(recording, rate=SAMPLE_RATE)
audio.play(recording, rate=SAMPLE_RATE)
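
The buffer sizing in the snippet above can be generalised to millisecond durations. This is a sketch only: buffer_size is a hypothetical helper, and it assumes one byte per sample, as implied by bytearray(DURATION_SECONDS * SAMPLE_RATE).

```python
def buffer_size(duration_ms, rate):
    """Bytes needed to hold duration_ms of audio at rate samples/second.

    Assumes one byte per sample.
    """
    return duration_ms * rate // 1000


# Same figures as the example above: 3 seconds at 5500 samples/second.
recording = bytearray(buffer_size(3000, 5500))
print(len(recording))
```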

Alternative proposal: AudioFrames as the buffer type

This would be the same as the bytearray proposal, but using the existing AudioFrames instead.

We might need to tweak the AudioFrame class to let us use larger buffers, as the default is 32 samples. As audio.play() can consume an iterable as well, we would need to figure out a good balance between AudioFrame size and number of AudioFrames in a recording buffer.
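
To get a feel for that balance, the sketch below counts how many frames a recording would need for a given frame size. frames_needed is a hypothetical helper, and the 1024-sample frame is an arbitrary illustrative size, not a proposed value.

```python
FRAME_SAMPLES = 32  # current default AudioFrame size


def frames_needed(duration_ms, rate, frame_samples=FRAME_SAMPLES):
    """Number of frames required to hold a recording (rounded up)."""
    samples = duration_ms * rate // 1000
    return (samples + frame_samples - 1) // frame_samples


# 3 seconds at 5500 samples/second, with default and larger frames.
print(frames_needed(3000, 5500))
print(frames_needed(3000, 5500, frame_samples=1024))
```

With the default frame size, even a short recording turns into several hundred frame objects, which is the motivation for allowing larger frames.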

microbit-carlos commented 1 year ago

Based on the latest discussion we have agreed that we'd prefer to avoid introducing a new data type for this feature. In that case we have two options:

Use a byte array and change the sampling rate via a function in the audio module

sound_in_byte_array = microphone.record(duration=3000, rate=5500)
audio.play(sound_in_byte_array, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    playback_sampling_rate = scale(x, (-1000, 1000), (2_200, 11_000))
    audio.set_rate(playback_sampling_rate)
    sleep(50)

One previous suggestion was to create a function along the lines of audio.set_speed() that used a percentage value (0 to 100) as the input range. This could solve the issue with different pipelines in the sound mixer having different playback sampling rates; however, we believe it's important to use real sampling rate values to be able to directly compare and understand how changing the sampling rate during recording vs playback affects sound.

To be able to implement something like audio.set_rate() we would probably need to ensure everything that is inputted to audio.play() uses the same default sampling rate. Right now these are the types of input audio.play() takes:

Sound effects via user-created audio.SoundEffect() instances
- CODAL SoundExpressions, default sampling rate: 44_100
Built-in sounds via microbit.Sound pre-generated instances
- CODAL SoundExpressions, default sampling rate: 44_100
audio.AudioFrame
- MicroPython data type, default sampling rate: 8_000?
  - Not sure about this one; the docs list a single frame of 32 samples as taking a bit over 4 ms.
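
The AudioFrame estimate can be sanity-checked from the frame timing. In this sketch, implied_rate is a hypothetical helper; 4.0 ms gives the rounded 8_000 figure, and 4.096 ms is used purely as an illustrative value for "a bit over 4 ms", not a confirmed number.

```python
def implied_rate(samples, duration_ms):
    """Sampling rate implied by playing `samples` in `duration_ms`."""
    return samples * 1000 / duration_ms


# Exactly 4 ms per 32-sample frame would mean 8000 samples/second.
print(implied_rate(32, 4.0))
# Slightly over 4 ms per frame implies a rate slightly under 8000.
print(implied_rate(32, 4.096))
```

So "a bit over 4 ms" suggests the true default is somewhat below 8_000 samples per second.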

We'd have to check how it'd affect sound quality, but we could consider decreasing the sampling rate for SoundExpressions to 11K. However, we would still have the issue that we can't increase the default AudioFrame sampling rate, as that would change how old programmes sound.

While this would be the cleanest way to do this for the user API, I'm not seeing a way in which it can be achieved with the current requirements. Does anybody have any ideas to overcome these issues?

Expand AudioFrame to include sampling rate

This option is similar to the original proposals about creating a new AudioBuffer data type, but instead we can expand AudioFrames to be able to have different sizes and different sampling rates.

By default they should still behave as they do in micro:bit V1 (and older MicroPython versions for V2), which is 32 samples at 8K (?) sampling rate.

But this could be changed via constructor parameters to have any size buffer at any sample rate.

Unfortunately, that still leaves us with the awkward case of changing playback sampling rate by changing a variable from the samples class, instead of a method in the audio module.

my_recording = audio.AudioFrame(length=11000, rate=5500)
my_recording = microphone.record_into(my_recording)
# Or
my_recording = microphone.record(duration=2000, rate=5500)

audio.play(my_recording, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    my_recording.rate = scale(x, (-1000, 1000), (2250, 11000))
    sleep(50)
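
As a rough illustration of what the loop above does to the sound, the sketch below computes how long an 11000-sample buffer lasts at the two ends of the rate range and at its recording rate. playback_ms is a hypothetical helper, not part of the proposed API.

```python
def playback_ms(num_samples, rate):
    """Playback duration in milliseconds (rounded down)."""
    return num_samples * 1000 // rate


NUM_SAMPLES = 11000  # buffer length used in the example above

for rate in (2250, 5500, 11000):
    print(rate, playback_ms(NUM_SAMPLES, rate))
```

Played at the rate it was recorded (5500) the buffer lasts 2 seconds; the lowest rate stretches it to almost 5 seconds and the highest halves it to 1 second.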

microbit-carlos commented 1 year ago

@dpgeorge the docs have been updated, let me know if something doesn't match our previous conversation.

microbit-carlos commented 1 year ago

@dpgeorge there are a couple of issues related to setting the sampling rate, but they shouldn't affect the MicroPython implementation and should be fixed (without changes in the API) in the next CODAL release: https://github.com/lancaster-university/codal-microbit-v2/issues?q=is%3Aopen+milestone%3Av0.2.60+label%3Ap0

microbit-carlos commented 2 months ago

@dpgeorge the docs have been updated with the conclusion from https://github.com/microbit-foundation/micropython-microbit-v2/issues/205#issuecomment-2221716206.

One thing I've changed that we haven't discussed before was to remove the rate argument from microphone.record_into(), as the rate can be set in the input AudioRecording/AudioTrack, and without the argument there isn't any ambiguity as to what takes precedence.
