Add more support for using Sinks as audio streams

Summary

Currently, working with AudioData as a stream is unnecessarily difficult, and the experience could be improved.

What is the feature request for?

The core library

The Problem

There are two issues with using AudioData as a stream in the current version of pycord.

Encoding

AudioData objects generated by Sinks do not have the proper .wav or .mp3 encoding (including valid byte prefixes), so the data cannot be read directly without doing the encoding by hand. Encoding without saving to a file isn't a common use case, so it requires the developer to subclass existing audio manipulation libraries (like wave or pydub) to achieve that goal.

Library support

voice_client.start_recording() currently requires a coroutine callback as one of its arguments. This means you not only have to pass a function even if you don't need one, but the function can't even be a lambda, since it is required to be a coroutine.

vc.start_recording(
    sink,
    lambda x, y: x,  # will error out: a lambda is not a coroutine
)

The Ideal Solution

Encoding

Add a .read(user_id, starting_byte=0, encode=False) method to sinks (or AudioData) which returns encoded data of the underlying AudioData object like a stream would. Make encoding optional and otherwise match the selected sink (like .wav for WaveSink).
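A minimal sketch of what such a method might look like, written against a stand-in class rather than pycord's real Sink so that it runs on its own. The BufferedAudio class and its read signature are hypothetical, and the PCM format (48 kHz, 2 channels, 16-bit) is an assumption. The WAV encoding uses the standard library's wave module writing into an io.BytesIO.

```python
import io
import wave


class BufferedAudio:
    """Hypothetical stand-in for a sink's per-user audio store (not pycord API)."""

    def __init__(self, channels=2, sample_width=2, sample_rate=48000):
        # Assumed PCM format: 16-bit stereo at 48 kHz.
        self.channels = channels
        self.sample_width = sample_width
        self.sample_rate = sample_rate
        self._pcm = {}  # user_id -> bytearray of raw PCM

    def write(self, data, user_id):
        # Append raw PCM bytes to the given user's buffer.
        self._pcm.setdefault(user_id, bytearray()).extend(data)

    def read(self, user_id, starting_byte=0, encode=False):
        """Return the user's audio from starting_byte on, optionally as WAV."""
        raw = bytes(self._pcm.get(user_id, b"")[starting_byte:])
        if not encode:
            return raw
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(self.channels)
            wav.setsampwidth(self.sample_width)
            wav.setframerate(self.sample_rate)
            wav.writeframes(raw)
        # The header is finalized when the wave writer closes; the buffer stays open.
        return out.getvalue()
```

With encode=True the returned bytes start with a RIFF/WAVE header, so they can be handed to anything that expects a .wav file. A caller would probably want to keep starting_byte aligned to a whole frame (4 bytes in the assumed format).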

Library support

Make the callback argument optional, or at the very least allow for passing an empty lambda as the callback.

The Current Solution

Encoding

It is possible to subclass WaveSink and AudioData to add your own .read() method, and to subclass an audio manipulation library like wave or pydub to encode the audio stream without saving it to a file.

Library support

Unfortunately, you can only create a dummy coroutine that does nothing, or pass a lambda and catch the resulting error in a try/except block.
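For reference, the dummy-coroutine workaround can be as small as the sketch below. The name noop_callback is just an illustration; the check at the bottom only shows why a lambda does not qualify and says nothing about how pycord itself validates the callback.

```python
import asyncio
import inspect


async def noop_callback(*args):
    """Placeholder callback: accepts whatever it is given and does nothing."""
    return None


# A lambda is a plain function, not a coroutine function.
print(inspect.iscoroutinefunction(noop_callback))   # True
print(inspect.iscoroutinefunction(lambda x, y: x))  # False

# Awaiting the placeholder simply yields None.
print(asyncio.run(noop_callback("sink", "channel")))  # None
```

It would then be passed in place of the lambda, e.g. vc.start_recording(sink, noop_callback).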

Additional Context

I've implemented a rough solution over at wmetryka/pycord-voice-interface if you want to look at an example.

Hi, I saw this coincidentally while my friends and I were working on getting audio streaming to work via Pycord, so I thought I'd drop my code here in case you or someone else is looking for a working solution. This is still very much in development, so please excuse any silly # comments or weird structuring.

For implementing a custom Sink class, we went ahead and did this:

from discord.sinks.core import Filters, Sink, default_filters
from pydub import AudioSegment
from queue import Queue

class StreamSink(Sink):
    def __init__(self, *, filters=None):
        if filters is None:
            filters = default_filters
        self.filters = filters
        Filters.__init__(self, **self.filters)
        self.vc = None
        self.audio_data = {}

        # user id for parsing their specific audio data
        self.user_id = None
        self.buffer = StreamBuffer()

    def write(self, data, user):

        # if the data comes from the inviting user, we append it to buffer
        if user == self.user_id:
            self.buffer.write(data=data, user=user)

    def cleanup(self):
        self.finished = True

    def get_all_audio(self):
        # not applicable for streaming, but may cause errors if not overridden
        pass

    def get_user_audio(self, user):
        # not applicable for streaming, but will definitely cause errors if not overridden
        pass

    def set_user(self, user_id: int):
        self.user_id = user_id
        print(f"Set user ID: {user_id}")

class StreamBuffer:
    def __init__(self) -> None:
        # holds byte-form audio data as it builds
        self.byte_buffer = bytearray()  # bytes
        self.segment_buffer = Queue()  # pydub.AudioSegments

        # audio data specifications
        self.sample_width = 2
        self.channels = 2
        self.sample_rate = 48000
        self.bytes_ps = 192000  # bytes added to buffer per second
        self.block_len = 2  # how long you want each audio block to be in seconds
        # min len to pull bytes from buffer
        self.buff_lim = self.bytes_ps * self.block_len

        # temp var for outputting audio
        self.ct = 1

    def write(self, data, user):

        self.byte_buffer += data  # data is a bytearray object
        # checking amount of data in the buffer
        if len(self.byte_buffer) > self.buff_lim:

            # grabbing slice from the buffer to work with
            byte_slice = self.byte_buffer[:self.buff_lim]

            # creating AudioSegment object with the slice
            audio_segment = AudioSegment(data=byte_slice,
                                         sample_width=self.sample_width,
                                         frame_rate=self.sample_rate,
                                         channels=self.channels,
                                         )

            # removing the old stinky trash data from buffer - ew get it out of there already
            self.byte_buffer = self.byte_buffer[self.buff_lim:]
            # ok much better now

            # adding AudioSegment to the queue
            self.segment_buffer.put(audio_segment)

            # temporary for validating process
            audio_segment.export(f"output{self.ct}.mp3", format="mp3")
            self.ct += 1

This, at the moment, will output a series of numbered 2-second mp3 files to the working directory. This is not the best solution but we did it just to validate that we can safely pull the live audio data. The end goal is to pass the audio data to some other processes without writing it to a file.
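As a rough idea of the file-free version, here is a sketch that keeps the same buffering arithmetic but hands out raw byte blocks through a queue instead of exporting mp3 files. PcmChunker is a hypothetical helper and not part of the code above; it leaves pydub out so that it runs with the standard library alone, and it loops so that one large write produces every complete block.

```python
from queue import Queue


class PcmChunker:
    """Hypothetical helper: splits incoming PCM bytes into fixed-length blocks."""

    def __init__(self, sample_rate=48000, channels=2, sample_width=2, block_len=2):
        # bytes per second = rate * channels * bytes per sample
        # (192000 with these defaults, matching bytes_ps above)
        self.bytes_ps = sample_rate * channels * sample_width
        self.block_size = self.bytes_ps * block_len
        self.byte_buffer = bytearray()
        self.blocks = Queue()  # consumers call blocks.get() instead of reading files

    def write(self, data):
        self.byte_buffer += data
        # Loop so a large write yields every complete block, not just the first.
        while len(self.byte_buffer) >= self.block_size:
            self.blocks.put(bytes(self.byte_buffer[:self.block_size]))
            del self.byte_buffer[:self.block_size]
```

Another thread or process could then pull blocks with chunker.blocks.get() and wrap each one in an AudioSegment (or anything else) only when it actually needs to.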

To get this working in a live bot, we went ahead and implemented the following:

import discord
from discord.ext import commands
from custom_core import StreamSink

intents = discord.Intents.all()
bot = commands.Bot(command_prefix='!', intents=intents)

connections = {}

stream_sink = StreamSink()

@bot.command()
async def record(ctx):  # if you're using commands.Bot, this will also work.
    voice = ctx.author.voice

    if not voice:
        # hehe
        await ctx.reply("You aren't in a voice channel, get your life together lmao")
        return  # nothing to connect to, so stop here

    # connect to the voice channel the author is in.
    stream_sink.set_user(ctx.author.id)
    vc = await voice.channel.connect()
    # updating the cache with the guild and channel.
    connections.update({ctx.guild.id: vc})

    vc.start_recording(
        stream_sink,  # the sink type to use.
        once_done,  # what to do once done.
        ctx.channel  # the channel to disconnect from.
    )

    await ctx.reply("Started listening.")

# our voice client already passes these in.
async def once_done(sink: discord.sinks.Sink, channel: discord.TextChannel, *args):
    await sink.vc.disconnect()  # disconnect from the voice channel.
    print("Stopped listening.")

@bot.command()
async def stop_recording(ctx):
    if ctx.guild.id in connections:  # check if the guild is in the cache.
        vc = connections[ctx.guild.id]
        # stop recording, and call the callback (once_done).
        vc.stop_recording()
        del connections[ctx.guild.id]  # remove the guild from the cache.
    else:
        # respond with this if we aren't listening
        await ctx.reply("I am currently not listening here.")

# reading in token (strip the trailing newline and close the file afterwards)
with open("key.pw", "r") as key_file:
    key = key_file.read().strip()

if __name__ == "__main__":
    bot.run(key)

As it stands, this bot will join a voice channel when invited and will receive only the voice of the person who invited it. This is specific to our use case but you can likely modify this to stack the received audio.
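If "stacking" is taken to mean mixing several users' audio into a single stream, one possible approach is to add the samples together and clip the result. The sketch below is only an illustration under assumptions: mix_pcm16 is a hypothetical helper, the input is assumed to be 16-bit PCM in the machine's native byte order, and it does nothing about lining the buffers up in time.

```python
from array import array


def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Mix two 16-bit PCM buffers sample by sample, clipping to the int16 range."""
    xs, ys = array("h"), array("h")
    xs.frombytes(a)  # raises ValueError if the length is not a whole number of samples
    ys.frombytes(b)
    # Pad the shorter buffer with silence so both have the same length.
    n = max(len(xs), len(ys))
    xs.extend([0] * (n - len(xs)))
    ys.extend([0] * (n - len(ys)))
    mixed = array("h", (max(-32768, min(32767, x + y)) for x, y in zip(xs, ys)))
    return mixed.tobytes()
```

In the sink, this would mean keeping one buffer per user id instead of filtering on a single user, and mixing the buffers block by block before passing them on.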

Again, this is still being developed and I will probably forget to update this comment, so you can check for any changes here: ScruffyTheMoose/HushVC
