Open wmetryka opened 1 year ago
Feel free to PR this feature if you already have it ready.
Hi, I came across this while my friends and I were working on getting audio streaming to work via Pycord, so I thought I'd drop my code here in case you or someone else is looking for a working solution. This is still very much in development, so please excuse any silly # comments or weird structuring.
For implementing a custom Sink class, we went ahead and did this:
from discord.sinks.core import Filters, Sink, default_filters
from pydub import AudioSegment
from queue import Queue


class StreamSink(Sink):
    def __init__(self, *, filters=None):
        if filters is None:
            filters = default_filters
        self.filters = filters
        Filters.__init__(self, **self.filters)
        self.vc = None
        self.audio_data = {}

        # user id for parsing their specific audio data
        self.user_id = None
        self.buffer = StreamBuffer()

    def write(self, data, user):
        # if the data comes from the inviting user, we append it to buffer
        if user == self.user_id:
            self.buffer.write(data=data, user=user)

    def cleanup(self):
        self.finished = True

    def get_all_audio(self):
        # not applicable for streaming but may cause errors if not overridden
        pass

    def get_user_audio(self, user):
        # not applicable for streaming but will def cause errors if called without being overridden
        pass

    def set_user(self, user_id: int):
        self.user_id = user_id
        print(f"Set user ID: {user_id}")


class StreamBuffer:
    def __init__(self) -> None:
        # holds byte-form audio data as it builds
        self.byte_buffer = bytearray()  # bytes
        self.segment_buffer = Queue()  # pydub.AudioSegments

        # audio data specifications
        self.sample_width = 2
        self.channels = 2
        self.sample_rate = 48000
        self.bytes_ps = 192000  # bytes added to buffer per second
        self.block_len = 2  # how long you want each audio block to be in seconds
        # min len to pull bytes from buffer
        self.buff_lim = self.bytes_ps * self.block_len

        # temp var for outputting audio
        self.ct = 1

    def write(self, data, user):
        self.byte_buffer += data  # data is a bytearray object
        # checking amount of data in the buffer
        if len(self.byte_buffer) > self.buff_lim:
            # grabbing slice from the buffer to work with
            byte_slice = self.byte_buffer[:self.buff_lim]

            # creating AudioSegment object with the slice
            audio_segment = AudioSegment(
                data=byte_slice,
                sample_width=self.sample_width,
                frame_rate=self.sample_rate,
                channels=self.channels,
            )

            # removing the old stinky trash data from buffer - ew get it out of there already
            self.byte_buffer = self.byte_buffer[self.buff_lim:]
            # ok much better now

            # adding AudioSegment to the queue
            self.segment_buffer.put(audio_segment)

            # temporary for validating process
            audio_segment.export(f"output{self.ct}.mp3", format="mp3")
            self.ct += 1
This, at the moment, will output a series of numbered 2-second mp3 files to the working directory. This is not the best solution but we did it just to validate that we can safely pull the live audio data. The end goal is to pass the audio data to some other processes without writing it to a file.
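One way to hand blocks to another part of the program without touching the disk is to drain the queue from a worker thread. The following is only a sketch under assumptions: it uses raw bytes in place of pydub segments so it runs with the standard library alone, and `BlockBuffer`, `consume`, and the `None` stop marker are hypothetical names for illustration, not part of Pycord or of the code above.

```python
import threading
from queue import Queue

# sample rate * sample width * channels, matching the values used above
BYTES_PER_SECOND = 48000 * 2 * 2  # 192000


class BlockBuffer:
    """Accumulates raw PCM and emits fixed-size blocks to a queue (hypothetical helper)."""

    def __init__(self, block_size: int = BYTES_PER_SECOND * 2) -> None:
        self.block_size = block_size
        self.pending = bytearray()
        self.blocks: Queue = Queue()

    def write(self, data: bytes) -> None:
        self.pending += data
        # "while" rather than "if": a single large write may complete several blocks
        while len(self.pending) >= self.block_size:
            self.blocks.put(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]


def consume(blocks: Queue, handle) -> None:
    """Pass each block to handle() until a None stop marker arrives."""
    while True:
        block = blocks.get()
        if block is None:
            break
        handle(block)


# Usage: small 1920-byte blocks keep the demo quick; blocks are collected in memory.
buffer = BlockBuffer(block_size=1920)
received = []
worker = threading.Thread(target=consume, args=(buffer.blocks, received.append))
worker.start()
buffer.write(b"\x00" * 2880)  # completes one block, 960 bytes left over
buffer.write(b"\x00" * 2880)  # leftover plus new data completes two more
buffer.blocks.put(None)       # stop marker
worker.join()
```

In a real bot, `handle` would be whatever consumes the audio (a transcriber, a network socket, and so on) instead of `received.append`.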
To get this working in a live bot, we went ahead and implemented the following:
import discord
from discord.ext import commands
from custom_core import StreamSink

intents = discord.Intents.all()
bot = commands.Bot(command_prefix='!', intents=intents)
connections = {}

stream_sink = StreamSink()


@bot.command()
async def record(ctx):  # if you're using commands.Bot, this will also work.
    voice = ctx.author.voice

    if not voice:
        # hehe
        await ctx.reply("You aren't in a voice channel, get your life together lmao")
        return  # nothing to connect to, so stop here

    # connect to the voice channel the author is in.
    stream_sink.set_user(ctx.author.id)
    vc = await voice.channel.connect()
    # updating the cache with the guild and channel.
    connections.update({ctx.guild.id: vc})

    vc.start_recording(
        stream_sink,  # the sink type to use.
        once_done,  # what to do once done.
        ctx.channel  # the channel to disconnect from.
    )

    await ctx.reply("Started listening.")


# our voice client already passes these in.
async def once_done(sink: discord.sinks.Sink, channel: discord.TextChannel, *args):
    await sink.vc.disconnect()  # disconnect from the voice channel.
    print("Stopped listening.")


@bot.command()
async def stop_recording(ctx):
    if ctx.guild.id in connections:  # check if the guild is in the cache.
        vc = connections[ctx.guild.id]
        # stop recording, and call the callback (once_done).
        vc.stop_recording()
        del connections[ctx.guild.id]  # remove the guild from the cache.
    else:
        # respond with this if we aren't listening
        await ctx.reply("I am currently not listening here.")


# reading in token (strip the trailing newline most editors add)
with open("key.pw", "r") as key_file:
    key = key_file.read().strip()

if __name__ == "__main__":
    bot.run(key)
As it stands, this bot will join a voice channel when invited and will receive only the voice of the person who invited it. This is specific to our use case but you can likely modify this to stack the received audio.
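If "stacking" is taken to mean mixing several speakers into one track, one possible approach is to sum the samples and clip the result. This is a rough sketch that assumes the audio is 16-bit little-endian PCM (consistent with the sample width of 2 used above); `mix_pcm16` is a hypothetical helper, not a Pycord function.

```python
import struct


def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Mix two 16-bit little-endian PCM chunks by summing samples with clipping.

    Only the overlapping portion is mixed; any extra tail in the longer chunk is dropped.
    """
    sample_count = min(len(a), len(b)) // 2
    fmt = f"<{sample_count}h"
    samples_a = struct.unpack(fmt, a[:sample_count * 2])
    samples_b = struct.unpack(fmt, b[:sample_count * 2])
    # clamp each sum to the signed 16-bit range to avoid wrap-around distortion
    mixed = [max(-32768, min(32767, x + y)) for x, y in zip(samples_a, samples_b)]
    return struct.pack(fmt, *mixed)


# Usage: two tiny chunks of two samples each.
quiet = struct.pack("<2h", 1000, -1000)
loud = struct.pack("<2h", 32000, -32000)
result = struct.unpack("<2h", mix_pcm16(quiet, loud))
```

The sink's write() would have to stop filtering on a single user ID and keep one buffer per user before chunks could be mixed this way.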
Again, this is still being developed and I will probably forget to update this post, so you can check for any changes here: ScruffyTheMoose/HushVC
Summary
Currently, working with AudioData as a stream is unnecessarily difficult, and the experience could be improved.
What is the feature request for?
The core library
The Problem
There are two issues with using AudioData as a stream in the current version of Pycord.
Encoding
AudioData objects generated by Sinks do not have the proper .wav or .mp3 encoding (including valid byte prefixes) for the data to be read directly without doing the encoding by hand. Encoding without saving to a file isn't a common use case, so it requires the developer to subclass existing audio manipulation libraries (like wave or pydub) to achieve that goal.
Library support
voice_client.start_recording() currently requires a coroutine callback as one of its arguments. This means you not only have to pass a function even if you don't need one, but the function can't even be a lambda, since it must be a coroutine.
The Ideal Solution
Encoding
Add a .read(user_id, starting_byte=0, encode=False) method to sinks (or AudioData) which returns encoded data of the underlying AudioData object like a stream would. Make encoding optional and otherwise match the selected sink (like .wav for WaveSink).
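To make the proposed signature concrete, here is a rough sketch of how such a method could behave. Everything in it is hypothetical: `ReadableBuffers` is not a Pycord class, and the encoder is injected as a plain callable (with a stand-in used in the demo) so the example stays independent of any particular audio format.

```python
from typing import Callable, Optional


class ReadableBuffers:
    """Hypothetical per-user PCM store exposing the proposed read() signature."""

    def __init__(self, encoder: Optional[Callable[[bytes], bytes]] = None) -> None:
        self._pcm = {}           # user_id -> bytearray of raw PCM
        self._encoder = encoder  # would match the sink type, e.g. a WAV encoder

    def write(self, data: bytes, user_id: int) -> None:
        self._pcm.setdefault(user_id, bytearray()).extend(data)

    def read(self, user_id: int, starting_byte: int = 0, encode: bool = False) -> bytes:
        # unknown users yield an empty result rather than raising
        raw = bytes(self._pcm.get(user_id, bytearray())[starting_byte:])
        if encode:
            if self._encoder is None:
                raise ValueError("no encoder configured")
            return self._encoder(raw)
        return raw


# Usage with a stand-in encoder that just prepends a fake header.
buffers = ReadableBuffers(encoder=lambda pcm: b"HDR" + pcm)
buffers.write(b"abcd", user_id=1)
buffers.write(b"efgh", user_id=1)
tail = buffers.read(1, starting_byte=4)  # only the bytes written after offset 4
encoded = buffers.read(1, encode=True)   # full buffer passed through the encoder
```

A caller polling for new audio would remember how many bytes it has already consumed and pass that as starting_byte on the next call.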
Library support
Make the callback argument optional, or at the very least allow for passing an empty lambda as the callback.
The Current Solution
Encoding
It is possible to subclass WaveSink and AudioData to add your own .read() method, and to subclass an audio manipulation library like wave or pydub to encode the audio stream without saving it to a file.
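For a chunk of PCM that is already complete, the standard-library wave module can also write into an in-memory buffer. The sketch below assumes 16-bit stereo audio at 48 kHz; `pcm_to_wav_bytes` is a hypothetical helper, and whether this is enough for a continuously growing stream (where the header length fields keep changing) is a separate question.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, channels: int = 2, sample_width: int = 2,
                     sample_rate: int = 48000) -> bytes:
    """Wrap raw PCM in a WAV container entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # wave leaves a caller-supplied file object open, so the buffer is still readable
    return buf.getvalue()


# Usage: encode 480 silent stereo frames, then read them back without touching the disk.
wav_bytes = pcm_to_wav_bytes(b"\x00" * 1920)
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    frame_count = reader.getnframes()
    frame_rate = reader.getframerate()
```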
Library support
Unfortunately, you can only create a dummy coroutine that does nothing, or pass a lambda and catch the resulting error in a try/except block.
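The dummy coroutine workaround can be as small as the sketch below. `noop_callback` is a hypothetical name; the check with inspect simply illustrates why a lambda does not qualify as a coroutine function.

```python
import asyncio
import inspect


async def noop_callback(sink, *args):
    """Do nothing; exists only to satisfy the coroutine requirement."""
    return None


# An async def is a coroutine function, a plain lambda is not.
noop_is_coroutine = inspect.iscoroutinefunction(noop_callback)
lambda_is_coroutine = inspect.iscoroutinefunction(lambda sink, *args: None)

# The dummy can be awaited like any other callback.
noop_result = asyncio.run(noop_callback(object()))

# In a bot it would be passed in place of a real callback, e.g.:
# vc.start_recording(stream_sink, noop_callback)
```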
Additional Context
I've implemented a rough solution over at wmetryka/pycord-voice-interface if you want to look at an example.