Rapptz / discord.py

An API wrapper for Discord written in Python.
http://discordpy.rtfd.org/en/latest
MIT License

RFC: Voice Receive API Design/Usage #1094

Open imayhaveborkedit opened 6 years ago

imayhaveborkedit commented 6 years ago

As I have progressed through writing and redesigning this feature a few times, Danny and I have come to a conclusion regarding the inclusion of voice receive in discord.py. Discord considers voice receive a second-class citizen as a feature and will likely never officially support or document it. With no such guarantees, all development is based on reverse engineering and is liable to be broken by Discord at any point.

The conclusion is that voice receive, as a Discord bot feature, does not belong in the library. The alternative is to implement it as an extension module instead. See https://github.com/Rapptz/discord.py/pull/9288#issuecomment-1785942942 for more details.

This is exactly what I've been working on. https://github.com/imayhaveborkedit/discord-ext-voice-recv/

The foundational work is largely complete and the code is functional, but as stated in the readme it's not quite finished, not guaranteed stable, and subject to change. Basic documentation is done, but more comprehensive docs and examples are on the todo list. It also requires v2.4 of discord.py (currently the master branch), which has not yet been released on pypi at the time of writing.

**Old issue content** (technically outdated, but a large amount of the design still applies):

---

### Note: DO NOT use this in production.

The code is messy (and possibly broken) and probably filled with debug prints. Use only with the intent to experiment or give feedback, although almost everything in the code is subject to change.

Behold the voice receive RFC. This is where I ask for design suggestions and feedback. Unfortunately, not many people seem to have any idea of what their ideal voice receive api would look like, so it falls to me to come up with everything. Should anyone have any questions/comments/concerns/complaints/demands, please post them here. I will be posting the tentative design components here for feedback and will update them occasionally. For more detailed information on my progress, see [the project on my fork](https://github.com/imayhaveborkedit/discord.py/projects/1). I will also be adding an example soonish.

## Overview

The main concept behind my voice receive design is to mirror the voice send api as much as possible. However, since receive is more complex than send, I've had to take some liberties in creating new concepts and functionality for the more complex parts. The basic usage should be relatively familiar:

```py
vc = await channel.connect()
vc.listen(MySink())
```

The voice send api calls an object that produces PCM packets a `Source`, whereas the receive api refers to them as a `Sink`. Sources have a `read()` function that produces PCM packets, so Sinks have a `write(data)` function that does something with PCM packets. Sinks can also optionally accept opus data to bypass the decoding stage if you so desire. The signature of the `write(data)` function is currently just a payload blob with the opus data, pcm data, and rtp packet, mostly for my own convenience during development. This is subject to change later on.

The new VoiceClient functions are basically the same as the send variants, with `listen()` being the new counterpart to `play()`.

> Note: The `stop()` function has been changed to stop both playing **and** listening. I have added `stop_playing()` and `stop_listening()` for individual control.

## Built in Sinks

For simply saving voice data to a file, you can use the built in `WaveSink` to write it to a wav file. The way I currently have this implemented, however, is completely broken for more than one user.

> Note: Here lies my biggest problem. I currently do not have any way to combine multiple voice "streams" into one stream. Discord sends packets for all users on the same socket, differentiated by an id (aka ssrc, from the RTP spec). These packets have timestamps, but with a random start offset, per ssrc. RTP has a mechanism where the reference time is sent in a control packet, but as far as I can tell, Discord *doesn't send these control packets*. As such, I have no way of properly synchronizing streams without excessive guesswork based on arrival time at the socket (unreliable at best). Until I can solve this, there will be a few holes in the design; for example, how to record the whole conversation in a voice channel instead of individual users.

Sinks can be composed much like Sources can (PCMVolumeTransformer+FFmpegPCMAudio, etc). I will have some built in sinks for handling various control actions, such as filtering by user or predicate:

```py
# only listen to message.author
vc.listen(UserFilter(MySink(), message.author))

# listen for 10 seconds
vc.listen(TimedFilter(MySink(), 10))

# arbitrary predicate, could check flags, permissions, etc
vc.listen(ConditionalFilter(MySink(), lambda data: ...))
```

and so forth. As usual, these are subject to change when I go over this part of the design again.

> As mentioned before, mixing is still my largest unsolved problem. Combining all voice data in a channel into one stream is surely a common use case, and I'll do my best to try and figure out a solution, but I can't promise anything yet. If it turns out that my solution is too hacky, I might have to put it in some ext package on pypi (see: ext.colors).

For volume control, I recently found that libopus has a gain setting in the decoder. This is probably faster and more accurate than altering pcm packets after they've been decoded. Unfortunately, I haven't quite figured out how to expose this setting yet, so I don't have any public api to show for it.

That should account for most of the public api part that I've designed so far. I still have a lot of miscellaneous things to do, so no ETA. Again, if you have any feedback whatsoever, please make yourself known either here or in the discord server.
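The Source/Sink symmetry described above can be sketched in plain Python. This is a hypothetical illustration of the pattern only: `AudioSink` and `BufferSink` are made-up names, not the library's actual classes.

```python
class AudioSink:
    """Receive-side counterpart to an audio Source: data is pushed in
    via write() instead of being pulled out via read()."""

    def write(self, data):
        raise NotImplementedError

    def cleanup(self):
        pass


class BufferSink(AudioSink):
    """Minimal sink that accumulates raw PCM bytes in memory."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, data):
        # `data` here is assumed to be a raw PCM bytes-like chunk
        self.buffer.extend(data)
```

A `WaveSink` would look the same, except its `write()` would hand the chunk to the stdlib `wave` module instead of a bytearray.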
Ruuttu commented 6 years ago

My thoughts. As a starting point, yes, the API should provide separate PCM chunks for each member being listened to. If no filters are set, all members are listened to, including those joining the call after listening began.

To decode PCM properly, the library needs to put packets into order and identify lost ones. The packets have an incrementing sequence number that can be used for this. This all implies buffering and some error handling, such as filling in lost data (with opus) and simply discarding packets that arrive too late.
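The reordering and loss-detection step described here can be sketched with a small buffer keyed on sequence numbers. This is an illustrative sketch, not library code; `None` stands in for a lost packet that the opus decoder could then conceal.

```python
import heapq


class PacketBuffer:
    """Reorders packets by sequence number and drops late arrivals."""

    def __init__(self):
        self._heap = []       # min-heap of (seq, payload)
        self._next_seq = None  # next sequence number we expect to emit

    def push(self, seq, payload):
        if self._next_seq is not None and seq < self._next_seq:
            return  # arrived too late; we've already moved past it
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self):
        """Flush buffered packets in order; None marks a lost packet."""
        out = []
        while self._heap:
            seq, payload = heapq.heappop(self._heap)
            if self._next_seq is None:
                self._next_seq = seq
            while self._next_seq < seq:
                out.append(None)  # gap in the sequence: packet lost
                self._next_seq += 1
            out.append(payload)
            self._next_seq = seq + 1
        return out
```

A real jitter buffer would also wait a short time before flushing, trading latency for fewer false "lost" packets.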

Each audio chunk gets a number specifying its position (in milliseconds) in relation to when the listening session began. These chunks can then be fed into a mixer to produce a single stream if desired. For simplicity, all chunks should be assigned into perfect 20ms slots (e.g. 40 and 60, not 36 and 51).

No special timing information should be necessary. Record the time whenever someone starts speaking; every following chunk can be placed exactly 20ms after the previous one. There will be five silent packets to signify they've stopped speaking. In case those are not received there should also be a timeout.

If enough latency is added, the library should be able to give a reliable output under most conditions. With minimal latency, there will be more issues. There's no free lunch :)

imayhaveborkedit commented 6 years ago

I've already had a go at writing all of the decoding and processing for this. I spent a lot of time writing classes for packet types discord doesn't send. The timestamp data in RTP packets starts from a random offset, and that offset is different per ssrc. RTCP packets have the reference time for calculating when these packets were created. Relying on gateway websocket events to sync up voice socket UDP packets is racy and not reliable, especially considering the gateway code can be blocked by user code and speaking events have no id (timestamp).

Ruuttu commented 6 years ago

Seems safe to bet that Discord makes no effort to synchronize speakers. Each stream is basically played "as soon as it's received" with the shortest buffer they think they can get away with.

If there's someone in the call from New Zealand and we're always making fun of him for laughing at jokes three seconds too late, I would expect the data coming from the Voice Receive API to reflect that. I would not expect it to "fix" the latency.

So I think you can track the local packet receive times and use some of those without shame. Again, most packets can actually be placed right after the previous one, ignoring the receive time. Never the first packet after silence, though, of course.
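The placement rule described in these two comments (only the first chunk after silence consults the local receive time, snapped to a 20 ms slot; every later chunk lands exactly one frame after the previous one) can be sketched as follows. The function and parameter names are illustrative.

```python
FRAME_MS = 20  # duration of one Discord audio frame


def place_chunk(prev_position_ms, after_silence, recv_time_ms, session_start_ms):
    """Assign a timeline position (ms since the session began) to a chunk.

    Within a burst of speech, each chunk goes exactly one frame after the
    previous one; only the first chunk after silence uses the local receive
    time, rounded to the nearest 20 ms slot.
    """
    if after_silence or prev_position_ms is None:
        elapsed = recv_time_ms - session_start_ms
        return ((elapsed + FRAME_MS // 2) // FRAME_MS) * FRAME_MS
    return prev_position_ms + FRAME_MS
```

With a session start at t=1000 ms, a burst arriving at t=1036 ms is slotted at 40 ms, and the next chunk at 60 ms, matching the "40 and 60, not 36 and 51" example above.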

wgaylord commented 5 years ago

Going to assume that this attempt has stalled?

Any idea on if anyone will try and implement it?

Also, a good api would just pass in the raw PCM that comes out of the opus library, much like the mumble python api (another chat server that uses opus).

This should always be the same length unless discord is changing the opus config options on the fly.
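The constant chunk length follows from the opus configuration. The constants below are assumptions based on the values commonly used for Discord voice (48 kHz, 16-bit stereo, 20 ms frames), not taken from this thread.

```python
SAMPLE_RATE = 48000     # Hz; Discord's opus sampling rate
CHANNELS = 2            # stereo
SAMPLE_WIDTH = 2        # bytes per sample (16-bit signed PCM)
FRAME_DURATION = 0.020  # seconds per opus frame

# bytes in one decoded PCM chunk
frame_size = int(SAMPLE_RATE * FRAME_DURATION) * CHANNELS * SAMPLE_WIDTH
```

So every decoded chunk should be 3840 bytes, unless the opus config changes on the fly as noted above.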

imayhaveborkedit commented 5 years ago

I'm still working on it. The fact that this is an "unsupported" feature, my desire to make a sufficiently high level api for ease of use, and my sporadic motivation and inspiration all make for slow progress. It also seems that few people have useful input on this issue, meaning that I am basically on my own for the most part. If you want to see what I have so far, I keep my fork updated. There's still a lot to do, and I haven't written docs or added an example yet, so make of it what you will. https://github.com/imayhaveborkedit/discord.py

wgaylord commented 5 years ago

Welp guess I am stuck on using my own server with mumble for my HamRadio Remote station.

Apfelin commented 5 years ago

I see this fork is still getting regular updates.

You've mentioned a use-case, where it's possible to use a WaveSink to get raw PCM/Wav data. As it happens this is exactly the kind of thing I need.

Basically, my use-case is simply getting raw PCM data and passing it along to do some basic speech recognition (fwiw, a very basic speech-to-text bot for a deaf friend). I assume no audio mixing means there is no way to actually tell which packet comes from which user? Regardless, such a situation doesn't impact me greatly, since, assuming the (in my case, 2 or 3) users speak in a somewhat orderly fashion, my output would simply be the transcript of what has been said, no matter who said it. The only other thing I need in this case would be a way to detect when a user has stopped speaking, so I can pass along the file/buffer without words being cut off. Ideally, this could be done by storing the audio data in a buffer, but writing to a file and using that as input would work fairly well, too.

Would it be possible to share a minimal piece of code that exemplifies the use-case you described, or provide any hints for the direction I should take in my implementation of this use-case?

imayhaveborkedit commented 5 years ago

Don't worry, that use case is probably one of the two major cases that I expect people to have. WaveSink is specifically for writing data to a wav file, the point being that the built in wave module takes care of writing all the headers and such. The data you get in the first place is already PCM, so unless whatever flow you have requires a file on the filesystem, you don't need that one.

When I mention "mixing" I'm referring to combining the various streams of user voice data into a single combined stream. This is a problem I haven't quite figured out how to do properly yet since discord doesn't seem to provide the required RTCP packets necessary to synchronize the streams. If I do come up with something and it ends up being too jank to be in the main lib I'm considering making some sort of ext.voice package on pypi. Anyways, these "streams" are per user (actually ssrc, which is just an id in the RTP spec, but are mapped to user ids), so the data you get will include a member object. The exact format of this I haven't decided on yet, so right now it's just sort of a payload blob object with the pcm, opus, and rtppacket object (mostly for my own convenience during development).
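The ssrc-to-user mapping mentioned above can be sketched as a small lookup table populated as identities become known (e.g. from speaking events). The class and method names here are hypothetical, not the fork's API.

```python
class SSRCMap:
    """Tracks which user id owns each RTP ssrc."""

    def __init__(self):
        self._ssrc_to_user = {}

    def register(self, ssrc, user_id):
        # called when an event ties an ssrc to a user
        self._ssrc_to_user[ssrc] = user_id

    def user_for(self, ssrc):
        # None means we haven't learned who this ssrc belongs to yet
        return self._ssrc_to_user.get(ssrc)
```

The receive path would then look up `user_for(packet.ssrc)` to attach a member object to each decoded chunk.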

Delimiting speech segments is still something I don't quite know how to handle yet. I think this might be a problem I have to put onto the user, since I don't see a good way to do it lib-side. In the example I'm writing for this, I'm thinking about setting the pattern to use the priority speaking feature. Relying on speaking events and/or arbitrary waiting periods does not sound reliable enough to use by default. Using priority speaking to indicate the segment of speech to be recognized/processed would be very convenient both for me, since I don't need to do anything in the lib for it, and for the user, since it being a PTT feature means that if they mess it up, it's their fault.

Unfortunately, having speech recognition in an example is a bit out of scope for the lib examples, but I plan on having an additional example in a gist that demonstrates this, most likely using the SpeechRecognition module. In your case, until I design out how it would work with the various services in the aforementioned library (which might end up being the same anyways), it would probably be: waiting for priority speaking from some member, collecting their voice data in your container of choice (in-memory or filesystem based), and processing it once priority speaking ends.

imayhaveborkedit commented 5 years ago

I have updated the OP. Anyone vaguely interested in this feature should read the new content.

Apfelin commented 5 years ago

Got around to writing a short example this weekend. It's written in a rush, it's probably not how discord.py is meant to be used, but it works.

As mentioned in the OP, I don't know if the mixing works for this example. I've used a UserFilter, but I haven't yet tested with more users, to see if it actually filters. I managed to do speech-to-text by just coarsely segmenting data into 5 second chunks, since I don't think anyone speaks in long-winded sentences. Or at least, I don't. The accuracy is determined by the speech recognition service, but in general, it's pretty decent. A downside to this segmenting approach is that it only processes/posts the resulting text every 5 seconds. For speech to text, it's not ideal, but this could work reasonably well for some sort of voice commands. Another issue is that, sometimes, the 5 second segment might get only part of your sentence, but this is somewhat mitigated by stripping leading zeros in the buffer.

I faintly remember someone mentioning silence is marked by 5 chunks of 0x00, so I've been trying to implement a way to delimit speech (or rather, words) by looking for these chunks, but I haven't found a reliable way to do this yet. I've been looking over raw bytes output, seeing if this theory holds up, and it seems like it might, but I'd probably have to apply some sort of regex to make sure there really aren't any chunks when speaking occurs.
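The silence-delimiting idea described here (splitting on runs of all-zero frames, mirroring the five silence frames mentioned earlier in the thread) might be sketched like this. The frame size and function name are assumptions for illustration.

```python
FRAME_BYTES = 3840  # one 20 ms chunk of 48 kHz 16-bit stereo PCM


def split_on_silence(pcm, min_silent_frames=5):
    """Split a raw PCM byte buffer into speech segments, treating a run of
    all-zero frames as a segment boundary."""
    segments, current, silent_run = [], bytearray(), 0
    for i in range(0, len(pcm), FRAME_BYTES):
        frame = pcm[i:i + FRAME_BYTES]
        if not any(frame):  # frame is pure zeros -> silence
            silent_run += 1
            if silent_run == min_silent_frames and current:
                segments.append(bytes(current))
                current = bytearray()
        else:
            silent_run = 0
            current.extend(frame)
    if current:
        segments.append(bytes(current))
    return segments
```

Real speech rarely decodes to exact zeros, so a production version would compare frame energy against a threshold rather than testing for literal zero bytes.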

FWIW, here's the gist: https://gist.github.com/Apfelin/c9cbb7988a9d8e55d77b06473b72dd57

gamescom15 commented 5 years ago

Looks great, but I keep getting an error on line 12

This is not yet implemented in the main library. @Apfelin was using imayhaveborkedit's fork, which does have discord.reader.AudioSink

sillyfrog commented 5 years ago

@imayhaveborkedit Further to your proposed API, I have implemented a listener bot (that saves stuff to disk) in discord.js (I'm hoping to move it to Python ultimately). This leaves things such as mixing and figuring out when to join streams up to the user - which I think is the right thing to do as everyone will want something different. (Providing a separate lib for the common use cases makes sense, such as mixing users - I think it should be out of scope of this.)

The implementation in JS involves listening for the "speaking" event, then binding a receiver/sink (stream) to it to accept the data. When binding the receiver you can select the mode (eg: PCM, or Opus. Wav also makes sense since Python has that built in). Then every chunk is written to the receiver as it comes in (the actual implementation for out of order packets etc is unclear to me).

When the user stops talking/they release PTT, the end event is triggered, and the stream is closed.

I would argue that, for the moment, the per-user filters etc. are not required; rather, in the on_speaking event, the user can decide if they want to save this stream or not (the event is given the member details), and they could return the stream to write to (or call a method on a passed object). If no stream is returned, no action is taken (the overhead of this would be minimal compared to everything else that has to happen to stream data). Again, some common classes could be provided to simplify the process (eg: a stream-to-file class).

I know I'm coming late to the party here and may have missed some stuff (I have not yet used this, but trying to figure it out), but hopefully that makes some sense.

I'm sure you have seen this, but just in case, this is the VoiceReceiver API for discord.js: https://discord.js.org/#/docs/main/master/class/VoiceReceiver (not much there), and an example of the Voice API in use (slightly out of date, but gives you a feel for it, as the discord.js docs are not great): https://gist.github.com/eslachance/fb70fc036183b7974d3b9191601846ba

imayhaveborkedit commented 5 years ago

I have updated the OP again with new info about stopping sinks. (I guess I didn't...?)

@sillyfrog Sorry for not responding until now. The whole mixing thing, if I do figure it out, will still be entirely optional. It would exist as a batteries-included utility. If a user wants to handle the data differently, of course they can go about it their own way. The problem is that this is not an easy thing to do, even less so doing it correctly. I honestly don't expect many people to be able to come up with a decent solution that involves mixing the data live. I still believe that getting the combined audio of everyone speaking in a channel is a common and valid use case, and as such a utility for doing so should be included in the lib.

The d.js example vaguely follows the concept I had in mind for this, but I would design a somewhat higher level interface for it. Perhaps with a context manager. Or maybe I won't, and I'll just do it "manually" in the example to set the precedent. Or maybe just leave it as an exercise to the user.

0xBERNDOG commented 5 years ago

I've used d.js in the past and initially found it annoying to have to deal with individual user audiostreams, so +1 to a simple vc.listen(AudioSink()) function.

What's currently blocking your progress on this? I'm trying to bring my mental model of the current problems up to speed so that I can hopefully contribute.

0xBERNDOG commented 5 years ago

re: Sinks and filters

Source and Sink are basically IO streams, with a filter being an in-memory sink (i.e. one that doesn't write out to a file). Composing filters could be done with essentially a list or a linked list, where a call to write() at the head propagates the data down the chain recursively. Example usage:

wavSink = WavSink(filename='output_file.wav')
volumeFilter = VolumeFilter()
someOtherFilter1 = SomeFilter1()
someOtherFilter2 = SomeFilter2()

composedFilter = volumeFilter.compose([someOtherFilter1, someOtherFilter2])
# or maybe
# composedFilter = BlankFilter([volumeFilter, someOtherFilter1, someOtherFilter2])
wavSink.filter = composedFilter

vc.listen(wavSink)

Writes would propagate like so:

wavSink.write(data)
volumeFilter.write(data)
someOtherFilter1.write(data) 
someOtherFilter2.write(data)
-----------------------------------------------
> someOtherFilter2 modifies data and returns it
> someOtherFilter1 modifies data and returns it
> volumeFilter modifies data and returns it
> wavSink writes data to file
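The propagation order shown above can be made concrete with a minimal runnable sketch. `Sink` and `VolumeFilter` are illustrative stand-ins for the proposed classes, and the "samples" are plain ints rather than PCM bytes for clarity.

```python
class Sink:
    """Terminal consumer at the head of the chain."""

    def __init__(self):
        self.received = []  # stands in for writing to a file
        self.filter = None

    def write(self, data):
        if self.filter is not None:
            data = self.filter.write(data)  # run the chain first
        self.received.append(data)


class VolumeFilter:
    """In-memory filter: delegates down the chain, then applies its
    own transform to whatever comes back up."""

    def __init__(self, gain, next_filter=None):
        self.gain = gain
        self.next = next_filter

    def write(self, data):
        if self.next is not None:
            data = self.next.write(data)  # deepest filter modifies first
        return [s * self.gain for s in data]
```

This matches the diagram: the innermost filter modifies the data first, each outer filter transforms the returned result, and the sink finally consumes it.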
imayhaveborkedit commented 4 years ago

To be honest, this is horrifying. Voice is already threaded. You don't need a thread for state. You don't need two different instances of the sink. This is not how this class is used. It's clear that I need extensive examples to try to prevent people from writing code like this.

For reference, this is the typical usage pattern:

vc.listen(discord.UserFilter(discord.WaveSink('file.wav'), some_member))
...
vc.stop_listening()

That's it. Note that the Filter objects are probably going to be changed at some point since I don't like the design very much in its current state.

re-dude69 commented 4 years ago

Is it possible to send audio from system microphone with this?

And can i send audio from vc.listen() to system speakers in real time?

apple502j commented 4 years ago

@brownbananas95 Sending audio is already implemented. I think you can, but I'm unsure.

gerth2 commented 4 years ago

@brownbananas95 sending (mike->discord) works without issue. Receiving (Discord->speakers) is a bit more complex, as you currently receive each user as a separate stream. You can receive these, but must mix them prior to sending to the computer's speakers.

re-dude69 commented 4 years ago

@gerth2 do you have a code example for sending mike->discord?

gerth2 commented 4 years ago

Sure thing - I yanked the meaningful guts from a project I have in-flight at the moment using this Voice Receive fork of discord.py:

https://gist.github.com/gerth2/8ee0c918606b4c501759a9c333393398

Let me know if you run into issues, I can zip up a latest copy of what we're working on to send to you.

re-dude69 commented 4 years ago

Exactly what I needed! Thank you!

gerth2 commented 4 years ago

FWIW, yesterday, I got a crude but (apparently?) functional PCM audio mixing strategy sorted out, outside of this library, but using its API's. Working on open-sourcing the code some time this week (still has hardcoded private API keys in it, need to fix).
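A crude PCM mix of the kind described, summing 16-bit samples across streams and clamping to the int16 range, might look like the following sketch (an illustration of the general technique, not the code from the project mentioned above):

```python
import struct


def mix_pcm(streams):
    """Mix equal-length 16-bit little-endian PCM buffers by summing each
    sample position across all streams and clamping to the int16 range."""
    n = len(streams[0]) // 2  # number of 16-bit samples per buffer
    unpacked = [struct.unpack("<%dh" % n, s) for s in streams]
    mixed = [max(-32768, min(32767, sum(samples))) for samples in zip(*unpacked)]
    return struct.pack("<%dh" % n, *mixed)
```

Plain summation with clamping distorts when several loud streams overlap; real mixers usually attenuate or soft-clip instead, which is part of why "crude but functional" is a fair description.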

Since this is an RFC, my comment: I like the API as provided - the alignment with the existing read side feels nice and fuzzy. Any issues with internals aside, it looks nice and seems to work fine from the outside.

re-dude69 commented 4 years ago

> Let me know if you run into issues, I can zip up a latest copy of what we're working on to send to you.

What is your contact information?

> FWIW, yesterday, I got a crude but (apparently?) functional PCM audio mixing strategy sorted out, outside of this library, but using its API's. Working on open-sourcing the code some time this week (still has hardcoded private API keys in it, need to fix).
>
> Since this is an RFC, my comment: I like the API as provided - the alignment with the existing read side feels nice and fuzzy. Any issues with internals aside, it looks nice and seems to work fine from the outside.

This is interesting. Perhaps this would help complete the Voice Receive fork by imayhaveborkedit, and potentially get the PR approved in discord.py/master :-)

gerth2 commented 4 years ago

@brownbananas95 code is here

DMcP89 commented 4 years ago

Is this still being considered as a feature or will it only exist in forks?

JessicaTegner commented 4 years ago

What is the status of this RFC? Any timeframe as to when this will be ready to be integrated into master, considering the fork, which seems to have this working?

ChonkyWonky commented 4 years ago

Maybe writing a function that, from the perspective of the bot, mutes all other users except the one it specifically wants to listen to would work, since Discord takes it all as one stream? There's no point listening to multiple people at the same time unless you want to record it, or create a gateway to PS party, for example. Surely?

gerth2 commented 3 years ago

FWIW, I believe a recent Discord API change makes this particular PR out of date - changes will need to be ported forward for it to function.

NormHarrison commented 3 years ago

@gerth2 is correct; in particular, it's this pretty simple change that is needed to get this variant to work again, and thankfully nothing else. I was just able to start utilizing imayhaveborkedit's excellent work on this to accomplish something I was trying to do which required the ability for the bot to receive audio from a user.

On that note, I have to thank you tremendously, gerth2, for sharing your work and providing a good example of how to use the features offered by this fork. It helped me achieve what I was working on basically 100%; I'm not sure if I could have done it without it, at least not nearly as easily. Additionally, like what gerth said here, this really feels almost 100% in sync with the main voice send part of the existing library, to the point that I was almost able to simply reverse the process of sending audio I already had set up and have it work for receiving.

I think you've done a lot more than you take credit for @imayhaveborkedit lol.

gerth2 commented 3 years ago

@NormHarrison Thanks, yea, I should have provided what we did to fix it, but I wasn't entirely confident it was done right. Turns out we made the same change, which makes me feel better that I wasn't totally off in the weeds. This stackoverflow post was what got us going in the first place, and matched the errors in the logs.

FYI: the trigger that something was wrong was a notification to me from discord saying they had reset my bot's token due to 1000+ reconnects in a short time - apparently our 10 second delay between crash & restart was enough to get flagged in their system.

But.... not that this is a great way to test at all, but we've been running that bot above on one raspberry pi for about a year now. The above fix was the only real issue we had related to this library. Otherwise, audio's been pretty rock solid.

Maybe if this branch were to be merged to master, just mark the new audio methods as "experimental" or something like that and get some broader usage samples?

imayhaveborkedit commented 3 years ago

I've neglected this issue a bit for a while due to various reasons, but I have finally come to the conclusion that I had wanted to avoid. Given Discord's "unsupported" voice recv api, along with the increasingly disconcerting push to get bots to embrace a ~stateless~ design, the high level api I've been wanting to create might be less feasible than I had once thought.

Do not worry though, not all hope is lost.

Since Danny has decided to redesign voice send with a different style allowing for custom voice management protocols (lavalink etc), I've decided to (once again) redesign the voice recv api and remodel it with the new design. Additionally, the size of a completed high level voice recv api would likely have been too large for anyone (Danny) to want to review. With this, along with the aforementioned state and stability issues in mind, I've decided to attempt to progress incrementally. I'll start with the basic features: speaking state and a raw event. This minimal design should be sufficient for anyone capable enough to use the voice recv feature in a meaningful way. Only after I can ensure a stable design and functionality will additional parts be added in later patches. Of course, this doesn't stop anyone from writing their own for their own purposes in the meantime. These experimental parts will live in either my own fork or an ext voice extras package I'll manage. This will afford me better conditions to develop and release them before they're ready to be added to the main library with Danny's approval.

brucdarc commented 3 years ago

@imayhaveborkedit So I've been messing with your current branch for the past couple days. Do you think Danny would accept a version that limits the functionality to only listening to one user at a time? It seems that the javascript version of this project's voice listening does not allow listening to all users at once, for the same reason you found of not being able to combine user streams nicely. If the functionality were limited to listening to one user at a time, it would be easier to hide the guts of the program from api users and abstract it, right? The amount of work you've done on this project is pretty awesome.

imayhaveborkedit commented 3 years ago

@brucdarc It doesn't quite work like that. All the data from every user comes through the same socket. You can't specifically listen to anyone; it's all or nothing. The way that djs function is designed is an artificial limitation, probably for UX purposes. It's not impossible to mux streams together in theory, but it's far from simple to do it right. If you don't care about consistency you can just yolo it and hope everything just works (this kind of muxer is something I will eventually experiment with at some point).

I don't intend to write this functionality in specifically (aside from something like one of the sinks I mentioned above) or exclusively; it'll be up to the user to decide who and how they want to listen to people. This function isn't exactly complicated either, it's just filtering out data from everyone except a specific user (essentially an if check in an event).
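That "if check in an event" can be sketched as a wrapper that drops packets from everyone but one user. The callback names (`on_voice_packet`, `handle`) are hypothetical, not discord.py API.

```python
def make_user_filter(target_user_id, handle):
    """Wrap a packet handler so it only fires for one user's audio."""

    def on_voice_packet(user_id, pcm):
        if user_id == target_user_id:  # the "if check" described above
            handle(user_id, pcm)

    return on_voice_packet
```

Everything else (who to listen to, and when) stays in user code, which is the point being made here.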

sairam4123 commented 3 years ago

Is it being worked on? I tried to do something like this.

async def test_command(ctx: commands.Context):
    vc: discord.VoiceClient = await ctx.author.voice.channel.connect()
    vc.listen(discord.WaveSink('waves/wave.wav'))
    await asyncio.sleep(10)
    vc.stop_listening()
    _file_ = open('waves/wave.wav', 'rb')
    await ctx.send('Here\'s, your record file.', file=discord.File(_file_, filename='record.wav'))

but this didn't work. Am I doing something wrong?

Edit: I am just trying out, it is not for production.

sairam4123 commented 3 years ago

Well, I am getting an OpusNotLoaded error. I tried to merge this with v1.3.4, since I was hitting the deny_new issue, which was fixed in v1.3.4. How should I fix the OpusNotLoaded issue I am getting right now?

Here's the code I used for running the bot:

import asyncio
from pathlib import Path

import discord
from discord.ext import commands

# discord.opus.load_opus('libopus-0.x64.dll')
bot = commands.Bot('.')

@bot.event
async def on_ready():
    print('Running bot')
    print(bot.user.id)
    print(bot.user)

@bot.command()
async def test_command(ctx):
    vc: discord.VoiceClient = await ctx.author.voice.channel.connect()
    (Path.cwd() / 'waves').mkdir(exist_ok=True)
    (Path.cwd() / 'waves/wave.wav').touch(exist_ok=True)
    fp = (Path.cwd() / 'waves/wave.wav').open('rb')
    vc.listen(discord.WaveSink('waves/wave.wav'))
    await asyncio.sleep(10)
    vc.stop_listening()
    await ctx.send('Here\'s, your record file.', file=discord.File(fp, filename='record.wav'))

bot.run('TOKEN_REMOVED_FOR_SECURITY_PURPOSE')

The error I am getting right now:

```
Traceback (most recent call last):
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\reader.py", line 377, in run
    self._do_run()
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\reader.py", line 370, in _do_run
    self.decoder.feed_rtp(packet)
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\opus.py", line 872, in feed_rtp
    dec = self._get_decoder(packet.ssrc)
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\opus.py", line 859, in _get_decoder
    dec = self.decoders[ssrc] = self.decodercls(ssrc)
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\opus.py", line 687, in __init__
    self._decoder = Decoder()
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\opus.py", line 336, in __init__
    raise OpusNotLoaded()
discord.opus.OpusNotLoaded
```

Thank you.

nicolaipre commented 3 years ago

> Well, I am getting OpusNotLoaded error, when I tried to merge this with v1.3.4 as I got deny_new issue which was fixed in v1.3.4. How should I fix OpusNotLoaded issue I am getting right now? […]

Try this, it works on Linux. Not sure for Windows though.

```py
# Fix Discord Opus error
import ctypes
import ctypes.util
discord.opus.load_opus(ctypes.util.find_library('opus'))
discord.opus.is_loaded()
```
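Since `ctypes.util.find_library('opus')` can return `None` (notably on Windows, as seen later in this thread), a more defensive pattern is to try several candidate names before giving up. This is only a sketch: the fallback file names are assumptions that vary by system, and the `load`/`is_loaded` parameters stand in for `discord.opus.load_opus`/`discord.opus.is_loaded` so the helper stays testable without discord installed.

```python
import ctypes.util

def opus_candidates():
    """Candidate names/paths for the opus library, most portable first."""
    names = []
    found = ctypes.util.find_library('opus')  # may be None, notably on Windows
    if found:
        names.append(found)
    # Hypothetical fallback file names -- adjust these to your own system
    names += ['libopus-0.x64.dll', 'libopus.so.0', 'libopus.0.dylib', 'opus']
    return names

def load_opus_with_fallback(load, is_loaded):
    """Try each candidate until one loads and `is_loaded()` reports success.

    In a real bot, pass discord.opus.load_opus and discord.opus.is_loaded.
    """
    for name in opus_candidates():
        try:
            load(name)
        except Exception:
            continue  # this candidate failed to load; try the next one
        if is_loaded():
            return name
    raise RuntimeError('could not load the opus library from any candidate')
```

Usage in a bot would then be roughly `load_opus_with_fallback(discord.opus.load_opus, discord.opus.is_loaded)`.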
sairam4123 commented 3 years ago
```
Traceback (most recent call last):
  File "F:/PyCharm Python Works/Discord-Voice-Recv/main.py", line 8, in <module>
    discord.opus.load_opus(ctypes.util.find_library('opus'))
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\opus.py", line 231, in load_opus
    _lib = libopus_loader(name)
  File "c:\users\kanna\appdata\local\pypoetry\cache\virtualenvs\discord-voice-recv-glolxjdo-py3.8\src\discord.py\discord\opus.py", line 157, in libopus_loader
    lib = ctypes.cdll.LoadLibrary(name)
  File "f:\python 3.8.2 (x64)\lib\ctypes\__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "f:\python 3.8.2 (x64)\lib\ctypes\__init__.py", line 363, in __init__
    if '/' in name or '\\' in name:
TypeError: argument of type 'NoneType' is not iterable
```

I don't have Opus? How do I install Opus on Windows? @nicolaipre

Tried to print it and it was `None`:

```py
print(f"{ctypes.util.find_library('opus') = }")
# ctypes.util.find_library('opus') = None
```
sairam4123 commented 3 years ago

Anyway this worked.

```py
discord.opus.load_opus(r"F:\PyCharm Python Works\Discord-Voice-Recv\waves\libopus-0.x64.dll")
print(discord.opus.is_loaded())
```

I had to use the full path, which might not work on all computers, so adjust it for your setup.

I have made a small command using this API.

```py
@bot.command()
async def record(ctx: commands.Context, time: FutureTime, me_only: bool):
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    (Path.cwd() / 'waves').mkdir(exist_ok=True)
    (Path.cwd() / 'waves/wave.wav').touch(exist_ok=True)
    fp = (Path.cwd() / 'waves/wave.wav').open('rb')
    if me_only:
        ctx.voice_client.listen(discord.UserFilter(discord.WaveSink('waves/wave.wav'), ctx.author))
    else:
        ctx.voice_client.listen(discord.WaveSink('waves/wave.wav'))
    await discord.utils.sleep_until(time.dt)
    ctx.voice_client.stop_listening()
    # print(discord.File(fp, filename='record.wav'))
    await ctx.send("Recording being sent. Please wait!")
    await ctx.send('Here\'s, your record file.', file=discord.File(fp, filename='record.wav'))
```

I am leaving this here for users who come looking for a record command.

Is this normal?

```
[router:push] Warning: rtp heap size has grown to 20
[router:push] Warning: rtp heap size has grown to 25
[router:push] Warning: rtp heap size has grown to 30
...
[router:push] Warning: rtp heap size has grown to 175
```
NormHarrison commented 3 years ago

@sairam4123 Yes, that would be considered normal, taking into account that this is a development library. A lot of the extra debugging information printed to the screen is still present in the code, as imayhaveborkedit originally mentioned. I personally encountered that warning right when additional users started speaking and/or (if I recall correctly) when voice data wasn't being read from Discord for extended periods of time, which makes sense given what the warning says.

If you wanted to, I'm sure you could search for where that message is printed within the library's files and simply comment it out, but if it's being printed excessively, there might be something else to look into.

Also, not to come across as rude, and I really don't have a say over this, but I'm not sure it is best to start more casual conversation/help discussion in this thread, since it was originally meant as an RFC.

sairam4123 commented 3 years ago

Well, it is being printed excessively, like 1000 times. It was printing 20-30 times before, and I didn't care, as I knew the lib has debugging information. But now it is printing like 1000 times, so I got worried and posted here. Anyway. @NormHarrison

sairam4123 commented 3 years ago

I would like to provide some examples I made using this API.

A record command that records a user's voice via the API.

```py
@bot.command()
async def record(ctx: commands.Context, time: FutureTime, me_only: bool):
    global number
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    wave_file = waves_folder / waves_file_format.format(number)
    wave_file.touch()
    fp = wave_file.open('rb')
    if me_only:
        ctx.voice_client.listen(discord.UserFilter(discord.WaveSink(str(wave_file)), ctx.author))
    else:
        ctx.voice_client.listen(discord.WaveSink(str(wave_file)))
    await discord.utils.sleep_until(time.dt)
    ctx.voice_client.stop_listening()
    # print(discord.File(fp, filename='record.wav'))
    await ctx.send("Recording being sent. Please wait!")
    await ctx.send('Here\'s, your record file.', file=discord.File(fp, filename=str(wave_file.name)))
    number += 1
```

The command is not that great, but still fine. I'll keep this updated as much as possible. The next one uses gTTS and pydub: a text-to-speech command.

```py
@bot.command(aliases=['tts'])
async def text_to_speech(ctx: commands.Context, lang: Optional[str] = None, *, message: str):
    global number
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    tts_file = tts_folder / tts_file_format.format(ctx.author, number)
    gtts.gTTS(message, lang=lang).save(str(tts_file))
    tts_file_wav = tts_file.with_suffix('.wav')
    pydub.AudioSegment.from_mp3(tts_file).export(tts_file_wav, format='wav')

    if not ctx.voice_client.is_playing():
        ctx.voice_client.play(discord.FFmpegPCMAudio(str(tts_file_wav)))
    number += 1
```

Not that great; I'm still figuring out a better way to do this, any help welcome. Anyway, the next one is speech-to-text, using SpeechRecognition.

```py
@bot.command(aliases=['stt'])
async def speech_to_text(ctx: commands.Context, time: FutureTime, me_only: bool = True):
    global number
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    sr_file = sr_folder / sr_file_format.format(ctx.author, number)
    sr_file.touch()
    fp = sr_file.open('rb')
    if me_only:
        ctx.voice_client.listen(discord.UserFilter(discord.WaveSink(str(sr_file)), ctx.author))
    else:
        ctx.voice_client.listen(discord.WaveSink(str(sr_file)))
    await discord.utils.sleep_until(time.dt)
    ctx.voice_client.stop_listening()
    await ctx.send("Recognizing your voice, please wait!")
    recognizer = speech_recognition.Recognizer()
    with speech_recognition.AudioFile(fp) as source:
        sr_audio_data = recognizer.record(source)
    # print(recognizer.recognize_google(sr_audio_data, language='en-US'))
    await ctx.send("I think this is right, maybe, \n Here's your Speech-To-Text \n > " + recognizer.recognize_google(sr_audio_data, language='en-US'))
    number += 1
```

Well, recognition is bad. I'll provide some pictures showing the commands' usage. These snippets are incomplete on their own, so here is the extra code that makes all three commands work without changing a single line:

```py
number_txt_file = Path.cwd() / 'number.txt'
number_txt_file.touch(exist_ok=True)
number = int(number_txt_file.open('r').read() or 0)
waves_folder = Path.cwd() / 'recordings'
waves_file_format = "recording{}.wav"
waves_folder.mkdir(parents=True, exist_ok=True)
tts_folder = Path.cwd() / 'tts'
tts_folder.mkdir(parents=True, exist_ok=True)
tts_file_format = "tts{}{}.mp3"
sr_folder = Path.cwd() / 'sr'
sr_folder.mkdir(parents=True, exist_ok=True)
sr_file_format = "sr{}{}.wav"
```

This creates the directories and files used to save the recordings. Recordings are deleted once there are more than 10 files, to save space and respect users' privacy.

```py
@tasks.loop(seconds=4)
async def save_number_loop():
    global number
    with number_txt_file.open('w') as fp:
        fp.write(str(number))
    if len(list(waves_folder.iterdir())) > 10:
        print("Deleting recording files as the recording file's count got above 10.")
        for item in waves_folder.iterdir():
            # print(item)
            item.unlink()
        number = 0
```

This deletes old recordings and also persists the counter, so the number survives bot restarts.

I might write a voice chat bot if I figure out how to detect when a user has stopped speaking and how to get the voice data they spoke.

Vyprath commented 3 years ago

> I would like to provide some examples I made using this API. A record command which records user's voice from API. […]

Excuse me being dumb but this code actually works and can actually listen to users? GG.

sairam4123 commented 3 years ago

> Excuse me being dumb but this code actually works and can actually listen to users? GG.

Yes, it works without issues and without any changes, if your code looks like this:

```py
from pathlib import Path
from typing import Optional

import discord
import gtts
import pydub
import speech_recognition
from discord.ext import commands, tasks

from utils.time import FutureTime

discord.opus.load_opus(str(Path.cwd() / 'waves' / 'libopus-0.x64.dll'))
# print(discord.opus.is_loaded())
# print(Path.cwd() / 'waves')
import logging

logger = logging.getLogger('discord')
logger.setLevel(logging.DEBUG)
handler = logging.FileHandler(filename='discord.log', encoding='utf-8', mode='w')
handler.setFormatter(logging.Formatter('%(asctime)s:%(levelname)s:%(name)s: %(message)s'))
logger.addHandler(handler)

bot = commands.Bot('.')
number_txt_file = Path.cwd() / 'number.txt'
number_txt_file.touch(exist_ok=True)
number = int(number_txt_file.open('r').read() or 0)
waves_folder = Path.cwd() / 'recordings'
waves_file_format = "recording{}.wav"
waves_folder.mkdir(parents=True, exist_ok=True)
tts_folder = Path.cwd() / 'tts'
tts_folder.mkdir(parents=True, exist_ok=True)
tts_file_format = "tts{}{}.mp3"
sr_folder = Path.cwd() / 'sr'
sr_folder.mkdir(parents=True, exist_ok=True)
sr_file_format = "sr{}{}.wav"

@bot.event
async def on_ready():
    print('Running bot')
    print(bot.user.id)
    print(bot.user)

async def ensure_voice(ctx):
    if not ctx.author.voice:
        await ctx.send("First join a voice channel!")
        raise Exception

@bot.command()
@commands.before_invoke(ensure_voice)
async def record(ctx: commands.Context, time: FutureTime, me_only: bool):
    global number
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    wave_file = waves_folder / waves_file_format.format(number)
    wave_file.touch()
    fp = wave_file.open('rb')
    if me_only:
        ctx.voice_client.listen(discord.UserFilter(discord.WaveSink(str(wave_file)), ctx.author))
    else:
        ctx.voice_client.listen(discord.WaveSink(str(wave_file)))
    await discord.utils.sleep_until(time.dt)
    ctx.voice_client.stop_listening()
    # print(discord.File(fp, filename='record.wav'))
    await ctx.send("Recording being sent. Please wait!")
    await ctx.send('Here\'s, your record file.', file=discord.File(fp, filename=str(wave_file.name)))
    number += 1

# @bot.event
# async def on_command_error(ctx, error):
#     if hasattr(error, 'original'):
#         error = error.original
#     if isinstance(error, NotImplementedError):
#         await ctx.send(error)

@bot.command()
async def test_send_music_api(ctx: commands.Context, wav_file):
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    if not ctx.voice_client.is_playing():
        ctx.voice_client.play(discord.FFmpegPCMAudio('waves/{}'.format(wav_file)))

@bot.command(aliases=['tts'])
async def text_to_speech(ctx: commands.Context, lang: Optional[str] = None, *, message: str):
    global number
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    tts_file = tts_folder / tts_file_format.format(ctx.author, number)
    gtts.gTTS(message, lang=lang).save(str(tts_file))
    tts_file_wav = tts_file.with_suffix('.wav')
    pydub.AudioSegment.from_mp3(tts_file).export(tts_file_wav, format='wav')

    if not ctx.voice_client.is_playing():
        ctx.voice_client.play(discord.FFmpegPCMAudio(str(tts_file_wav)))
    number += 1

@bot.command(aliases=['stt'])
async def speech_to_text(ctx: commands.Context, time: FutureTime, me_only: bool = True):
    global number
    if not ctx.voice_client:
        await ctx.author.voice.channel.connect()
    sr_file = sr_folder / sr_file_format.format(ctx.author, number)
    sr_file.touch()
    fp = sr_file.open('rb')
    if me_only:
        ctx.voice_client.listen(discord.UserFilter(discord.WaveSink(str(sr_file)), ctx.author))
    else:
        ctx.voice_client.listen(discord.WaveSink(str(sr_file)))
    await discord.utils.sleep_until(time.dt)
    ctx.voice_client.stop_listening()
    await ctx.send("Recognizing your voice, please wait!")
    recognizer = speech_recognition.Recognizer()
    with speech_recognition.AudioFile(fp) as source:
        sr_audio_data = recognizer.record(source)
    # print(recognizer.recognize_google(sr_audio_data, language='en-US'))
    await ctx.send("I think this is right, maybe, \n Here's your Speech-To-Text \n > " + recognizer.recognize_google(sr_audio_data, language='en-US'))
    number += 1

@tasks.loop(seconds=4)
async def save_number_loop():
    global number
    with number_txt_file.open('w') as fp:
        fp.write(str(number))
    if len(list(waves_folder.iterdir())) > 10:
        print("Deleting recording files as the recording file's count got above 10.")
        for item in waves_folder.iterdir():
            # print(item)
            item.unlink()
        number = 0

save_number_loop.start()
bot.run('TOKEN_HERE')
```
Ceres445 commented 3 years ago

Could you potentially add recording video + audio, storing it separated by user ID for each user?

imayhaveborkedit commented 3 years ago

Nothing involving video will ever happen, due to complexity and Discord literally blocking bots from it.

MaddyGuthridge commented 3 years ago

Ok so I'm kinda new to discord.py, but I can almost follow what's going on in the most recent example code. I'm currently trying to make a script that implements two bots in separate voice channels. The first bot listens and plays the audio stream out through the second bot. Is there a way I can get this sort of continuous stream going into a buffer which is constantly being played out by the other bot? Where can I even find these functions (they aren't in the API docs) - do I need to switch to a different branch? Any ideas or thoughts are appreciated!

NormHarrison commented 3 years ago

@MiguelGuthridge That use case is very similar to what I ended up using this branch for (real-time voice communication, except the endpoint is not another Discord bot). To achieve what you're wanting, the first solution I can think of is to use network sockets and simply send the incoming raw PCM data to the other bot and vice versa. I had to use sockets with queue objects for what I was trying to achieve, and it seems to work quite well. Although there is no provided "network source/sink", creating custom ones by subclassing the discord.AudioSource and discord.AudioSink classes is required, but that is very easy; all you need to override are the __init__ and read/write methods. Here is an example:

```py
import socket

class custom_network_sink(discord.AudioSink):
    def __init__(self, member):
        # Pass in the member object so audio that doesn't come from the command invoker is ignored
        self.invoker_id = member.id
        # You could set up your socket connection with the other bot here, for example
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect(('127.0.0.1', 1234))  # This is skipping a few required socket steps for simplicity

    def write(self, voice_blob):
        # If the audio didn't come from the user who started the "call", ignore it
        if voice_blob.user.id != self.invoker_id:
            return

        # Send the incoming raw PCM bytes to the other bot
        self.socket.sendall(voice_blob.data)
```

You would then create an instance of the above class and pass it into voice_client.listen() within the command that starts the call:

```py
network_sink = custom_network_sink(ctx.author)
ctx.guild.voice_client.listen(network_sink)
```

In the other bot, you could then receive the audio being sent above in a custom discord.AudioSource class and send it to Discord, and two-way communication could be done by creating another socket. You could also probably use one socket to send and receive audio both ways by utilizing a queue object to get the audio data out of this thread, since voice sending/receiving occurs in separate threads and it otherwise wouldn't really be possible to obtain the data they're dealing with and get it to other outside places.
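For the receiving bot, the main constraint is that an `AudioSource.read()` implementation must return exactly 20 ms of 48 kHz 16-bit stereo PCM (3840 bytes) per call, or `b''` to signal end of stream. Here is a sketch of just the socket-framing part; the surrounding `discord.AudioSource` subclass and the connection setup are left as assumptions.

```python
import socket

FRAME_SIZE = 3840  # 20 ms of 48 kHz, 16-bit stereo PCM -- what read() must return

def read_frame(sock):
    """Read exactly one PCM frame from a stream socket.

    Pads a short final read with silence; returns b'' once the peer has
    closed and no data remains, which the player treats as end of stream.
    """
    buf = b''
    while len(buf) < FRAME_SIZE:
        chunk = sock.recv(FRAME_SIZE - len(buf))
        if not chunk:  # peer closed the connection
            return buf.ljust(FRAME_SIZE, b'\x00') if buf else b''
        buf += chunk
    return buf
```

A hypothetical `NetworkSource(discord.AudioSource)` subclass would then simply `return read_frame(self.sock)` from its `read()` method and be passed to `voice_client.play()`.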

MaddyGuthridge commented 3 years ago

@NormHarrison looks like I have a bit of learning to do.... Thanks for the pointers!

NormHarrison commented 3 years ago

Just in case anyone wants to continue to use this (as I do), or even wants to begin using it still, there's another small change that's needed in one of the internal library files in order to make it continue working with Discord's remote API.

Within the file voice_client.py, on line 197, where you see the call to the replace() method:

```py
self.endpoint = endpoint.replace(':80', '')
```

You'll need to change the port number it filters out to `:443` instead of `:80`, and then it should work as it did previously.

The specific error you'll see being printed when the above problem is occurring, is:

```
self.endpoint_ip = socket.gethostbyname(self.endpoint)
socket.gaierror: [Errno -2] Name or service not known
```

as the port number is being left in the domain name that's passed to socket.gethostbyname().
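A slightly more general fix is to strip any trailing port from the endpoint before the DNS lookup, so the patch keeps working if Discord changes the port again. A minimal sketch, assuming the same `endpoint` string as above; the `strip_port` helper name is mine, not the library's:

```python
def strip_port(endpoint):
    """Return the endpoint host with any trailing ':<port>' removed."""
    host, sep, port = endpoint.rpartition(':')
    if sep and port.isdigit():
        return host
    return endpoint
```

In voice_client.py this would replace the hard-coded `endpoint.replace(':80', '')` with something like `self.endpoint = strip_port(endpoint)`, handling `:80`, `:443`, or no port at all.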