Rapptz / discord.py

An API wrapper for Discord written in Python.
http://discordpy.rtfd.org/en/latest
MIT License
14.73k stars 3.74k forks

Reading Voice / Audio from Voice Channel? [For Voice Recognition AI Bot] #444

Open Mercurial opened 7 years ago

Mercurial commented 7 years ago

Hi guys, I'm wondering if the library has capability to read audio bytes from the voice channels? I'm building a Bot that will read the voice and try to convert it to text commands.

Can anyone enlighten me?

Thanks!

Fuyukai commented 7 years ago

Not yet.

Voice receive has been planned for ages. PRs welcome.

Mercurial commented 7 years ago

https://github.com/Rapptz/discord.py/blob/master/discord/voice_client.py#L266

Seems to be already reading/polling from the voice channel though?

Fuyukai commented 7 years ago

Sure, just design the API, document it fully and submit a pull request.

ghost commented 7 years ago

Just to settle this: it has been tried again and again, and almost everyone has failed. Danny wants it done in a nice way, but really, it isn't worth the time and effort, so this will not be coming anytime soon.

Mercurial commented 7 years ago

Who's Danny? And why is it hard? Isn't it just connecting to whatever socket carries the audio and reading that data?

ghost commented 7 years ago

Danny made this library (Danny = Rapptz). Next, it is very easy to read the data from the websocket, but presenting data that is usable, in a decent manner, is hard. In essence, you have to chunk streamed data, and I don't think anybody wants to go to the trouble of doing that yet. The hard part is designing the API in a way that is useful and usable, not a quick throw-together solution.

ghost commented 7 years ago

Just to let you know, Danny has planned voice receive for the rewrite.

Mercurial commented 7 years ago

oh ok thanks for the information!

Ruuttu commented 7 years ago

I wanted my bot to play a 15 second replay on-demand, just for laughs basically, so I needed basic recording capability to start with.

I built a setup that mixes together all incoming audio and makes available a single stream of ~50 packets a second. There's no fancy synchronization or stretching; it's just in-out as fast as possible, with a latency of a few frames so there's time to get everything in order. You need to call a function to fetch a new frame 50 times a second. Each speaker can be "re-synchronized" when they don't speak, so the stream remains live and stable in the long term even if there's minor drifting. Otherwise you could drop or duplicate packets, I'm sure.

The code is shit, but if I could make it a little less shit, would that kind of basic "just feed me data" API be worthy if only as a starting point?
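A minimal sketch of the "mix all speakers into one stream" idea described above. It is not Ruuttu's code. It assumes each speaker contributes one 16-bit signed little-endian PCM frame of equal length per 20 ms slot (50 frames a second), and `mix_frames` is a hypothetical helper name:

```python
import struct


def mix_frames(frames):
    """Mix equal-length 16-bit signed little-endian PCM frames into one frame.

    Samples are summed per position and clipped to the int16 range.
    Returns b"" when there is nothing to mix.
    """
    if not frames:
        return b""
    length = len(frames[0])
    if length % 2 or any(len(f) != length for f in frames):
        raise ValueError("frames must be equal-length 16-bit PCM")
    count = length // 2
    fmt = "<%dh" % count

    mixed = [0] * count
    for frame in frames:
        for i, sample in enumerate(struct.unpack(fmt, frame)):
            mixed[i] += sample

    # Clip instead of wrapping around, so loud overlaps distort less harshly.
    clipped = [max(-32768, min(32767, s)) for s in mixed]
    return struct.pack(fmt, *clipped)
```

Calling this once per 20 ms tick with the current frame of every active speaker would give the single outgoing stream; speakers who are silent in a slot are simply left out of the list.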

ghost commented 7 years ago

You can always pull request, but keep in mind, there have been more than a few failed attempts, since Danny is very strict when it comes to pull requests.

rawrzors commented 7 years ago

@Ruuttu Would you be able to share that code? Curious because I'm trying to add some voice recording (save to file)

Thanks

Ruuttu commented 7 years ago

Let's see. This was all done against version 0.13.0 at the time.

I started by copying the work from https://github.com/Rapptz/discord.py/pull/333 for receiving decrypted opus voice packets. I wrote a "Decoder" class in opus.py, which I've only confirmed to work in Windows.

In your bot (inherited from discord.Client) you need to call enable_voice_events() for your VoiceClient after joining a channel. After that you can receive opus packets in the on_speak() method which you'll add.

I wrote a "Recorder" class that takes the packets from on_speak(), converts them to PCM and maintains continuous per-speaker audio "streams" that sync together. There's a get_replay() method for retrieving the last n seconds of audio. You get lists of tuples because the audio is still separated by speaker, plus there's some extra data. Once you figure out what's what, you can mix together the speakers using python's audioop module.

You'll need to make some edits, but this should have all you need. I added a commented out example of how you might write a mixed down PCM stream to a file. Sorry some of the code is kinda silly and poorly commented. recording_example.zip
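The attached code is not reproduced here, but the "keep the last n seconds per speaker" part can be sketched roughly as follows. This is an illustration under assumptions, not the Recorder class itself: `ReplayBuffer` is a hypothetical name, frames are assumed to arrive already decoded to PCM at 50 per second, and the result is a plain dict rather than the lists of tuples mentioned above.

```python
from collections import deque

FRAMES_PER_SECOND = 50  # assumption: one PCM frame every 20 ms


class ReplayBuffer:
    """Keep the most recent `seconds` of PCM frames for each speaker."""

    def __init__(self, seconds):
        self.max_frames = seconds * FRAMES_PER_SECOND
        self.streams = {}  # speaker id -> deque of PCM frames

    def append(self, speaker_id, pcm_frame):
        stream = self.streams.get(speaker_id)
        if stream is None:
            # A bounded deque silently discards the oldest frame when full.
            stream = self.streams[speaker_id] = deque(maxlen=self.max_frames)
        stream.append(pcm_frame)

    def get_replay(self, seconds):
        """Return {speaker_id: [frames]} covering at most the last `seconds`."""
        wanted = seconds * FRAMES_PER_SECOND
        if wanted <= 0:
            return {sid: [] for sid in self.streams}
        return {sid: list(s)[-wanted:] for sid, s in self.streams.items()}
```

This sketch does not pad silence, so speakers who talk at different times are not aligned with each other; keeping the streams in sync, as the Recorder is described as doing, would need timestamps on top of this.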

lasa01 commented 7 years ago

@Ruuttu Thanks for this! This is very helpful. However, after applying these modifications to the latest discord.py version, the decoder doesn't seem to work; it raises an access violation error.

Ignoring exception in on_speak
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\discorde\client.py", line 307, in _run_event
    yield from getattr(self, event)(*args, **kwargs)
  File "*******************************************dbot.py", line 60, in on_speak
    await self.servermgrs[server.id].on_speak(data, ssrc, timestamp, sequence)
  File "*******************************************servermgr.py", line 86, in on_speak
    self.vrecorder.receive_packet(data, ssrc, sequence, timestamp)
  File "*******************************************recorder.py", line 124, in receive_packet
    self.streams[ user_id ].append( data, sequence, timestamp )
  File "*******************************************recorder.py", line 30, in append
    pcm = self.decoder.decode( data, self.decoder.samples_per_frame )
  File "C:\Program Files\Python36\lib\site-packages\discorde\opus.py", line 356, in decode
    ret = _lib.opus_decode(self._state, data, max_data_bytes, pcm_pointer, frame_size, 0)
OSError: exception: access violation reading 0x0000000017607EE8

The only thing that has changed in opus.py between these versions of discord.py is that the encoder now sets the signal type to auto:

CTL_SET_SIGNAL       = 4024

signal_ctl = {
    'auto': -1000,
    'voice': 3001,
    'music': 3002,
}

class Encoder:
    def __init__(self):
        # Excerpt from opus.py: _lib, log and OpusError are defined elsewhere in that module.
        self.set_signal_type('auto')

    def set_signal_type(self, req):
        if req not in signal_ctl:
            raise KeyError('%r is not a valid signal setting. Try one of: %s' % (req, ','.join(signal_ctl)))

        k = signal_ctl[req]
        ret = _lib.opus_encoder_ctl(self._state, CTL_SET_SIGNAL, k)

        if ret < 0:
            log.info('error has happened in set_signal_type')
            raise OpusError(ret)

(in opus.py)

I only recently started with Python, so I have no idea how this could be fixed. I already got decoding working earlier using python-opus (with some editing), but it would be nice to get this working since it doesn't need another library.

EDIT: I think I got it working; at least it doesn't error anymore. I was just messing around in opus.py and somehow got it working. Here is my opus.py that seems to be working.

Bottersnike commented 7 years ago

I've been needing voice receive for some stuff. I've had a poke around, and I think it should be possible to knock together a jitter buffer to handle receiving audio when I get home.
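For context, a jitter buffer holds back a few packets so that ones arriving out of order can be released in sequence. The sketch below is only an illustration of that idea, not code from this thread: `JitterBuffer` and its `depth` parameter are hypothetical, and 16-bit sequence-number wraparound is deliberately not handled.

```python
import heapq


class JitterBuffer:
    """Reorder packets by sequence number, releasing them after a small delay."""

    def __init__(self, depth=3):
        self.depth = depth      # packets held back before anything is released
        self._heap = []         # min-heap of (sequence, payload)
        self._pending = set()   # sequence numbers currently buffered
        self._last_seq = None   # sequence number of the last released packet

    def push(self, sequence, payload):
        # Drop packets whose slot was already released, and exact duplicates.
        if self._last_seq is not None and sequence <= self._last_seq:
            return
        if sequence in self._pending:
            return
        self._pending.add(sequence)
        heapq.heappush(self._heap, (sequence, payload))

    def pop(self):
        """Return the next (sequence, payload) in order, or None while buffering."""
        if len(self._heap) <= self.depth:
            return None
        sequence, payload = heapq.heappop(self._heap)
        self._pending.discard(sequence)
        self._last_seq = sequence
        return sequence, payload
```

A larger `depth` tolerates more reordering at the cost of extra latency (20 ms per buffered packet if packets arrive 50 times a second).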

lasa01 commented 7 years ago

@Bottersnike Ruuttu's initial code seems to no longer work; it fails to decrypt the voice packets with some ciphertext error. If you get your code working, could you share at least the voice packet decrypting part? Thanks!

Bottersnike commented 7 years ago

I implemented it in node the other day because that was the only language I could find a good lib for receiving. It shouldn't be too hard to port it over and then make it conform to d.py.

mturley commented 6 years ago

Did you guys ever end up figuring out a reliable solution for audio receive? I would be happy to use someone's fork in the meantime if it's not good enough to be merged upstream.

My use case: I want to set up a Raspberry Pi running discord.py that will operate as a passthrough audio device to both transmit to and receive from a discord channel using the microphone and headphone jack of a USB audio adapter connected to the Pi. Then I plan to connect the mic jack to a feed coming from my Playstation 4, and the headphone jack to a line in adapter for the PS4... connect the PS4 to Party Chat and leave both it and the Pi running, and suddenly I have an official PSN Party that will allow PS4 players to chat with Discord users (who are playing the same cross-platform MMO on PC). It's for my Final Fantasy XIV group... But I imagine the 2-way Discord audio on the Pi might be useful for others too.

mturley commented 6 years ago

Looks like I might have better luck using https://discord.js.org instead.

Bottersnike commented 6 years ago

Indeed. The packet parsing that I was using was relying on the fact that Discord was not using the most up-to-date structure. Because of that, the entire RFC wasn't implemented. Due to my lack of motivation, I'm unlikely to ever fix it.
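The comment above does not say which RFC is meant. Assuming it is RFC 3550 (RTP), the part that is easy to skip is the optional CSRC list and header extension that can follow the fixed 12-byte header. A rough sketch of parsing them, with `parse_rtp_header` and `RtpHeader` as hypothetical names; it covers the plain RTP layout only and says nothing about how Discord encrypts the payload:

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence timestamp ssrc",
)


def parse_rtp_header(packet):
    """Parse an RTP header and return (header, offset where the payload starts)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack_from(">BBHII", packet, 0)
    header = RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=first & 0x0F,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )
    # Skip the optional list of 32-bit contributing-source identifiers.
    offset = 12 + 4 * header.csrc_count
    if header.extension:
        if len(packet) < offset + 4:
            raise ValueError("packet too short for an RTP header extension")
        # Extension header: 16-bit profile id, then length in 32-bit words.
        _profile, ext_words = struct.unpack_from(">HH", packet, offset)
        offset += 4 + 4 * ext_words
    if offset > len(packet):
        raise ValueError("header fields run past the end of the packet")
    return header, offset
```

A parser that assumes the payload always starts at byte 12 works only as long as the sender never sets the extension bit or includes CSRC entries, which matches the kind of breakage described above.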

ghost commented 5 years ago

Sorry if I'm not up to date on this, but has there been any work on it? I'm interested in this feature for a voice recognition attempt I'm working on.

Harmon758 commented 5 years ago

See #1094

ghost commented 5 years ago

See #1094

Thanks for this :)

Jourdelune commented 1 year ago

This feature would be useful, for example to transcribe audio from a channel and translate it in real time.