Skylar-Tech / node-red-contrib-matrix-chat

Matrix chat server support for Node-RED
GNU General Public License v3.0

Allow to receive audio/voice messages #61

Closed mikedolx closed 2 years ago

mikedolx commented 2 years ago

Hi,

I would like to receive audio messages from my chats to process them further. I have tried all the checkboxes in the "matrix receive" node, but I wasn't able to receive audio messages. Is that feature supported? If so, how can I enable receiving audio messages? If not, is it hard to implement such a feature?

Thanks and BR

skylord123 commented 2 years ago

Ah yeah the voice messages are sent with type m.audio so they currently cannot be captured.

I'll queue this to be added to the next update.

Thanks!

skylord123 commented 2 years ago

@mikedolx I created a little flow with a function node that will output m.audio message types until we officially add it.

Make sure your client in Node-RED has "Global Access To Matrix Client" checked and import this JSON:

[{"id":"703ac45acd9d36ac","type":"function","z":"0dad84e70d3c608b","name":"Receive m.audio","func":"","outputs":1,"noerr":0,"initialize":"// Code added here will be run once\n// whenever the node is started.\nlet matrixClient = global.get(\"matrixClient['@bot:skylar.tech']\"),\n    matrixOnline = global.get(\"matrixClientOnline['@bot:skylar.tech']\");\n\nlet initializedAt = new Date();\nmatrixClient.on('Room.timeline', async function(event, room, toStartOfTimeline, removed, data) {\n                if (toStartOfTimeline) {\n                    return; // ignore paginated results\n                }\n                if (!event.getSender() || event.getSender() === node.userId) {\n                    return; // ignore our own messages\n                }\n                if (!data || !data.liveEvent) {\n                    return; // ignore old message (we only want live events)\n                }\n                if(initializedAt > event.getDate()) {\n                    return; // skip events that occurred before our client initialized\n                }\n\n                try {\n                    await matrixClient.decryptEventIfNeeded(event);\n                } catch (error) {\n                    node.error(error);\n                    return;\n                }\n\n                const isDmRoom = (room) => {\n                    // Find out if this is a direct message room.\n                    let isDM = !!room.getDMInviter();\n                    const allMembers = room.currentState.getMembers();\n                    if (!isDM && allMembers.length <= 2) {\n                        // if not a DM, but there are 2 users only\n                        // double check DM (needed because getDMInviter works only if you were invited, not if you invite)\n                        // hence why we check for each member\n                        if (allMembers.some((m) => m.getDMInviter())) {\n                            return true;\n                        }\n                    }\n                    return allMembers.length <= 2 && isDM;\n                };\n\n                let msg = {\n                    encrypted : event.isEncrypted(),\n                    redacted  : event.isRedacted(),\n                    content   : event.getContent(),\n                    type      : (event.getContent()['msgtype'] || event.getType()) || null,\n                    payload   : (event.getContent()['body'] || event.getContent()) || null,\n                    isDM      : isDmRoom(room),\n                    userId    : event.getSender(),\n                    topic     : event.getRoomId(),\n                    eventId   : event.getId(),\n                    event     : event\n                };\n                \n                // only look for m.audio\n                if(msg.type !== 'm.audio' || !msg.content.url) {\n                    return;\n                }\n                \n                msg.url = matrixClient.mxcUrlToHttp(msg.content.url);\n    \n                node.send(msg);\n});","finalize":"","libs":[],"x":630,"y":1580,"wires":[["a872b7375546776e"]]},{"id":"a872b7375546776e","type":"debug","z":"0dad84e70d3c608b","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":790,"y":1580,"wires":[]}]

After importing, edit the function node and open the "On Start" tab. Change the first two lines of code to match your user ID. Once deployed, it will output any m.audio messages that come in. You will need to filter the messages yourself to make sure they come from the right room (you can edit the function node or just add a switch node after it).

The msg.url output will contain the HTTP url to download the audio file.

This should get you going until the next update is released.
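The room filtering mentioned above could be done in a second function node placed after "Receive m.audio". This is only a minimal sketch: it assumes the msg.topic property set by the snippet holds the room ID, and both `ALLOWED_ROOM_ID` (a placeholder value) and `filterByRoom` are hypothetical names that are not part of the package.

```javascript
// Hypothetical placeholder; replace with the ID of the room you want to listen to.
const ALLOWED_ROOM_ID = '!exampleRoomId:example.org';

// Returns the message unchanged when it came from the allowed room,
// otherwise null (returning null from a function node stops the flow).
function filterByRoom(msg, allowedRoomId) {
    return msg && msg.topic === allowedRoomId ? msg : null;
}

// Inside a Node-RED function node the body would end with:
//   return filterByRoom(msg, ALLOWED_ROOM_ID);
```

A switch node that compares msg.topic against the room ID would achieve the same thing without any code.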

mikedolx commented 2 years ago

Hi @skylord123,

thanks a lot for the response and your function. Unfortunately I still don't receive any message from the function block. Do I need to set a trigger for my flow, or does it work without a trigger?

I have setup a minimalistic flow like that:

image

Debug1 returns an object, when texting something in the room. But when i record a voice message, nothing comes to the Debug2 node.

I have placed your code into the Start-Tab and updated the mxid to the one i had setup for the "regular" matrix nodes.

skylord123 commented 2 years ago

@mikedolx It doesn't need an input. The code on the inside of the function runs when node-red starts or is deployed and creates a listener that will fire whenever an audio file comes in to a room the client is in.

You made sure that the "Global access to matrix client" option is checked in Node-RED under the configuration (didn't see you mention it)? Looks like this: image

Also, the mxid in the function node is case sensitive so make sure it matches any case you put in the mxid under the configuration node.

If both those don't work let me know and we can debug this further.
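Since the listener is registered every time the function node starts, a redeploy may leave an older copy attached to the same client. One way to avoid that is to keep a handle that removes the listener again and call it from the "On Stop" tab. The sketch below uses Node's built-in EventEmitter as a stand-in for the Matrix client (an assumption made so the example runs on its own); `registerAudioListener` is a hypothetical helper name.

```javascript
const { EventEmitter } = require('events');

// Registers a handler for timeline events and returns a function
// that removes exactly that handler again.
function registerAudioListener(client, onEvent) {
    client.on('Room.timeline', onEvent);
    return () => client.removeListener('Room.timeline', onEvent);
}

// Stand-in for the Matrix client, only for demonstration.
const fakeClient = new EventEmitter();
const received = [];
const unsubscribe = registerAudioListener(fakeClient, (event) => received.push(event));

fakeClient.emit('Room.timeline', 'first');   // delivered to the handler
unsubscribe();                               // what the "On Stop" code would call
fakeClient.emit('Room.timeline', 'second');  // no longer delivered
```

In a real function node the returned function would have to be stored somewhere both tabs can reach, for example in the node's context; that part is left out here.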

mikedolx commented 2 years ago

Hi @skylord123 ,

here is my matrix configuration for the node: image

this is how my receive audio function node looks. Not sure if I need the matrixOnline object, as it is not used anywhere. I commented it out, but it didn't change anything.

image

My test-room is configured to have the nodered-bot user (nevermind the other bots ☺).

image

Furthermore i use encryption, but i wouldn't say that this is an issue, as i was successfully able to receive events and messages using the "regular" receive node.

Any other ideas? Can i somehow debug the function itself? Like with some kind of console.output() lines, just to see if the code is stuck somewhere.

mikedolx commented 2 years ago

hey, I just peeked into the docker logs and could see that at least something is coming through:

25 Mar 09:23:36 - [info] [matrix-server-config:nodered] Received encrypted timeline event [m.audio]: (Testraum) @michael:mydomain.de :: Voice message

mikedolx commented 2 years ago

So, after I discovered node.warn() (there are probably also other methods like info etc.) I did some further debugging. I noticed that the code always returns at the if-statement where you check for msg.content.url. It seems the download URL is in a different property, namely msg.content.file.url. See screenshot below:

image

After I had changed the if-statement to check for 'msg.content.file.url' I was able to get a debug output in my flow.

if(msg.type !== 'm.audio' || !msg.content.file.url) {
    // node.warn("I received something else: " + msg.type + ": " + msg.payload);
    // node.warn("The msg.content.url is " + msg.content.url);
    // node.warn("The msg.content is " + JSON.stringify(msg));
    return;
}

Now the debug output looks like this:

image

I guess, the interesting part for me is hidden in 'event.decrypted.content["org.matrix.msc1767.audio"].waveform'. Now, I have to see how I get an ogg from that array, but that's a different topic ☺.

Thanks for helping me out. I'd like to leave this ticket open and close it as soon as you have updated your node.

mikedolx commented 2 years ago

Here's my updated code that contains some debugging for your reference:


let matrixClient = global.get("matrixClient['@nodered:mydomain.de']");

node.warn("Hello There!");
let initializedAt = new Date();
matrixClient.on('Room.timeline', async function(event, room, toStartOfTimeline, removed, data) {

                if (toStartOfTimeline) {
                    return; // ignore paginated results
                }
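                // node.userId is not set inside a function node, so ask the client for our own user ID
                if (!event.getSender() || event.getSender() === matrixClient.getUserId()) {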
                    return; // ignore our own messages
                }
                if (!data || !data.liveEvent) {
                    return; // ignore old message (we only want live events)
                }
                if(initializedAt > event.getDate()) {
                    return; // skip events that occurred before our client initialized
                }

                try {
                    await matrixClient.decryptEventIfNeeded(event);
                } catch (error) {
                    node.error(error);
                    return;
                }

                const isDmRoom = (room) => {
                    node.warn("Checking whether this is a DM room");
                    // Find out if this is a direct message room.
                    let isDM = !!room.getDMInviter();
                    const allMembers = room.currentState.getMembers();
                    if (!isDM && allMembers.length <= 2) {
                        // if not a DM, but there are 2 users only
                        // double check DM (needed because getDMInviter works only if you were invited, not if you invite)
                        // hence why we check for each member
                        if (allMembers.some((m) => m.getDMInviter())) {
                            return true;
                        }
                    }
                    return allMembers.length <= 2 && isDM;
                };

                let msg = {
                    encrypted : event.isEncrypted(),
                    redacted  : event.isRedacted(),
                    content   : event.getContent(),
                    type      : (event.getContent()['msgtype'] || event.getType()) || null,
                    payload   : (event.getContent()['body'] || event.getContent()) || null,
                    isDM      : isDmRoom(room),
                    userId    : event.getSender(),
                    topic     : event.getRoomId(),
                    eventId   : event.getId(),
                    event     : event
                };

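                // only look for m.audio
                // unencrypted rooms use content.url, encrypted rooms use content.file.url
                const mxcUrl = msg.content.url || (msg.content.file && msg.content.file.url);
                if(msg.type !== 'm.audio' || !mxcUrl) {
                    return;
                }

                msg.url = matrixClient.mxcUrlToHttp(mxcUrl);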

                node.send(msg);
});

skylord123 commented 2 years ago

@mikedolx glad you got it working!

I completely forgot that encrypted messages store the URL field under another part of the content. The code I showed you only works in non-encrypted rooms.
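To cover both kinds of rooms, the lookup can be written so it checks both locations. A small sketch, where `getMxcUrl` is a hypothetical helper name and the two property paths are the ones discussed in this thread:

```javascript
// Returns the mxc:// URL of a media event's content, or null if there is none.
// Unencrypted events carry it in content.url, encrypted ones in content.file.url.
function getMxcUrl(content) {
    if (!content) {
        return null;
    }
    if (typeof content.url === 'string') {
        return content.url;
    }
    if (content.file && typeof content.file.url === 'string') {
        return content.file.url;
    }
    return null;
}
```

Checking `content.file` before reading `content.file.url` also avoids a TypeError when an unencrypted event arrives that has no `file` object.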

Also, you can use console.log within NR nodes, it just will output to the NR console. That can be useful for bigger objects you want to dump.

Thanks for posting your updated code!

mikedolx commented 2 years ago

Hi,

I did some testing over the last few days and stumbled upon some issues. I wanted to re-encode the received voice message to WAV for further processing. Unfortunately, all my attempts failed with the available Node-RED blocks (ffmpeg, sox).

In your code snippet I see that you are setting msg.url to a URL that allows me to download the voice message. But when I pass the downloaded attachment as a buffer to a re-encoding block, the block complains that it is not formatted correctly.

See also here for the long story 😀.

After downloading the voice message with my browser and trying to open it with any media player, I can confirm that it's not a valid voice message (Ogg/Opus-formatted byte stream). This article states that every Ogg file/container starts with the magic bytes OggS. That is the case when I download the voice message via my Matrix client: I can see the OggS magic bytes at the beginning.
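The magic-byte check described above can also be done inside a flow before handing the buffer to a transcoder. A minimal sketch, with `looksLikeOgg` as a hypothetical helper name:

```javascript
// Returns true when the buffer begins with the 4-byte Ogg capture pattern "OggS".
function looksLikeOgg(buffer) {
    return Buffer.isBuffer(buffer)
        && buffer.length >= 4
        && buffer.subarray(0, 4).toString('latin1') === 'OggS';
}
```

If this returns false for a file downloaded from an encrypted room, the most likely explanation is that the payload has not been decrypted yet.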

I suspect that the downloaded "attachment" is still encrypted. At least that would make sense to me, as the URL returned by your snippet does not require any further authorization.

If so, what would be the best way to decrypt the attachment? I guess there is a built-in method to decrypt the downloaded file? I browsed the matrix-js-sdk docs and saw some candidates that could do the job. Unfortunately I lack experience with this SDK. How do you handle images or other attachments?

Thanks and BR;

mikedolx commented 2 years ago

Just wanted to let you know, I figured out, that there is a "decrypt file" block 😓🙈. After connecting the block, I was successfully able to get the ogg wave stream and pass it to the ffmpeg recoder 👍.
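For background, the Matrix specification describes encrypted attachments as AES-CTR with a 256-bit key, where the key, the IV, and a SHA-256 hash of the ciphertext travel in the event's content.file object. The sketch below shows that scheme in plain Node.js. It is not the code of the "decrypt file" node, which remains the simpler route; `decryptAttachment` and `sha256Unpadded` are hypothetical names.

```javascript
const nodeCrypto = require('crypto');

// Unpadded base64 SHA-256, the form used in file.hashes.sha256.
function sha256Unpadded(buffer) {
    return nodeCrypto.createHash('sha256').update(buffer).digest('base64').replace(/=+$/, '');
}

// Decrypts a downloaded attachment using the event's content.file object.
// Throws if the ciphertext does not match the hash from the event.
function decryptAttachment(ciphertext, file) {
    if (sha256Unpadded(ciphertext) !== file.hashes.sha256.replace(/=+$/, '')) {
        throw new Error('Attachment hash mismatch');
    }
    // Node's 'base64' decoder also accepts the URL-safe alphabet used by the JWK key.
    const key = Buffer.from(file.key.k, 'base64');
    const iv = Buffer.from(file.iv, 'base64');
    const decipher = nodeCrypto.createDecipheriv('aes-256-ctr', key, iv);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

Inside a function node, `require` is not available directly, so the crypto module would have to be made available to the node first (for example through the function node's module settings, if your Node-RED version supports that).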

skylord123 commented 2 years ago

@mikedolx Just dropped official support for audio files in 0.6.1. Let me know how that works out. I did quite a bit of testing and it was working well on my side.

Glad you got it working until this release came out. :)

mikedolx commented 2 years ago

@skylord123 awesome! Will try it out!

Friedjof commented 6 months ago

Hello @mikedolx

I'm working on a similar project involving receiving and processing audio messages from Matrix chat in Node-RED. My current challenge is converting these audio data into a proper WAV format for speech-to-text transcription. I noticed that you have successfully managed decryption and processing of such data in Node-RED. Could you possibly elaborate on how you achieved the conversion into a WAV file? Any guidance or directions would be greatly appreciated.

Thank you in advance for your assistance!

skylord123 commented 6 months ago

@Friedjof ffmpeg is probably going to be the best tool for the job. Once installed you can call it from Node-RED (using exec node) for use in your flows. If you run Node-RED in a container you will have to install ffmpeg in the container as well (although volume mapping the binary from host may work).
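As a rough illustration of the ffmpeg route, the sketch below converts a file to WAV from Node.js. It assumes ffmpeg is on the PATH; the 16 kHz mono settings are an assumption (a common input format for speech-to-text tools), and `buildFfmpegArgs` and `convertToWav` are hypothetical names.

```javascript
const { execFile } = require('child_process');

// Builds the argument list for converting an input file to 16 kHz mono WAV.
// -y overwrites the output, -ar sets the sample rate, -ac the channel count.
function buildFfmpegArgs(inputPath, outputPath) {
    return ['-y', '-i', inputPath, '-ar', '16000', '-ac', '1', outputPath];
}

// Runs ffmpeg and resolves with the output path once the conversion finishes.
function convertToWav(inputPath, outputPath) {
    return new Promise((resolve, reject) => {
        execFile('ffmpeg', buildFfmpegArgs(inputPath, outputPath), (error) => {
            if (error) {
                reject(error);
            } else {
                resolve(outputPath);
            }
        });
    });
}
```

With an exec node, the same arguments can be used as the command line, for example `ffmpeg -y -i /tmp/voice.ogg -ar 16000 -ac 1 /tmp/voice.wav` (the paths here are placeholders).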

The next release will have a new File Upload node that will handle detecting metadata for images, videos, and audio files automatically as long as ffprobe is installed (which comes with ffmpeg). It also uses ffmpeg for generating video thumbnails (just like the Element client does when you send a video).

Mind me asking what sort of audio files you are processing? Are you trying to transcribe meetings? Reason I ask is there was a point I thought about using it to log ideas to like an old school tape recorder. Post the short audio clips to a matrix room and have Node-RED transcribe the audio. Curious if that is what you are doing or something else.

Friedjof commented 6 months ago

I'm developing an assistant to control various aspects of my homelab. The goal is to convert voice messages into text, which are then either directly recognized as commands or further processed by ChatGPT through the OpenAI API. From what I've observed, the voice messages are in WAV format. Thank you for the ffmpeg suggestion – I plan to test this approach in the coming days and will report back on its effectiveness.

mikedolx commented 6 months ago

Hi @skylord123 and @Friedjof,

I tried to implement a transcoder that converts an Ogg audio message coming from a Matrix room into a WAV file (which is required by most transcription services). I couldn't figure out how to transcode the Ogg with several different blocks (I can't remember which ones I used); ffmpeg was one of them. In the end I used a web service from Microsoft Azure that accepted the Ogg file as is and returned the transcription as a JSON structure. Currently I'm thinking about implementing STT via the Wyoming protocol, which was developed by Home Assistant. There is a self-hostable Whisper service (fast whisper) that you could use. But I'd still need some block to transcode the Ogg to WAV, because I guess that Wyoming still accepts only WAV as input.

Friedjof commented 6 months ago

@mikedolx Thank you for your answer. I haven't heard of it yet. It's not a top priority at the moment, but I'll have a look at it in the spring.