discordjs / discord.js

A powerful JavaScript library for interacting with the Discord API
https://discord.js.org
Apache License 2.0
25.5k stars · 3.97k forks

Discord.js streams do not interpolate silence #9992

Closed SOLR4189 closed 1 year ago

SOLR4189 commented 1 year ago

Which package is this bug report for?

discord.js

Issue description

I made a discord bot with Discord.js v14 that records users' audio as individual files. But in these files it sounds like there are no silence gaps in users' speech. The audio sounds like one continuous sentence, and it makes it impossible for people (and STT algorithms) to divide it into comprehensible sentences.

Code sample

voiceConnector.ts:

// `log` and the imports were not shown in the original snippet;
// log4js is assumed here, matching streamRecorder.ts below.
import { getVoiceConnection } from '@discordjs/voice';
import type { Client, VoiceBasedChannel } from 'discord.js';
import * as log4js from 'log4js';

const log = log4js.getLogger();

export async function record(client: Client, channel: VoiceBasedChannel) {
    const connection = getVoiceConnection(channel.guild.id);

    if (connection) {
        const receiver = connection.receiver;

        receiver.speaking.on('start', (userId) => {
            log.debug(`User ${userId} started speaking`);
            createListeningStream(receiver, channel, userId, client.users.cache.get(userId));
        });

        log.debug('Listening!');
    } else {
        throw new Error(`Failed to start listening to the ${channel.guild.name}[${channel.name}] channel!`);
    }
}

streamRecorder.ts:

import { createWriteStream, unlink, copyFileSync } from 'node:fs';
import { pipeline } from 'node:stream';
import { join } from 'node:path';
import { EndBehaviorType, VoiceReceiver } from '@discordjs/voice';
import type { User, VoiceBasedChannel } from 'discord.js';
import * as prism from 'prism-media';
import * as log4js from "log4js";

const log = log4js.getLogger();

// guildIdToDeviceId, CONSTS and processRecording are defined elsewhere in the project.
const activeRecordings: { [id: string]: string } = {};

function getDisplayName(userId: string, user?: User) {
    return user ? `${user.username}` : userId;
}

export function createListeningStream(receiver: VoiceReceiver, channel: VoiceBasedChannel, userId: string, user?: User) {
    const id = `${channel.guild.id}-${channel.id}-${userId}`;

    if (activeRecordings[id]) {
        log.debug(`Already recording for user ${userId}`);

        return;
    } else {
        activeRecordings[id] = userId;
    }

    const opusStream = receiver.subscribe(userId, {
        end: {
            behavior: EndBehaviorType.AfterSilence,
            duration: 5000,
        },
    });

    const oggStream = new prism.opus.OggLogicalBitstream({
        opusHead: new prism.opus.OpusHead({
            channelCount: 2,
            sampleRate: 48000,
        }),
        pageSizeControl: {
            maxPackets: 10,
        },
    });

    const startTime = Date.now();
    const username = getDisplayName(userId, user);
    const deviceId = guildIdToDeviceId[channel.guild.id] as string;
    const oggFileName = `${join(CONSTS.RECORDINGS_IN_PROCESS_DATA_PATH, deviceId, `${startTime}-${username}`)}${CONSTS.OGG_EXTENSION}`;

    const dataWriteStream = createWriteStream(oggFileName);

    log.debug(`Started recording ${oggFileName}`);

    pipeline(opusStream, oggStream, dataWriteStream, async (error) => {
        delete activeRecordings[id];

        if (error) {
            log.error(`Error recording file ${oggFileName}`, error);

            return;
        }

        try {
            // Split into one-minute recordings and encrypt
            await processRecording(channel, userId, startTime, username, oggFileName);
        } catch (error) {
            log.error(`Error processing recording file ${oggFileName}`, error);
        }
    });
}
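For illustration, the symptom can be reproduced with plain arithmetic. The mechanism assumed here (not confirmed discord.js internals): the receiver only emits Opus packets while the user transmits, and the muxer timestamps packets back-to-back at 20 ms per frame, so wall-clock pauses never make it into the file:

```typescript
// Simulation of the symptom. The mechanism is an assumption, not confirmed
// discord.js internals: packets arrive only while the user speaks, and the
// muxer lays them out back-to-back at 20 ms per frame.
const FRAME_MS = 20;

// arrivalsMs: wall-clock arrival time of each received Opus packet.
// Returns the playback duration of the resulting file.
function naiveFileDurationMs(arrivalsMs: number[]): number {
  return arrivalsMs.length * FRAME_MS; // gaps between packets are ignored
}

// Two short bursts of speech separated by a 3-second pause:
const arrivals = [0, 20, 40, 3060, 3080, 3100];
console.log(naiveFileDurationMs(arrivals)); // 120 ms of audio instead of ~3.1 s
```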

Versions

Issue priority

Medium (should be fixed soon)

Which partials do you have configured?

Not applicable

Which gateway intents are you subscribing to?

Guilds, GuildVoiceStates

I have tested this issue on a development release

No response

Qjuh commented 1 year ago

You don't listen for when their speaking stops. Do that to split the files into the chunks you want. Currently your stream simply receives no data while they aren't speaking, and when they continue speaking the next data chunk arrives and is piped right after the previous one.
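The splitting idea can be sketched independently of discord.js: group incoming packets into separate utterances whenever the gap between consecutive packets exceeds a threshold, and write each group to its own file. A minimal sketch with illustrative names (not library API):

```typescript
// Sketch of the splitting idea: group packet arrival times into utterances
// whenever the gap between consecutive packets exceeds maxGapMs. Each group
// would then become its own recording file.
function splitIntoUtterances(arrivalsMs: number[], maxGapMs: number): number[][] {
  const utterances: number[][] = [];
  let current: number[] = [];
  let prev: number | undefined;
  for (const t of arrivalsMs) {
    if (prev !== undefined && t - prev > maxGapMs) {
      utterances.push(current); // gap too large: close the current utterance
      current = [];
    }
    current.push(t);
    prev = t;
  }
  if (current.length > 0) utterances.push(current);
  return utterances;
}

// Two bursts separated by a 3-second pause become two separate files:
console.log(splitIntoUtterances([0, 20, 40, 3060, 3080], 1000).length); // 2
```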

SOLR4189 commented 1 year ago

@Qjuh Could you explain in more detail? Do you suggest subscribing to receiver.speaking.on('stop')? Then what am I supposed to do, append a SILENCE buffer until the user starts speaking again or until the EndBehaviorType.AfterSilence condition is met?
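For reference, the silence-interpolation variant is commonly done with the three-byte Opus silence frame (0xF8 0xFF 0xFE): when a packet arrives after a gap, first write enough silence frames to cover the gap, which keeps the Ogg timeline aligned with wall-clock time. A minimal sketch assuming 20 ms frames; `silenceFramesForGap` is a hypothetical helper, and the wiring shown in comments is an assumption, not @discordjs/voice API:

```typescript
import { Buffer } from 'node:buffer';

// Three-byte Opus silence frame (an assumption based on common usage,
// not taken from @discordjs/voice).
const OPUS_SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);
const FRAME_MS = 20; // Discord sends 20 ms Opus frames

// Hypothetical helper: number of silence frames needed to cover a gap.
function silenceFramesForGap(gapMs: number): number {
  return gapMs > 0 ? Math.round(gapMs / FRAME_MS) : 0;
}

// Wiring idea (sketch only): instead of pipeline(opusStream, oggStream, ...),
// handle 'data' yourself so silence can be injected before each packet:
//
//   let lastPacketAt = Date.now();
//   opusStream.on('data', (packet) => {
//     const gap = Date.now() - lastPacketAt - FRAME_MS;
//     for (let i = 0; i < silenceFramesForGap(gap); i++) {
//       oggStream.write(OPUS_SILENCE);
//     }
//     oggStream.write(packet);
//     lastPacketAt = Date.now();
//   });

console.log(silenceFramesForGap(3000)); // 150 frames ≈ 3 s of silence
```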

Qjuh commented 1 year ago

I suggest joining the Discord server for such inquiries, since what you're describing is not a bug.