discordjs / discord.js

A powerful JavaScript library for interacting with the Discord API
https://discord.js.org
Apache License 2.0
25.5k stars · 3.97k forks

Discord.js streams do not interpolate silence #9992

Closed SOLR4189 closed 1 year ago

SOLR4189 commented 1 year ago

Which package is this bug report for?

discord.js

Issue description

I made a discord bot with Discord.js v14 that records users' audio as individual files. But in these files it sounds like there are no silence gaps in users' speech. The audio sounds like one continuous sentence, and it makes it impossible for people (and STT algorithms) to divide it into comprehensible sentences.

Code sample

voiceConnector.ts:

// `log` and the imports were not shown in the original snippet;
// log4js is assumed here, matching streamRecorder.ts below.
import { getVoiceConnection } from '@discordjs/voice';
import type { Client, VoiceBasedChannel } from 'discord.js';
import * as log4js from 'log4js';

const log = log4js.getLogger();

export async function record(client: Client, channel: VoiceBasedChannel) {
    const connection = getVoiceConnection(channel.guild.id);

    if (connection) {
        const receiver = connection.receiver;

        receiver.speaking.on('start', (userId) => {
            log.debug(`User ${userId} started speaking`);
            createListeningStream(receiver, channel, userId, client.users.cache.get(userId));
        });

        log.debug('Listening!');
    } else {
        throw new Error(`Failed to start listening to the ${channel.guild.name}[${channel.name}] channel!`);
    }
}

streamRecorder.ts:

import { createWriteStream, unlink, copyFileSync } from 'node:fs';
import { pipeline } from 'node:stream';
import { join } from 'node:path';
import { EndBehaviorType, VoiceReceiver } from '@discordjs/voice';
import type { User, VoiceBasedChannel } from 'discord.js';
import * as prism from 'prism-media';
import * as log4js from "log4js";

const log = log4js.getLogger();

// guildIdToDeviceId, CONSTS and processRecording are defined elsewhere in the project.
const activeRecordings: { [id: string]: string } = {};

function getDisplayName(userId: string, user?: User) {
    return user ? `${user.username}` : userId;
}

export function createListeningStream(receiver: VoiceReceiver, channel: VoiceBasedChannel, userId: string, user?: User) {
    const id = `${channel.guild.id}-${channel.id}-${userId}`;

    if (activeRecordings[id]) {
        log.debug(`Already recording for user ${userId}`);

        return;
    } else {
        activeRecordings[id] = userId;
    }

    const opusStream = receiver.subscribe(userId, {
        end: {
            behavior: EndBehaviorType.AfterSilence,
            duration: 5000,
        },
    });

    const oggStream = new prism.opus.OggLogicalBitstream({
        opusHead: new prism.opus.OpusHead({
            channelCount: 2,
            sampleRate: 48000,
        }),
        pageSizeControl: {
            maxPackets: 10,
        },
    });

    const startTime = Date.now();
    const username = getDisplayName(userId, user);
    const deviceId = guildIdToDeviceId[channel.guild.id] as string;
    const oggFileName = `${join(CONSTS.RECORDINGS_IN_PROCESS_DATA_PATH, deviceId, `${startTime}-${username}`)}${CONSTS.OGG_EXTENSION}`;

    const dataWriteStream = createWriteStream(oggFileName);

    log.debug(`Started recording ${oggFileName}`);

    pipeline(opusStream, oggStream, dataWriteStream, async (error) => {
        delete activeRecordings[id];

        if (error) {
            log.error(`Error recording file ${oggFileName}`, error);

            return;
        }

        try {
            // Split into one-minute recordings and encrypt
            await processRecording(channel, userId, startTime, username, oggFileName);
        } catch (error) {
            log.error(`Error processing recording file ${oggFileName}`, error);
        }
    });
}
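For illustration, the symptom can be reproduced with plain arithmetic. The mechanism assumed here (not confirmed discord.js internals): the receiver only emits Opus packets while the user transmits, and the muxer timestamps packets back-to-back at 20 ms per frame, so wall-clock pauses never make it into the file:

```typescript
// Simulation of the symptom. The mechanism is an assumption, not confirmed
// discord.js internals: packets arrive only while the user speaks, and the
// muxer lays them out back-to-back at 20 ms per frame.
const FRAME_MS = 20;

// arrivalsMs: wall-clock arrival time of each received Opus packet.
// Returns the playback duration of the resulting file.
function naiveFileDurationMs(arrivalsMs: number[]): number {
  return arrivalsMs.length * FRAME_MS; // gaps between packets are ignored
}

// Two short bursts of speech separated by a 3-second pause:
const arrivals = [0, 20, 40, 3060, 3080, 3100];
console.log(naiveFileDurationMs(arrivals)); // 120 ms of audio instead of ~3.1 s
```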

Versions

Issue priority

Medium (should be fixed soon)

Which partials do you have configured?

Not applicable

Which gateway intents are you subscribing to?

Guilds, GuildVoiceStates

I have tested this issue on a development release

No response

Qjuh commented 1 year ago

You don't listen for when their speaking stops. Do that to split the files into the chunks you want. Currently your stream simply receives no data while they aren't speaking, and when they continue speaking the next data chunk arrives and is piped right after the previous one.
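The splitting idea can be sketched independently of discord.js: group incoming packets into separate utterances whenever the gap between consecutive packets exceeds a threshold, and write each group to its own file. A minimal sketch with illustrative names (not library API):

```typescript
// Sketch of the splitting idea: group packet arrival times into utterances
// whenever the gap between consecutive packets exceeds maxGapMs. Each group
// would then become its own recording file.
function splitIntoUtterances(arrivalsMs: number[], maxGapMs: number): number[][] {
  const utterances: number[][] = [];
  let current: number[] = [];
  let prev: number | undefined;
  for (const t of arrivalsMs) {
    if (prev !== undefined && t - prev > maxGapMs) {
      utterances.push(current); // gap too large: close the current utterance
      current = [];
    }
    current.push(t);
    prev = t;
  }
  if (current.length > 0) utterances.push(current);
  return utterances;
}

// Two bursts separated by a 3-second pause become two separate files:
console.log(splitIntoUtterances([0, 20, 40, 3060, 3080], 1000).length); // 2
```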

SOLR4189 commented 1 year ago

@Qjuh Could you explain in more detail? Do you suggest subscribing to receiver.speaking.on('stop')? Then what am I supposed to do, append a SILENCE buffer until the user starts speaking again or until the EndBehaviorType.AfterSilence condition is met?
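For reference, the silence-interpolation variant is commonly done with the three-byte Opus silence frame (0xF8 0xFF 0xFE): when a packet arrives after a gap, first write enough silence frames to cover the gap, which keeps the Ogg timeline aligned with wall-clock time. A minimal sketch assuming 20 ms frames; `silenceFramesForGap` is a hypothetical helper, and the wiring shown in comments is an assumption, not @discordjs/voice API:

```typescript
import { Buffer } from 'node:buffer';

// Three-byte Opus silence frame (an assumption based on common usage,
// not taken from @discordjs/voice).
const OPUS_SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);
const FRAME_MS = 20; // Discord sends 20 ms Opus frames

// Hypothetical helper: number of silence frames needed to cover a gap.
function silenceFramesForGap(gapMs: number): number {
  return gapMs > 0 ? Math.round(gapMs / FRAME_MS) : 0;
}

// Wiring idea (sketch only): instead of pipeline(opusStream, oggStream, ...),
// handle 'data' yourself so silence can be injected before each packet:
//
//   let lastPacketAt = Date.now();
//   opusStream.on('data', (packet) => {
//     const gap = Date.now() - lastPacketAt - FRAME_MS;
//     for (let i = 0; i < silenceFramesForGap(gap); i++) {
//       oggStream.write(OPUS_SILENCE);
//     }
//     oggStream.write(packet);
//     lastPacketAt = Date.now();
//   });

console.log(silenceFramesForGap(3000)); // 150 frames ≈ 3 s of silence
```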

Qjuh commented 1 year ago

I suggest joining the Discord server for such inquiries, since what you're describing is not a bug.