discord-net / Discord.Net

An unofficial .Net wrapper for the Discord API (https://discord.com/)
https://discordnet.dev
MIT License

Audio delay when sending audio with FFmpeg the second time onward #2364

Open neomoth opened 2 years ago

neomoth commented 2 years ago

Hello, I'm currently making a Discord text-to-speech bot that uses the Streamlabs API to offer different voices. So far, the bot fetches the speak URL from Streamlabs and feeds it into FFmpeg, and the audio plays through the bot as expected. However, there are two major issues.

The first is that very short messages of one or two words are not transmitted at all: no green circle appears, and nothing else indicates that the bot is speaking.

The second, which is much worse, is that from the second TTS message onward within the same voice channel connection, the TTS audio is delayed, seemingly by about the length of the first message. The green circle around the bot's profile picture appears while the bot is silent, and then only the latter part of the audio plays, provided the new message is longer than the one played before it. This continues until the bot reconnects to the voice channel.

Here is the relevant audio-related code that connects to a voice channel, leaves one, and plays TTS audio:

/* Connect/Disconnect commands */
private readonly AudioService _audioService;
public AudioModule(AudioService audioService) => _audioService = audioService;

[Command("connect", RunMode = RunMode.Async)]
[Summary("Connect bot to voice channel")]
[Alias("c")]
public async Task ConnectAsync([Remainder] [Summary("<voice_channel>")] IVoiceChannel? channel = null)
{
    channel = channel ?? (Context.User as IGuildUser)?.VoiceChannel;
    if (channel is null)
    {
        await ReplyAsync("You must be in a voice channel, or a channel must be passed as an argument");
        return;
    }

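    // NOTE: audioClient is never assigned (JoinAudio does not return the client),
    // so the connection-state replies further down never run.
    IAudioClient? audioClient = null;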
    try
    {
        await _audioService.JoinAudio(Context.Guild, channel, Context);
    }
    catch
    {
        await ReplyAsync("Unable to connect to channel.");
    }

    if (audioClient is not null)
    {
        if (audioClient.ConnectionState == ConnectionState.Connecting)
            await ReplyAsync("Attempting to connect...");
        if (audioClient.ConnectionState == ConnectionState.Connected)
        {
            await ReplyAsync("Connected to channel.");
        }
    }
}

[Command("disconnect")]
[Summary("Disconnects bot from whatever voice channel it's in.")]
[Alias("dc")]
public async Task DisconnectAsync()
{
    await _audioService.LeaveAudio(Context.Guild, Context);
}

/* Retrieve Streamlabs Speak URL */
public static async Task<HttpContent?> ApiRequest(SocketCommandContext context, string text)
{
    var client = new HttpClient();
    var values = new Dictionary<string, string>
    {
        { "voice", GetUserVoice(context) ?? throw new InvalidOperationException() },
        { "text", text }
    };
    var content = new FormUrlEncodedContent(values);
    try
    {
        var response = await client.PostAsync("https://streamlabs.com/polly/speak", content);
        response.EnsureSuccessStatusCode();
        return response.Content;
    }
    catch (HttpRequestException e)
    {
        await context.Channel.SendMessageAsync(e.ToString());
        return null;
    }
}

/* Connect/Disconnect methods in AudioService */
public async Task JoinAudio(IGuild guild, IVoiceChannel target, SocketCommandContext context)
{
    if (_connectedChannels.TryGetValue(guild.Id, out var client))
    {
        return;
    }

    if (target.Guild.Id != guild.Id)
    {
        return;
    }

    var audioClient = await target.ConnectAsync();
    await context.Channel.SendMessageAsync($"Joined {target}");
    if (_connectedChannels.TryAdd(guild.Id, audioClient))
    {
        Console.WriteLine($"Connected to channel in {guild.Name}");
    }
}

public async Task LeaveAudio(IGuild guild, SocketCommandContext context)
{
    if (_connectedChannels.TryRemove(guild.Id, out var client))
    {
        if (client != null) await client.StopAsync();
        await context.Channel.SendMessageAsync("Disconnected from channel.");
        Console.WriteLine($"Disconnected from channel in {guild.Name}");
    }
    else
    {
        await context.Channel.SendMessageAsync("I'm not in a voice channel!");
    }
}

/* FFmpeg and audio transmission code */
public async Task SendAudioAsync(IGuild guild, IMessageChannel channel, string path)
{
    if (!path.Contains("https://polly.streamlabs.com"))
    {
        await channel.SendMessageAsync("Failed to retrieve tts link.");
        return;
    }

    if (_connectedChannels.TryGetValue(guild.Id, out var client))
    {
        using var ffmpeg = CreateProcess(path);
        if (client != null)
        {
            await using var stream = client.CreatePCMStream(AudioApplication.Music);
            try
            {
                await ffmpeg.StandardOutput.BaseStream.CopyToAsync(stream);
            }
            finally
            {
                await stream.FlushAsync();
            }
        }
    }
}

private Process CreateProcess(string path)
{
    try
    {
        return Process.Start(new ProcessStartInfo
        {
            FileName = "ffmpeg",
            Arguments = $"-hide_banner -loglevel panic -i \"{path}\" -ac 2 -f s16le -ar 48000 pipe:1",
            UseShellExecute = false,
            RedirectStandardOutput = true
        }) ?? throw new InvalidOperationException();
    }
    catch (Exception e)
    {
        // Log the failure, then rethrow so the original exception and stack trace are preserved.
        Console.WriteLine(e);
        throw;
    }
}
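
For reference when debugging the delay, the FFmpeg arguments above (`-ac 2 -f s16le -ar 48000`) produce stereo, signed 16-bit, 48 kHz raw PCM, which works out to 192,000 bytes per second of audio. The sketch below is only an illustration of that arithmetic: `PcmDuration` is a hypothetical helper, not part of Discord.Net or the bot, and it assumes you count the bytes copied from FFmpeg's standard output yourself.

```csharp
using System;

// Constants implied by the FFmpeg arguments "-ac 2 -f s16le -ar 48000".
const int SampleRate = 48000;   // samples per second, per channel
const int Channels = 2;         // stereo
const int BytesPerSample = 2;   // s16le = signed 16-bit little-endian
const int BytesPerSecond = SampleRate * Channels * BytesPerSample; // 192000

// Hypothetical helper: playback time represented by a count of raw PCM bytes.
TimeSpan PcmDuration(long byteCount) =>
    TimeSpan.FromSeconds((double)byteCount / BytesPerSecond);

// Example: 96,000 bytes of PCM is half a second of audio.
Console.WriteLine(PcmDuration(96000).TotalMilliseconds); // prints 500
```

Comparing the byte count of each message against the delay heard on the next one might help confirm whether the delay really matches the length of the previous message.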

I'd love to get help resolving this issue, as it's making the bot's TTS function basically unusable after the first use in a voice channel.

csmir commented 2 years ago

Possibly related to #2256