CraigChat / craig

Craig is a multi-track voice recorder for Discord.
https://craig.chat
ISC License

Live streaming support (or, at least, something like it) #179

Open lachjames opened 1 year ago

lachjames commented 1 year ago

Hi :) I'm working on some integrations with Discord for a D&D 5e game I'm running, and I think Craig is really great for what I need (streaming audio from each user separately). I really need the information in real-time, which Craig sort of supports, but the issue is more on the server side than client side due to the implementation I've come up with:

  1. Poll at intervals (e.g. every 30 seconds), pulling the latest audio from Craig
  2. Transcribe the audio using OpenAI Whisper, for each speaker
  3. Collate the audio files (with timestamps) into a transcript, and push it to a Discord webhook
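The collation in step 3 could look roughly like the sketch below. It assumes each transcribed chunk has already been reduced to a speaker name, a start offset in seconds, and some text. The `Segment` shape and the `clock` and `collate` helpers are hypothetical names for illustration; they are not part of Craig or Whisper.

```typescript
// Hypothetical shape for one transcribed chunk; not part of Craig or Whisper.
interface Segment {
  speaker: string;  // Discord user the track belongs to
  startSec: number; // offset from the start of the recording, in seconds
  text: string;     // transcribed text for this chunk
}

// Format a number of seconds as HH:MM:SS for display in the transcript.
function clock(totalSec: number): string {
  const s = Math.floor(totalSec);
  const pad = (n: number): string => (n < 10 ? '0' : '') + String(n);
  return pad(Math.floor(s / 3600)) + ':' + pad(Math.floor((s % 3600) / 60)) + ':' + pad(s % 60);
}

// Merge the per-speaker segment lists into one transcript ordered by start time.
function collate(tracks: Segment[][]): string {
  const all: Segment[] = [];
  for (const track of tracks) {
    for (const seg of track) all.push(seg);
  }
  all.sort((a, b) => a.startSec - b.startSec);
  return all.map((seg) => '[' + clock(seg.startSec) + '] ' + seg.speaker + ': ' + seg.text).join('\n');
}
```

The resulting string is what would be pushed to the webhook; the HTTP call itself is left out because it depends on how the bot is set up.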

The issue, of course, is that I currently have no choice but to pull all the audio at every iteration, including audio I've already processed. I can alleviate this somewhat by making a simple Discord bot that disconnects and reconnects Craig every x minutes (e.g. every 30 minutes), but it's still an unnecessary drain.

I've not looked too deeply into the code, as I figured (correctly, it turns out) it would be simpler to just copy the API calls from the website, so that's my perspective here. With that said, I think a reasonable solution would be: when making the POST request to https://craig.horse/api/recording/{id}/cook?key={key}, support an additional argument: startTime (and endTime, why not?). Then just pull audio which was created within that time period (or before/after that time, if only one of the fields is given). I think this would be a relatively straightforward solution to implement on the server side, since it's just a case of pruning the audio files at some point before sending them (ideally as early as possible, to avoid
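To make the proposed semantics concrete, here is a minimal sketch of the time-window check and of a client-side cursor that would let each poll request only new audio. Everything in it is an assumption: `CookWindow`, `inWindow` and `PollCursor` are hypothetical names, neither `startTime` nor `endTime` exists in Craig's cook API today, and treating the start as inclusive and the end as exclusive is just one possible choice.

```typescript
// Hypothetical request window; neither field exists in Craig's cook API today.
interface CookWindow {
  startTime?: number; // seconds from the start of the recording, inclusive (assumed)
  endTime?: number;   // seconds from the start of the recording, exclusive (assumed)
}

// True if audio created at time `t` falls inside the window.
// A missing bound leaves that side of the window open.
function inWindow(t: number, w: CookWindow): boolean {
  if (w.startTime !== undefined && t < w.startTime) return false;
  if (w.endTime !== undefined && t >= w.endTime) return false;
  return true;
}

// Client-side cursor: each poll asks only for audio after the previous poll's end.
class PollCursor {
  private lastEnd = 0;

  next(nowSec: number): CookWindow {
    const w: CookWindow = { startTime: this.lastEnd, endTime: nowSec };
    this.lastEnd = nowSec;
    return w;
  }
}
```

With a 30-second polling interval, successive calls to `next` would produce the windows 0 to 30, 30 to 60, and so on, so no chunk is fetched twice.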

A more rigorous (but complex) option would be to support streaming more directly - that'd be really cool, but (I imagine) a lot more work too.

One other option I'm looking into is creating a simple version of Craig which only captures the audio, which I can then run on my own computer. This would bypass all the issues with transfer and compute, but if this is something other people might want to be able to do with Craig, maybe it's worth building it as a feature?

lachjames commented 1 year ago

I looked a bit further into this and I think it's a relatively straightforward change to implement, with a couple of edits:

cook.sh [Line 43]

CONTAINER=zip
[ "$1" ] && CONTAINER="$1"
shift

STARTTIME="00:00:00"
[ "$1" ] && STARTTIME="$1"
shift

cook.sh [Line 66-ish]

case "$FORMAT" in
    stream)
        ext=wav
        ENCODE="ffmpeg -f wav -i - -c:a adpcm_ms -f wav -ss $STARTTIME -"
        CONTAINER=zip
        ZIPFLAGS=-9
        ;;

craig/apps/download/api/src/util/cook.ts [Line 134]

export async function cook(id: string, format = 'flac', container = 'zip', dynaudnorm = false, startTime = "00:00:00") {
  const [state, writeState, deleteState] = stateManager(id);

  try {
    await writeState({ message: 'Starting...' });
    const cookingPath = path.join(cookPath, '..', 'cook.sh');
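    // startTime goes right after container, matching the order in which the cook.sh edit above reads its arguments
    const args = [id, format, container, startTime, ...(dynaudnorm ? ['dynaudnorm'] : [])];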
    const child = spawn(cookingPath, args, { detached: true });
    console.log(`Cooking ${id} (${format}.${container}${dynaudnorm ? ' dynaudnorm' : ''}, from ${startTime}) with process ${child.pid}`);
    registerProcess(child, deleteState);

    // Prevent the stream from ending prematurely (for some reason)
    child.stderr.on('data', getStderrReader(state, writeState));

    return child.stdout;
  } catch (e) {
    deleteState();
    throw e;
  }
}
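Since the cook.sh edit interpolates $STARTTIME unquoted into the ENCODE command string, a value coming from an API request would probably need to be checked before it reaches `spawn`. The sketch below shows one way to do that. `isValidStartTime` and `toStartTime` are hypothetical helpers, not existing Craig functions, and the strict HH:MM:SS pattern is an assumption that is narrower than what ffmpeg itself accepts for `-ss`.

```typescript
// Accept only HH:MM:SS with minutes and seconds below 60; reject everything else,
// including values that carry shell metacharacters.
function isValidStartTime(value: string): boolean {
  return /^\d{2}:[0-5]\d:[0-5]\d$/.test(value);
}

// Convert a non-negative offset in seconds to the HH:MM:SS form used above.
function toStartTime(totalSec: number): string {
  if (!isFinite(totalSec) || totalSec < 0) {
    throw new RangeError('startTime must be a non-negative number of seconds');
  }
  const s = Math.floor(totalSec);
  const pad = (n: number): string => (n < 10 ? '0' : '') + String(n);
  return pad(Math.floor(s / 3600)) + ':' + pad(Math.floor((s % 3600) / 60)) + ':' + pad(s % 60);
}
```

A caller could then run `isValidStartTime(startTime)` at the top of `cook` and throw before building `args` if it returns false.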

I'm not on my dev machine today so I don't have Docker at hand, otherwise I'd give it a test run now - I'll give it a shot later and see if it breaks (or, more likely, what breaks...)

Snazzah commented 1 year ago

Don't know about altering how cooking works to do live transcription, but if there's a good way to do that within the bot with whisper that may be a good feature to have