Chainlit / chainlit

Build Conversational AI in minutes ⚡️
https://docs.chainlit.io
Apache License 2.0

Play multiple `Audio` elements in sequence #1045

Open Simon-Stone opened 3 months ago

Simon-Stone commented 3 months ago

Is your feature request related to a problem? Please describe.
I am working with a voice chat model that responds to a single user message with multiple audio messages. I want to use auto_play so the interaction is more natural and does not require the user to click play. However, when a new message comes in, its audio starts playing even if a previous message is still playing.

Describe the solution you'd like
I would like a way to specify that an Audio element should play automatically but wait for other elements to finish playing first.
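
For context, the current pattern looks roughly like this (a sketch; message text and file paths are illustrative):

import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message):
    # The voice model returns several clips for a single user turn (illustrative paths)
    for path in ["reply_part1.wav", "reply_part2.wav"]:
        audio = cl.Audio(name=path, path=path, display="inline", auto_play=True)
        # Each element starts playing as soon as it renders, so the clips overlap
        await cl.Message(content="", elements=[audio]).send()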

puppetm4st3r commented 2 months ago

Workaround: add the following to the custom.js file in the public directory (make sure custom.js is configured in config.toml):

document.addEventListener('DOMContentLoaded', () => {
    const audioQueue = [];
    let isPlaying = false;

    // Function to play the next audio in the queue with a delay
    const playNextAudio = () => {
        if (audioQueue.length > 0) {
            const currentAudio = audioQueue.shift();
            isPlaying = true;

            setTimeout(() => {
                currentAudio.play();

                // Add an event listener to play the next audio when the current one ends
                currentAudio.addEventListener('ended', () => {
                    isPlaying = false;
                    playNextAudio();
                });
            }, 1000);  // Delay of 1000 ms between audio plays
        }
    };

    // Function to add a new audio element to the queue
    const addAudioToQueue = (audioElement) => {
        audioQueue.push(audioElement);
        if (!isPlaying) {
            playNextAudio();
        }
    };

    // Function to check if a node contains the specific <p> element
    const hasInputAudioParagraph = (node) => {
        const paragraph = node.querySelector('p.MuiTypography-root.MuiTypography-body1.css-1ihfmjp');
        if (paragraph) {
            const paragraphText = paragraph.textContent ? paragraph.textContent.trim() : "";
            return paragraphText.includes('input_audio');
        }
        return false;
    };

    // Function to check and add audio elements from a node
    const checkAndAddAudio = (node) => {
        if (node.nodeType === Node.ELEMENT_NODE) {
            if (node.classList.contains('inline-audio')) {
                const audioElement = node.querySelector('audio');
                const exclude = hasInputAudioParagraph(node);
                if (audioElement && !exclude) {
                    addAudioToQueue(audioElement);
                }
            } else {
                // Check child nodes if the current node is not an inline-audio container
                node.querySelectorAll('.inline-audio').forEach(inlineAudioNode => {
                    const audioElement = inlineAudioNode.querySelector('audio');
                    const exclude = hasInputAudioParagraph(inlineAudioNode);
                    if (audioElement && !exclude) {
                        addAudioToQueue(audioElement);
                    }
                });
            }
        }
    };

    // Observe for new audio containers added to the document
    const observer = new MutationObserver((mutationsList) => {
        for (const mutation of mutationsList) {
            if (mutation.type === 'childList') {
                for (const node of mutation.addedNodes) {
                    checkAndAddAudio(node);
                }
            }
        }
    });

    // Start observing the document for added child nodes
    observer.observe(document.body, { childList: true, subtree: true });
});
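
For reference, registering the script in config.toml might look like this (a sketch, assuming a Chainlit version that exposes a custom_js option under the [UI] section):

[UI]
# Serve the script from the public directory (path is illustrative)
custom_js = "/public/custom.js"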

On the backend side, send the audio elements with auto_play=False, and the user input audio has to be named like this (or whatever you prefer, but then you have to change it in the JS too):

from io import BytesIO

import chainlit as cl

@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.AudioChunk):
    if chunk.isStart:
        buffer = BytesIO()
        # The buffer needs a file name so Whisper can recognize the file type;
        # "input_audio" is also the marker the custom.js above looks for
        buffer.name = f"input_audio.{chunk.mimeType.split('/')[1]}"
        # Initialize the session for a new audio stream
        cl.user_session.set("audio_buffer", buffer)
        cl.user_session.set("audio_mime_type", chunk.mimeType)

    # For now, write the chunks to a buffer and transcribe the whole audio at the end
    cl.user_session.get("audio_buffer").write(chunk.data)  # type: ignore (BytesIO)

input_audio is what the JS code looks for to distinguish assistant audio from user audio.
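
For completeness, the assistant-side audio that the workaround queues up could be sent roughly like this (a sketch; file paths and element names are illustrative, with auto_play left off so the custom.js above controls playback order):

import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message):
    # Reply clips produced by the voice model (illustrative paths)
    for path in ["reply_part1.wav", "reply_part2.wav"]:
        audio = cl.Audio(name=path, path=path, display="inline", auto_play=False)
        # The MutationObserver in custom.js picks up each rendered element,
        # queues it, and plays the clips one after another
        await cl.Message(content="", elements=[audio]).send()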