Joooohan / audio-recorder-streamlit

MIT License

There appears to be no way to clear the audio from audio_recorder, even when clearing the caches #7

Closed Adrian-1234 closed 2 months ago

Adrian-1234 commented 1 year ago

Here is my function:

def record_audio():
    # Record audio
    audio_data = audio_recorder(pause_threshold=3.0, sample_rate=48_000, icon_size="2x")
    if audio_data:
        st.audio(audio_data, format="audio/wav")
    return audio_data

Each time the code is run, it returns the previous recording. Is there any way to clear the recording once it has been read?

Reproducible Code Example:

import streamlit as st
from audio_recorder_streamlit import audio_recorder

def record_audio():
    # Record audio
    audio_data = audio_recorder(pause_threshold=3.0, sample_rate=48_000, icon_size="2x")
    if audio_data:
        st.audio(audio_data, format="audio/wav")
    return audio_data

# Record audio
audio_data = record_audio()

The code runs when the Streamlit audio button is pressed. How do I call record_audio() only when a new sound has been recorded, or clear the recording so that None or "" is returned on subsequent reads, or get a flag that tells me whether the record button has been pressed?

Expected Behavior: I feel that a call to audio_recorder() should clear the sound, or there should be a way of detecting when the Streamlit audio record button has been pressed.

Perhaps there is, but I am missing something?

Current Behavior: The previous recording is always returned no matter how often this code is called.

Joooohan commented 1 year ago

Hello @Adrian-1234, let me check.

Joooohan commented 1 year ago

So I don't know if I understand your issue. I don't think I am able to reproduce it.

If I run the following code:

import streamlit as st
from audio_recorder_streamlit import audio_recorder

def record():
    audio_bytes = audio_recorder()
    if audio_bytes:
        st.audio(audio_bytes)
    return audio_bytes

audio_data = record()

I have a recorder. Once the recording stops, the audio player is updated with the recorded utterance. If I click again on the recorder, the audio data is updated as well as the audio player.

You would like a way to clear the audio altogether to have a null audio data ?

Adrian-1234 commented 1 year ago

Hi Johan,

You would like a way to clear the audio altogether to have a null audio data ?

Yes, this would be a workaround.

Regards, Adrian.


B4PT0R commented 1 year ago

Hi,

I'm using your nice component to implement speech to text in my app, and I'm encountering the same issue.

The issue is that every time the app reruns, the last recorded audio is returned by the component (which is actually normal Streamlit behavior).

So if one implements the following kind of logic:

audio_bytes = audio_recorder()
if audio_bytes:
    pass  # do something

the "do something" part will be executed on the same audio every time the app reruns after the first recording has taken place, not just once.

This is a common situation when using streamlit, as components are designed to remember their last state across reruns.

Generally, the workaround is to set a unique key for the component and renew the key once the output data has been processed, so that the next rerun creates a new recorder component with its output initialized back to None:

if 'recorder_key' not in st.session_state:
    st.session_state.recorder_key = generate_key()  # call your custom unique key generator

audio_bytes = audio_recorder(key=st.session_state.recorder_key)
if audio_bytes:
    # process the audio
    st.session_state.recorder_key = generate_key()  # generate another key to renew the component

This approach generally works fine with most Streamlit components. With your component, however, it doesn't work as expected for some reason (at least in my case).
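The generate_key() helper in the snippet above is left to the reader. A minimal sketch of one way to write it, assuming any unique string is acceptable as a widget key: the names generate_key and renew_key are illustrative, and a plain dict stands in for st.session_state so the sketch runs without Streamlit.

```python
import uuid


def generate_key() -> str:
    """Return a fresh, practically unique string usable as a widget key."""
    return f"recorder_{uuid.uuid4().hex}"


def renew_key(state: dict, name: str = "recorder_key") -> str:
    """Store a new key under `name` and return it.

    `state` is any mutable mapping; in an app it would be st.session_state
    (assumption: only item assignment is used here).
    """
    state[name] = generate_key()
    return state[name]


state = {}  # stand-in for st.session_state
first = renew_key(state)
second = renew_key(state)
print(first != second)  # each call yields a different key
```

As noted above, rotating the key did not reset this particular component in my tests, so this only illustrates the general pattern.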

A possible workaround would be to make your component output a tuple (time_stamp, audio_bytes), with time_stamp generated by the front-end whenever a new recording takes place. The time_stamp could then be used to detect when the same audio is received a second time from the component, so that the processing logic is not triggered twice on the same recording.
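To make the proposal concrete, here is a sketch of how the calling code could use such a tuple. It is hypothetical: the component does not currently return a (time_stamp, audio_bytes) pair, take_new_audio is an illustrative name, and a plain dict stands in for st.session_state.

```python
from typing import Optional, Tuple


def take_new_audio(
    output: Optional[Tuple[float, bytes]], state: dict
) -> Optional[bytes]:
    """Return the audio only the first time a given time stamp is seen.

    `output` is the hypothetical (time_stamp, audio_bytes) pair, or None
    when nothing has been recorded yet.
    """
    if output is None:
        return None
    time_stamp, audio_bytes = output
    if state.get("last_time_stamp") == time_stamp:
        return None  # same recording replayed by a rerun
    state["last_time_stamp"] = time_stamp
    return audio_bytes


state = {}  # stand-in for st.session_state
recording = (1677525780.0, b"fake-wav-bytes")  # made-up sample value

first = take_new_audio(recording, state)   # new recording: audio is returned
second = take_new_audio(recording, state)  # rerun, same time stamp: None
print(first, second)
```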

Joooohan commented 1 year ago

Hello @B4PT0R ,

I think I understand your issue. Indeed, reading the doc I can see that the key argument passed to a widget is supposed to be a reference to the widget's value stored in the session. I don't recall having implemented that.

However, using the widget's key argument is only a shortcut. You could do the same by explicitly saving the audio to a session variable like so:

if 'audio' not in st.session_state:
    st.session_state.audio = None

st.session_state.audio = audio_recorder()

if st.session_state.audio:
    # do something
    st.session_state.audio = None  # reset audio state

B4PT0R commented 1 year ago

Bonjour @Joooohan ,

It's actually trickier than you think. The key is not only a shortcut to access the widget's output data. It looks that way ONLY on the Python side (back-end), but the key is also used as a unique identifier for the React object's instance on the web page (front-end). Streamlit's inner workings are a bit tricky to grasp, but you may think of them as a turn-by-turn ping-pong exchange between the Python back-end and the JS/React front-end.

To clarify things, let's dig into the following script, based on the example you suggested:

import streamlit as st
from audio_recorder_streamlit import audio_recorder

if 'audio' not in st.session_state:
    st.session_state.audio=None

if 'received_audio' not in st.session_state:
    st.session_state.received_audio = []

st.session_state.audio = audio_recorder(key='recorder')

st.button('click me to rerun without recording',key='button')

if st.session_state.audio:
    st.session_state.received_audio.append("Audio received")
    st.write(st.session_state.received_audio)
    st.session_state.audio = None

Run the above snippet, record a sound, then click on the button several times. You will notice that the list of received audio grows each time you click on the button, which means st.session_state.audio is not None, even though we force it back to None at the end of the if statement.

Here is what happens under the hood: You start the script (streamlit run ...).

First loop : (backend)

  1. st.session_state variables are initialized.
  2. running audio_recorder() sends a request to the front-end to instantiate an audio recorder object. None is returned to st.session_state.audio as a default value.
  3. The same happens for the button.
  4. nothing happens in the if statement. The script finishes. Streamlit stops running the python script and starts a listener in a thread to wait for a signal coming back from the front-end.

(front-end)

  1. The request is processed and an audio recorder object is instantiated on the front-end with a 'recorder' key identifier. Same for the button. You may interact with any of them in the browser.
  2. If you don't, the front-end waits until you do.
  3. Assume you record something, the front-end detects a change and returns a signal to the back-end. This signal is roughly a list of pairs (widget_key, output_data) for each widget on the page. So we have ('recorder',some_audio_bytes) for the recorder, and ('button', False) for the button since it hasn't been clicked.
  4. Streamlit's listener on the back-end receives the signal and routes it to an internal key:value dictionary, so that the Python functions (in our case the audio_recorder function and the button function) know what value to return according to their key. All widgets with a state have a key; if you don't provide one, Streamlit generates a default one.

Second loop: (back-end)

  1. state variables are already initialized (to None and [] respectively).
  2. the audio-recorder function returns some_audio_bytes received from the front end, the result is stored in st.session_state.audio. a new request is sent to the front-end to redraw the widget (with the same key!) on next front-end refresh.
  3. the button returns False, and similarly, a new request is being sent to the front-end to redraw the widget (with the same key!).
  4. since the audio is not None, we enter the if statement, the "Audio received" message is appended to the list and the list is printed. st.session_state.audio is forced back to None.
  5. Script finishes and waits for an incoming signal from the frontend.

(front end)

  1. The requests to redraw widgets are processed, but since the widgets have the same key as before, the front-end doesn't re-instantiate new ones, it just refreshes the old ones (what happens during this refresh depends on how the widget front-end are implemented).
  2. You don't record audio this time, but rather click the button, thus setting its state to True.
  3. Since you interacted with a widget, this sends a signal back to the back-end : ('recorder',some_audio_bytes) for the recorder - which is the last audio recorded by the widget on the front end side - and ('button',True) for the button since it was clicked.

Third loop: (back-end)

  1. state variables are None and ["Audio received"] respectively.
  2. the audio-recorder function returns the data received from the front end (some_audio_bytes, again!, not None), the result is stored in st.session_state.audio which is therefore not None.
  3. the button returns True, but we do nothing out of it.
  4. since the audio is not None, we enter the if statement, the "Audio received" message is appended to the list and the list is printed. Thus resulting in ["Audio received", "Audio received"] being printed.

... And so on
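The three loops above can be replayed with a toy model. This is only a simulation of the behavior described, not Streamlit's actual internals: ToyFrontend and run_script are made-up names, and plain dicts stand in for the widget state and st.session_state.

```python
class ToyFrontend:
    """Keeps the last value of each widget, indexed by widget key."""

    def __init__(self):
        self.values = {"recorder": None, "button": False}

    def record(self, audio: bytes) -> dict:
        """User records audio; the signal carries every widget's value."""
        self.values["recorder"] = audio
        self.values["button"] = False
        return dict(self.values)

    def click_button(self) -> dict:
        """User clicks the button; the recorder still holds its last audio."""
        self.values["button"] = True
        return dict(self.values)


def run_script(widget_values: dict, session_state: dict) -> None:
    """One top-to-bottom run of the example script."""
    session_state.setdefault("audio", None)
    session_state.setdefault("received_audio", [])
    # Plays the role of audio_recorder(key='recorder')
    session_state["audio"] = widget_values["recorder"]
    if session_state["audio"]:
        session_state["received_audio"].append("Audio received")
        session_state["audio"] = None  # does not touch the front-end value


frontend = ToyFrontend()
session_state = {}
run_script(dict(frontend.values), session_state)                 # first loop
run_script(frontend.record(b"some_audio_bytes"), session_state)  # second loop
run_script(frontend.click_button(), session_state)               # third loop
print(session_state["received_audio"])
# prints ['Audio received', 'Audio received']
```

Resetting session_state["audio"] only changes the back-end copy, so the next signal from the front-end brings the same bytes back.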

Sorry for the lengthy explanation, but I think your cool widget is worth the effort ! ;)

So the problem is that every time an event forces the app to rerun (clicking a dummy button, for instance), the audio is reprocessed, whether or not we reset the received audio or the corresponding state variable to None after processing it. This is due to the way Streamlit handles back-end / front-end communication. That is why I suggested changing the key after the audio is processed: it would tell the front-end to instantiate a NEW recorder and leave the old one behind, so the recorded audio would be reset to None on the front-end side. For some reason that I don't understand, it doesn't work, though...

Time-stamping the audio bytes at recording time could be an alternative: we could compare the time stamps of the audio received and skip processing when the same audio comes back from the front-end...

Anyways, thanks for reading this far!

Salut !

dionoid commented 7 months ago

I'm running into the same issue. As a workaround, I check whether the hash of the returned audio bytes has changed. This is the example code:

import streamlit as st
from audio_recorder_streamlit import audio_recorder

if 'audio_bytes_hash' not in st.session_state:
    st.session_state.audio_bytes_hash=None

audio_bytes = audio_recorder()
if audio_bytes and hash(audio_bytes) != st.session_state.audio_bytes_hash:
    print("Audio received")
    st.session_state.audio_bytes_hash = hash(audio_bytes)

st.button('click me to test rerun without recording')
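The same check can be factored into a small helper. This is a sketch under a few assumptions: it swaps the built-in hash() for a SHA-256 digest from hashlib (a variation, not what the code above uses), is_new_audio is an illustrative name, and a plain dict stands in for st.session_state.

```python
import hashlib
from typing import Optional


def is_new_audio(audio_bytes: Optional[bytes], state: dict) -> bool:
    """Return True only when `audio_bytes` differs from the last audio seen.

    The digest of the last processed recording is kept in `state`
    under the "audio_bytes_hash" entry.
    """
    if not audio_bytes:
        return False
    digest = hashlib.sha256(audio_bytes).hexdigest()
    if digest == state.get("audio_bytes_hash"):
        return False  # same bytes as last time: a rerun, not a new recording
    state["audio_bytes_hash"] = digest
    return True


state = {}  # stand-in for st.session_state
print(is_new_audio(b"first recording", state))   # True: never seen before
print(is_new_audio(b"first recording", state))   # False: replayed on rerun
print(is_new_audio(b"second recording", state))  # True: different bytes
```

One limitation of any content-based check: two recordings with byte-for-byte identical data would be treated as the same one.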