AlexxIT / StreamAssist

Home Assistant custom component that allows you to turn almost any camera and almost any speaker into a local voice assistant
MIT License

Add delay until STT start media finishes playing #11

Open relust opened 7 months ago

relust commented 7 months ago

Hello. Great job. I was waiting for wake word support in Stream Assist and I'm glad you managed to add it. My problem is that for "STT start media" I want to use personalized random answers such as "Yes, I'm listening" or "How can I assist you?". Because VAD is too aggressive, it also records part of the answer ("Yes, I'm listening"), so it gives an error response saying that it did not understand the request. I tried an automation that turns off the microphone switch for a second when the wake word is detected and then turns it back on, but it doesn't start listening again. Can you make it possible to set a delay between wake word detection and STT listening?

AlexxIT commented 7 months ago

StreamAssist uses the default Assist Pipeline component. It has some settings, but I don't really understand them :) https://github.com/home-assistant/core/blob/54d005a3b8a5beaaf912a37b89ceab78694bd9db/homeassistant/components/assist_pipeline/pipeline.py#L447-L457

Also, detecting that the player has finished playing can be a problem, because it has to work for all kinds of media players.

relust commented 7 months ago

The Assist Microphone add-on and the Wyoming satellite on Raspberry Pi do not have this problem. They wait for the wake response to finish playing and then start listening. So there is something like that in the code, but we have to figure out where. Also, the satellite on the ESP32 has three levels of end-of-speech detection (Default, Relaxed and Aggressive).

AlexxIT commented 7 months ago

End-of-speech detection is a setting of VoiceCommandSegmenter. Unfortunately, it is not possible to change these params for the Pipeline integration.

https://github.com/home-assistant/core/blob/2f026ca9631d13bf3e04349dfc27909105977e9f/homeassistant/components/assist_pipeline/vad.py#L118-L119

https://github.com/home-assistant/core/blob/2f026ca9631d13bf3e04349dfc27909105977e9f/homeassistant/components/assist_pipeline/vad.py#L14-L31

AlexxIT commented 7 months ago

I get the idea. I don't know if I'll have time to implement this.

relust commented 7 months ago

I found a possible solution to this problem:

AlexxIT commented 7 months ago

Blocking the loop is a very bad idea. You are blocking the whole of Hass.

I know what can be done. I can stop forwarding the audio stream from the source to the pipeline for some time.
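A minimal sketch of that idea, assuming the audio reaches the pipeline as an async iterator of byte chunks. AudioGate, mute_for and forward are hypothetical names made up for this example; none of this is existing StreamAssist code.

```python
import asyncio
import time
from collections.abc import AsyncIterator, Callable


class AudioGate:
    """Drops audio chunks while muted (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the gate can be tested without waiting
        self._muted_until = 0.0

    def mute_for(self, seconds: float) -> None:
        """Stop forwarding audio for the given number of seconds."""
        self._muted_until = self._clock() + seconds

    @property
    def muted(self) -> bool:
        return self._clock() < self._muted_until

    async def forward(self, source: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
        """Yield chunks from source, silently discarding them while muted."""
        async for chunk in source:
            if not self.muted:
                yield chunk
```

Calling mute_for when the wake word is detected would make the pipeline see a gap instead of the start media, and nothing sleeps on the event loop while the gate is closed.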

relust commented 7 months ago

I didn't realize that it blocks the whole of Hass. Anyway, it doesn't really work: for some reason it starts recording as soon as the wake word is detected, then blocks and delays the VAD, and doesn't recognize the commands. Stopping audio stream forwarding would be a much better solution.
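The point about blocking the loop can be reproduced outside Home Assistant with plain asyncio. In this sketch the heartbeat task stands in for everything else running on the event loop; all function names are made up for the example.

```python
import asyncio
import contextlib
import time
from collections.abc import Awaitable, Callable


async def heartbeat(ticks: list[float]) -> None:
    """Stands in for every other task sharing the event loop."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def blocking_delay() -> None:
    time.sleep(0.2)  # holds the event loop; no other task can run meanwhile


async def cooperative_delay() -> None:
    await asyncio.sleep(0.2)  # hands control back to the loop while waiting


async def ticks_during(delay: Callable[[], Awaitable[None]]) -> int:
    """Count how often the heartbeat manages to run while delay() is in progress."""
    ticks: list[float] = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    await delay()
    after = len(ticks)
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return after - before
```

The heartbeat does not tick at all during blocking_delay, while it keeps ticking during cooperative_delay.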

relust commented 7 months ago

@AlexxIT, could you please find a solution to this problem? I want to add visual responses instead of beeps to this integration, and unless I solve the problem by muting or delaying listening, I can't use such responses, because it records them and no longer recognizes commands.

AlexxIT commented 7 months ago

I don't have time for this in the near future.

relust commented 6 months ago

I added a browser mod popup with a GIF and I need the player status to close the popup when the response finishes playing, but I'm not getting the "player_entity_id" from the args. @AlexxIT, can you tell me how I could do it?

        elif event.type == PipelineEventType.TTS_END:
            if player_entity_id:
                tts = event.data["tts_output"]
                play_media(hass, player_entity_id, tts["url"], tts["mime_type"])
                if media_id := data.get("speech_gif"):
                    show_popup(hass, player_entity_id, media_id, "picture", browser_id)
                # Close the popup once the player has gone back to idle.
                asyncio.create_task(async_delay_close_popup(hass, player_entity_id, browser_id))

######################################################

async def async_delay_close_popup(hass, player_entity_id, browser_id):
    # Give the player a moment to leave "idle" before polling its state.
    await asyncio.sleep(1)

    while True:
        state = hass.states.get(player_entity_id)
        # hass.states.get() returns None for an unknown entity_id.
        if state is None or state.state == "idle":
            break

        await asyncio.sleep(0.1)

    close_popup(hass, player_entity_id, browser_id)

##################################################

def close_popup(hass: HomeAssistant, player_entity_id: str, browser_id: str):
    service_data = {
        "entity_id": player_entity_id,
        "browser_id": browser_id,
    }

    coro = hass.services.async_call("browser_mod", "close_popup", service_data)
    hass.async_create_background_task(coro, "stream_assist_close_popup")

If I use the name of the player directly, it works, but not when I want to take it from the args: player_state = hass.states.get("media_player.ha_display2_browser").state
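As a side note on the polling loop in async_delay_close_popup above: it never ends if the player never reports "idle". A minimal sketch of bounding the wait with asyncio.wait_for follows; wait_until is a hypothetical helper, not part of Home Assistant or StreamAssist.

```python
import asyncio
from collections.abc import Callable


async def wait_until(
    predicate: Callable[[], bool], timeout: float, interval: float = 0.1
) -> bool:
    """Poll predicate until it is true; return False if timeout elapses first."""

    async def poll() -> None:
        while not predicate():
            await asyncio.sleep(interval)

    try:
        await asyncio.wait_for(poll(), timeout)
    except asyncio.TimeoutError:
        return False
    return True
```

In the code above, the predicate would presumably be a small function that checks whether the player's state is "idle", with the popup closed whether the wait succeeds or times out.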

AlexxIT commented 6 months ago

I'm not sure what args you are talking about. I have never used browser mod. I don't understand your code.

relust commented 6 months ago

I just need to get the name of the player that is selected in the GUI, the one the responses are played on, so that the popup closes when the response is done playing. Right now it works with the player name hard-coded: player_state = hass.states.get("media_player.ha_display2_browser").state. I need to replace that with the player set in the graphical interface, so that the player selector works: player_state = hass.states.get(player_entity_id).state. I don't know why it doesn't pick up the name of the player, or maybe it doesn't arrive in a format that works in this template. player_entity_id comes from the function arguments (hass, player_entity_id, media_id, "picture").

AlexxIT commented 6 months ago

I don't understand where you are trying to get the player_entity_id variable from.

janstadt commented 3 weeks ago

Did this ever get taken care of? I noticed the VAD is way too aggressive as well. Depending on how quickly the mp3 you play as start media finishes, VAD is already over and the conversation agent cancels the request.