relust opened 7 months ago
StreamAssist uses default Assist Pipeline component. It has some settings, but I don't really understand them :) https://github.com/home-assistant/core/blob/54d005a3b8a5beaaf912a37b89ceab78694bd9db/homeassistant/components/assist_pipeline/pipeline.py#L447-L457
Also, detecting that the player has finished playing can be a problem across the different kinds of media players.
The Assist Microphone add-on and the Wyoming satellite on a Raspberry Pi do not have this problem: they wait for the wake response to finish playing and then start listening. So there is something like that in the code, but we have to figure out where. And the satellite on the ESP32 has three levels of end-of-speech detection (Default, Relaxed and Aggressive).
`end-of-speech detection` is a setting for `VoiceCommandSegmenter`. Unfortunately it is not possible to change these params for the Pipeline integration.
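For reference, the three ESP32 levels boil down to how much trailing silence ends a command. A minimal sketch of such presets (field names and values here are hypothetical, not the actual `VoiceCommandSegmenter` attributes):

```python
from dataclasses import dataclass

@dataclass
class SegmenterSettings:
    """Illustrative end-of-speech settings; the field names are
    hypothetical, not the real VoiceCommandSegmenter API."""
    speech_seconds: float   # speech required before a command "starts"
    silence_seconds: float  # trailing silence that ends the command

# Hypothetical presets mirroring the satellite's three levels:
# "aggressive" cuts off after less trailing silence than "relaxed".
PRESETS = {
    "default": SegmenterSettings(speech_seconds=0.3, silence_seconds=0.7),
    "relaxed": SegmenterSettings(speech_seconds=0.3, silence_seconds=1.5),
    "aggressive": SegmenterSettings(speech_seconds=0.3, silence_seconds=0.4),
}

def command_ended(trailing_silence: float, preset: str = "default") -> bool:
    """True once the measured trailing silence exceeds the preset threshold."""
    return trailing_silence >= PRESETS[preset].silence_seconds
```

A "relaxed" preset would tolerate the pauses that the commenters below describe (wake sound still playing, user not speaking yet), at the cost of a slower response.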
I get the idea. I don't know if I'll have time to implement this.
I found a possible solution to this problem:
`stream_assist/core/__init__.py`
Set the wake sound playback to `WAKE_WORD_END` instead of `STT_START` (because when starting a continuous conversation I don't want it to say the same thing as when I call it by name). Then, after the `await asyncio.sleep(0.1)`, put a blocking pause with `time.sleep()` to stop all the code until the wake sound finishes playing:
```python
# 2. Setup Pipeline Run
# ...
if event.type == PipelineEventType.WAKE_WORD_END:
    if player_entity_id and (media_id := data.get("stt_start_media")):
        # Schedule the asynchronous function in the background
        asyncio.create_task(async_play_media_and_pause(hass, player_entity_id, media_id))

# ... at the bottom of the script
async def async_play_media_and_pause(hass, player_entity_id, media_id):
    play_media(hass, player_entity_id, media_id, "audio")
    await asyncio.sleep(0.1)  # asynchronous pause of 100 ms
    time.sleep(5)  # blocking pause, adjusted to how long the wake sentence is
```
Blocking the loop is a very bad idea. You are blocking the whole Hass.
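To illustrate why (a standalone sketch, not StreamAssist code): any `time.sleep()` inside a coroutine stalls every other task sharing the event loop, while `await asyncio.sleep()` lets them keep running. The heartbeat task below stands in for the rest of Home Assistant:

```python
import asyncio
import time

async def heartbeat(ticks: list) -> None:
    # Stands in for the rest of Home Assistant running on the same loop.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def blocking_pause() -> None:
    time.sleep(0.3)           # BAD: freezes the whole event loop

async def cooperative_pause() -> None:
    await asyncio.sleep(0.3)  # OK: other tasks keep running

async def longest_gap(pause) -> float:
    """Run the heartbeat alongside a pause and measure its worst stall."""
    ticks: list = []
    await asyncio.gather(heartbeat(ticks), pause())
    return max(b - a for a, b in zip(ticks, ticks[1:]))

async def main() -> tuple:
    return await longest_gap(blocking_pause), await longest_gap(cooperative_pause)
```

Running `asyncio.run(main())` shows the heartbeat stalling for roughly the full 0.3 s in the blocking case, but only around 0.05 s in the cooperative one.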
I know what can be done: I can stop forwarding the audio stream from the source to the pipeline for some time.
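One way that idea could look (my assumption of a possible shape, not actual StreamAssist code): a small gate in the forwarding path that drops chunks until a deadline, armed for roughly the duration of the wake sound:

```python
import time

class AudioGate:
    """Hypothetical gate: drop audio chunks while muted, so the wake
    sound is not forwarded to the pipeline's STT/VAD stage."""

    def __init__(self) -> None:
        self._mute_until = 0.0

    def mute_for(self, seconds: float) -> None:
        """Stop forwarding; forwarding resumes automatically after `seconds`."""
        self._mute_until = time.monotonic() + seconds

    def forward(self, chunk: bytes):
        """Return the chunk for the pipeline, or None while muted."""
        if time.monotonic() < self._mute_until:
            return None  # drop: wake sound is still playing
        return chunk

# e.g. on WAKE_WORD_END: gate.mute_for(wake_sound_duration)
```

Because the gate only drops chunks, nothing blocks the event loop, and the VAD never sees the wake sound.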
I didn't think it would block the whole Hass. Anyway, it doesn't really work: for some reason it starts recording as soon as the wake word is detected, and the blocking pause then delays the VAD and it doesn't recognize the commands. Stopping the audio stream forwarding would be a much better solution.
@AlexxIT, please can you find a solution to this problem? I want to add vocal responses instead of beeps in this integration, and if I don't solve the mute/delayed-listening problem I can't use such responses, because it records them and no longer recognizes commands.
I don't have time for this in the near future.
I added a Browser Mod popup with a gif, and I need the player status to close the popup when the response finishes playing, but I'm not getting the `player_entity_id` from the args. @AlexxIT, can you tell me how I could do it?
```python
elif event.type == PipelineEventType.TTS_END:
    if player_entity_id:
        tts = event.data["tts_output"]
        play_media(hass, player_entity_id, tts["url"], tts["mime_type"])
    if player_entity_id and (media_id := data.get("speech_gif")):
        show_popup(hass, player_entity_id, media_id, "picture", browser_id)
    if player_entity_id:
        asyncio.create_task(async_delay_close_popup(hass, player_entity_id, browser_id))

######################################################
async def async_delay_close_popup(hass, player_entity_id, browser_id):
    await asyncio.sleep(1)
    while True:
        player_state = hass.states.get(player_entity_id).state
        if player_state == "idle":
            break
        await asyncio.sleep(0.1)
    close_popup(hass, player_entity_id, browser_id)

##################################################
def close_popup(hass: HomeAssistant, player_entity_id: str, browser_id: str):
    service_data = {
        "entity_id": player_entity_id,
        "browser_id": browser_id,
    }
    coro = hass.services.async_call("browser_mod", "close_popup", service_data)
    hass.async_create_background_task(coro, "stream_assist_close_popup")
```
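One likely failure mode in the `async_delay_close_popup` loop above: if `player_entity_id` is wrong or empty, `hass.states.get(...)` returns `None` and the `.state` access raises `AttributeError`, killing the task before `close_popup` ever runs. A defensive sketch of the same polling loop (generic, with a hypothetical `get_state` callable standing in for `hass.states.get`):

```python
import asyncio
from typing import Callable, Optional

async def wait_until_idle(
    get_state: Callable[[], Optional[str]],
    timeout: float = 30.0,
    poll: float = 0.1,
) -> bool:
    """Poll a state getter until it reports "idle".

    Returns False if the entity is unknown (getter returns None) or the
    timeout expires, instead of raising AttributeError the way
    hass.states.get(bad_id).state would.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        state = get_state()
        if state is None:
            return False        # unknown or misspelled entity_id
        if state == "idle":
            return True
        await asyncio.sleep(poll)
    return False

async def demo() -> tuple:
    # Player reports "playing" twice, then "idle".
    states = iter(["playing", "playing", "idle"])
    ok = await wait_until_idle(lambda: next(states), poll=0.01)
    # A getter for an unknown entity returns None immediately.
    missing = await wait_until_idle(lambda: None)
    return ok, missing
```

This also matches the symptom described below: a hard-coded entity ID works, while a bad value arriving through the args makes the unguarded loop die before closing the popup.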
If I use the entity ID of the player directly, it works, but not when I want to take it from the args:

```python
player_state = hass.states.get("media_player.ha_display2_browser").state
```
I'm not sure what args you're talking about. I have never used Browser Mod. I don't understand your code.
I just need the entity ID of the player that is selected in the GUI (the one the responses play on), so I can close the popup when the response is done playing.
I need to replace the hard-coded player name that works, `player_state = hass.states.get("media_player.ha_display2_browser").state`, with the player set in the graphical interface, so that the player selector works: `player_state = hass.states.get(player_entity_id).state`. I don't know why it doesn't pass the player name, or maybe it passes it in a format that doesn't work in this template. `player_entity_id` comes from the function arguments `(hass, player_entity_id, media_id, "picture")`.
I don't understand from what place you're trying to get the `player_entity_id` var.
Did this ever get taken care of? I noticed the VAD is way too aggressive as well: depending on how long the MP3 you play at STT start is, the VAD is already over and the conversation agent cancels the request.
Hello. Great job. I was waiting for wake word support in Stream Assist and I'm glad you managed to do it. My problem is that for "STT start media" I want to use personalized random answers like "yes, I'm listening", "how can I assist you", etc. Because the VAD is too aggressive, it also records part of the answer ("yes, I'm listening"), so it gives an error response saying it did not understand the request. I tried an automation that turns the microphone switch off for a second when it detects the wake word and then turns it on again, but after that it doesn't start listening again. Can you make it possible to set a delay between wake word detection and STT listening?