NabuCasa / hass-nabucasa

Issues related to the cloud integration in Nabu Casa
GNU General Public License v3.0

Frequent Nabu Cloud timeouts with both Alexa and Google Home HA Skills #298

Closed hugalafutro closed 2 years ago

hugalafutro commented 2 years ago

As discussed at https://community.home-assistant.io/t/alexa-home-assistant-skill-slow-timing-out/321257 I am not alone in experiencing this, which should rule out some esoteric issue specific to my setup/ISP/etc. In summary: I ask the voice assistant to do something, the voice assistant says it failed, but the requested action happens anyway. My network, although not the fastest, is rock-solid latency-wise (the household partakes in online gaming and we'd notice any disconnects or high latency straight away).

With Alexa, she takes longer than usual, then says '"Device name" is not responding check its internet connectivity' or something like that. Usually, while she is saying this, whatever I asked for triggers and starts happening: 8 times out of 10 it works, 2 times it just fails altogether. As mentioned in the thread, you can often see HA devices go "offline" in the Alexa app and then come back "online" when refreshed.

To me it looked like communication between Amazon and nabu.casa was timing out. I ordered and set up a Google Home Mini to check whether a non-Amazon device would have the same problem, and sadly it has exactly the same issue.

With Google Home, she takes longer than usual and then says 'Sorry, I couldn't reach the Home Assistant Cloud by Nabu Casa.' or 'Sorry, it looks like the Home Assistant Cloud by Nabu Casa is unavailable right now.' About 2 times out of 10 the action happens regardless, but it definitely fails more often than Alexa does.

I tried changing APs, changing from 5GHz to 2.4GHz wifi, and moving closer to the AP; nothing helps. I have 35 devices/scripts/sensors exposed to both Alexa and Google Home, is that too much? This wasn't always happening, but it's hard to pinpoint exactly when it started. It sort of crept up: it happened once a month, then once a week, then once a day, and nowadays it's a few times a day.

hugalafutro commented 2 years ago

> It doesn't sound like that to me. If that were the case then most requests would fail. This sounds more like some kind of initial timeout issue, as only the first request fails and subsequent requests are fine.

It is Nabu infrastructure, read comment https://github.com/NabuCasa/hass-nabucasa/issues/298#issuecomment-1002195808

hugalafutro commented 2 years ago

> Has anyone from Nabu commented on this yet?

Yes, read https://github.com/NabuCasa/hass-nabucasa/issues/298#issuecomment-1002195808

Rekazm commented 2 years ago

I hadn't seen this, so I raised a case with Nabu. Can't wait until it's fixed.

pergolafabio commented 2 years ago

I hope it's fixed soon, it's not that cheap

Rekazm commented 2 years ago

You and I both!


trashhalo commented 2 years ago

There are two issues here.

  1. The technical problem degrading the Home Assistant experience on smart speakers. I'm confident this will be resolved.
  2. The lack of communication from Nabu.

The fact that we are piecing together this story based on people's support tickets is unacceptable for a paid product. Whatever happens here, we need more proactive comms in the future.

cogneato commented 2 years ago

Hey there,

I answer the support tickets at Nabu Casa and saw that my response to someone was already shared here. It is still the same response: the relayer is being rewritten. After this is done, the next step will be to add geo-specific instances of the relayer for users to connect to, which will hopefully improve the situation with the responses.

There is also the billing system revamp taking place, and the year's end to see to, along with the usual Home Assistant project releases, so we are probably looking at anywhere from now through March for everything to finalize and be released.

There isn't much more to provide here, as it is a rewrite and is not deployed and tested yet, so this GitHub issue doesn't technically apply to the rewrite.

pergolafabio commented 2 years ago

March?? Seriously?? Info, we pay a lot of money to use this Google stuff :-(

Rekazm commented 2 years ago

@cogneato Feels like this should be prioritized alongside BAU tasks such as year end. This is your paid service, not the FOSS element of the project. In the interim, can you not just beef the servers up?

Gortosch commented 2 years ago

Come on people, we don't just pay for the smart speaker service. We also want to support the team behind Home Assistant. Or did you pay a penny for Home Assistant and the great work? But I'm already looking forward to March, because of course it's annoying me too.

pergolafabio commented 2 years ago

> Come on people, we don't just pay for the smart speaker service. We also want to support the team behind Home Assistant. Or did you pay a penny for Home Assistant and the great work? But I'm already looking forward to March, because of course it's annoying me too.

I know, but the words "March" & "hopefully" don't sound very promising

cogneato commented 2 years ago

@pergolafabio @Rekazm I'm including all I mentioned in that time frame. Relayer re-write is something that was decided on and worked on over the holidays. It could come sooner.

pergolafabio commented 2 years ago

Ok, let's hope, it's very frustrating... Thanks for the feedback!

mikeneiderhauser commented 2 years ago

Do we know why this started happening all of a sudden? What was the regression? Can anything be done in the interim to patch this so we all don't suffer from these timeouts?

gregsheremeta commented 2 years ago

It's been happening for me since I started using nabu casa over a year ago. You're lucky if it just recently started happening for you!

balloob commented 2 years ago

We have been improving the relayer infrastructure and its speed. The rewrite should give it a significant boost.

I think this issue is becoming a catch-all for various issues, all with the same symptoms (timeout). Note that regressions can happen not just because of us, but also because Alexa/Google mess things up.

For Google

Google can talk locally to your Home Assistant instance if their devices can find Home Assistant via mDNS/zeroconf discovery. This integration is enabled in Home Assistant by default. Most commands from Google will then hit Home Assistant locally and bypass our infrastructure (commands for devices with 2FA, and syncs, still go via the cloud). You can see in your logbook if a command came in locally:

image

We have also included a fix in 2021.12 to make Google faster:

Google requires us to return the new states in the response to them. This means that we need to wait until all work is done. If you're activating a scene or controlling a group, this can take a while before the integration you're controlling is done. We only have ~7 seconds before getting a timeout from Google.

If you have report state enabled, we will no longer wait for the results to come back in but just confirm the command to Google. With report state enabled, states will be reported to Google as soon as entities change. This violates the Google spec but seems to function.

So if you are experiencing Google timeouts, enable report state.
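As a rough illustration of the trade-off described above (a sketch only, not Home Assistant's or Google's actual code): the handler below either waits for the work to finish within a deadline or, with report state on, acknowledges immediately. The names `handle_command` and `do_work` and the status strings are hypothetical, and the deadline is scaled down from the ~7 seconds mentioned above so the demo runs quickly.

```python
import asyncio


async def do_work(seconds: float) -> dict:
    """Stand-in for a slow scene or group call (hypothetical)."""
    await asyncio.sleep(seconds)
    return {"on": True}


async def handle_command(work_seconds: float, report_state: bool, deadline: float):
    """Return (response, task); the task keeps running after a timeout."""
    work = asyncio.create_task(do_work(work_seconds))
    if report_state:
        # Confirm right away; new states would be pushed separately later.
        return {"status": "acknowledged"}, work
    try:
        # shield() keeps the work alive even if the wait itself times out.
        states = await asyncio.wait_for(asyncio.shield(work), timeout=deadline)
        return {"status": "done", "states": states}, work
    except asyncio.TimeoutError:
        return {"status": "timeout"}, work


async def demo() -> dict:
    results = {}
    for label, report_state in (("wait", False), ("report_state", True)):
        # The work (0.2 s) outlasts the scaled-down deadline (0.05 s).
        response, work = await handle_command(0.2, report_state, deadline=0.05)
        final_state = await work  # the action still completes either way
        results[label] = (response["status"], final_state)
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the waiting case the response times out while the work still completes, which matches the symptom reported at the top of this thread: the assistant says it failed, but the action happens anyway.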

For Alexa

Alexa has always only required confirmation that the command was received, not the new states, and it has no local SDK.

We're still looking into why this would happen. First command in a while taking longer means it's refreshing the access token. We use AWS Cognito for auth, so basically it's Amazon getting access tokens from Amazon to talk to our Alexa lambda hosted on Amazon 🤷

We hope that with a faster relayer we can further reduce our overhead for Alexa, making it less likely to hit timeouts.
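The "first command in a while" effect can be sketched with a simple expiring token cache. This is an illustration under assumptions, not how AWS Cognito or the Alexa skill is actually implemented: `TokenCache`, `fetch`, and the one-hour lifetime are all hypothetical.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a token and refetches it only once it has expired."""

    def __init__(self, fetch: Callable[[], str], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # slow call to the auth service (assumed)
        self._ttl_s = ttl_s          # token lifetime in seconds (assumed)
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self.refreshes = 0           # how many times the slow path was taken

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            # Slow path: the first request after expiry pays for the refresh.
            self._token = self._fetch()
            self._expires_at = now + self._ttl_s
            self.refreshes += 1
        return self._token


# Fake clock so the example is deterministic.
now = [0.0]
cache = TokenCache(fetch=lambda: "token", ttl_s=3600, clock=lambda: now[0])
cache.get()       # first command: refresh
cache.get()       # follow-up command: served from cache
now[0] = 4000.0   # a long quiet period passes
cache.get()       # first command in a while: refresh again
print(cache.refreshes)
```

Only the requests that hit the slow path carry the extra latency, which is consistent with the first command after a quiet period being the one that times out.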

pergolafabio commented 2 years ago

For me it started slowly around the end of November; before that it was just sometimes

WhimsySpoon commented 2 years ago

> For me it started slowly around the end of November; before that it was just sometimes

Same. It's now nearly every request via Alexa, even subsequent ones.

@balloob Is there anything we can do to assist the diagnosis? Provide local logs, for example?

pergolafabio commented 2 years ago

Hey @balloob, thanks for the feedback. I had turned off "enable state reporting", in the hope it would help:

image

But how do you enable that local Google? My devices are on the same network, and they are also picked up by the Cast integration for example, but I never see that "(via local)" like in your screenshot.

I do see this:

image

balloob commented 2 years ago

@pergolafabio make sure your devices are on the same network and UDP broadcasts work. Restarting the Google device will rescan it.

pergolafabio commented 2 years ago

Yes, everything is on the same local subnet. I just enabled state reporting again, but I don't see that "local":

image

I see that strange exclamation mark.

Google devices reboot every night anyway. What is that integration you are talking about?

pergolafabio commented 2 years ago

Not sure what I need to do/check to enable that local Google Assistant, it sounds very interesting!! Also, I see that exclamation mark every time, not sure why.

zeroconf: & discovery: & ssdp: are also enabled in my YAML.

hugalafutro commented 2 years ago

I don't really mind the delays as much as the timeouts and then it being a flip of a coin whether it will actually happen or not. I don't mind waiting for the infrastructure upgrade hoping it helps.

I've already tried the echo/gh on separate wifi, same wifi that's bridged to local lan on router, literally the same wifi with passing a wifi adapter to the ha vm instead of eth. All of that with and without state reporting. Each combination for few days. I've used both of the apps throughout that (alexa mainly until I got gh for testing).

And I know I don't have any hard data to show for all of that, but for me the issue is consistent in all cases I tested and has been getting progressively worse over a period of months.

cogneato commented 2 years ago

@pergolafabio The icon will be replaced with another soon.

pergolafabio commented 2 years ago

Ah ok, it's just an icon :-)

Now, how do I enable that local Google Assistant, or why isn't it using local? How can I investigate this? I really need this :-)

pergolafabio commented 2 years ago

Ok, I was looking at GitHub. It seems it's quite new, not merged yet? Also only for the local component? Also for Nabu Casa?

https://github.com/home-assistant/core/pull/63218

zididadaday commented 2 years ago

I just recently installed Home Assistant as an alternative to my Fibaro HC2 to give myself more control over entities going to Google Home. Really happy with it except for one issue: the current state after triggering a state change isn’t immediately shown in the Google Home app. I saw @balloob mentioned how this works above and I can’t get that to add up with my experience. I don’t believe it was like this with my old gateway.

  1. Open the Google Home app and be at the Home Screen
  2. Turn on or off a device
  3. Note the device state (icon colour) doesn’t change, but the action does happen.

Is this how it is for everyone?

report_state is enabled and I’ve tried both Nabu Casa and the [test] Google setup. HA 2021.12.8 and Google Home app on an iOS device. Lights in HA are from a Fibaro HC2. I have a video: on the first press the light turns off; on the second press the only action is that the state updates. Video here.

pergolafabio commented 2 years ago

Hey @balloob, I created a separate issue here. Can you help us enable "local" or troubleshoot why it doesn't use local?

Thanks in advance, appreciated!!

https://github.com/home-assistant/core/issues/63679

PS: is "local" working for anyone here?

B-Kramer commented 2 years ago

> Hey @balloob, I created a separate issue here. Can you help us enable "local" or troubleshoot why it doesn't use local?
>
> Thanks in advance, appreciated!!
>
> home-assistant/core#63679
>
> PS: is "local" working for anyone here?

It works for me, but it looks strange. This is a request to a google home to adjust the brightness and temperature of 3 lights (all zigbee through ZHA). I asked it to "set the office lamps to 100%", then I asked "set the office lamps to daylight". I got the "sorry" error on both. The first request is local, then I get two cloud requests: image

pergolafabio commented 2 years ago

Hmm, why don't I see that "via local"?

I also see this: https://github.com/NabuCasa/home-assistant-google-assistant-local-sdk

Did you set that up too?

Also, if you see "via local", does that mean you don't have issues with the timeouts?

B-Kramer commented 2 years ago

> Hmm, why don't I see that "via local"?
>
> I also see this: https://github.com/NabuCasa/home-assistant-google-assistant-local-sdk
>
> Did you set that up too?
>
> Also, if you see "via local", does that mean you don't have issues with the timeouts?

I get the "sorry, I couldn't reach the home assistant cloud by nabu casa" error. That's why I'm here. It DOES work, but I get the verbal warning from the Google Home. It is only set up through Nabu. I just tried it with a single light, and I get this: image

pergolafabio commented 2 years ago

I don't get it

B-Kramer commented 2 years ago

> I don't get it

Me neither.

pergolafabio commented 2 years ago

Maybe in your case it "tries" to do it locally, but it fails and then tries it via the cloud? Maybe we need this: https://github.com/NabuCasa/home-assistant-google-assistant-local-sdk There must be a reason why that project is on NabuCasa?

So unclear

B-Kramer commented 2 years ago

> Maybe in your case it "tries" to do it locally, but it fails and then tries it via the cloud? Maybe we need this: https://github.com/NabuCasa/home-assistant-google-assistant-local-sdk There must be a reason why that project is on NabuCasa?
>
> So unclear

Yeah something is weird, no reason for 3 requests for a single light turn on, and 2 of them happen AFTER HA knows the light is on.

pergolafabio commented 2 years ago

Maybe the other command is some kind of state reporting?

B-Kramer commented 2 years ago

Maybe. I do have state reporting enabled. I'm trying everything, I disabled ipv6, removed the other user, relinked my account. I'm going to let things settle down and see if it helps.


balloob commented 2 years ago

@B-Kramer it looks in your case that it tries a local request, executes that, but somehow Google thinks it went wrong and sends it via the cloud. Could you open a new issue, tag me and include the actual messages sent/received by Google by enabling debug logging for homeassistant.components.nabucasa and homeassistant.components.google_assistant?

Mariusthvdb commented 2 years ago

Since this hasn't been mentioned yet: could a set of aliases in the cloud: configuration have anything to do with this? Experiencing this same issue on an intensified level lately, I've been wondering if using a double language, or maybe I should say mixed language, causes havoc.

      light.spots_on_the_wall:
        name: Den
        aliases:
          - Wandverlichting
          - Spots on the wall
        room: Woonkamer

Maybe 'it' needs to look up first and then relay, which causes an unwanted delay? Having set the Google devices to English, but speaking in Dutch (via the aliases), does work ok-ish.

Also, since I've never seen the (via local) either: could it have to do with the fact that an instance is run remotely via DuckDNS?

B-Kramer commented 2 years ago

> @B-Kramer it looks in your case that it tries a local request, executes that, but somehow Google thinks it went wrong and sends it via the cloud. Could you open a new issue, tag me and include the actual messages sent/received by Google by enabling debug logging for homeassistant.components.nabucasa and homeassistant.components.google_assistant?

I enabled debugging, but I'm not home right now and I assume you want some google assistant requests, so I'll do that when I get home and get the logs into a new issue.

Puntoboy commented 2 years ago

I'm glad it's not just me that has noticed this. For the last several months, maybe a year, we have had to constantly repeat commands to Alexa as she says the devices are not responding. It used to work really well but I'm not sure when the issue started.

All devices respond instantaneously through the app/Lovelace or where direct contact occurs (say with a Zigbee button). It's only voice commands that are the problem.

briodan commented 2 years ago

Same issues here. Been switching my zwave and zigbee devices from SmartThings to HA and I’m seeing this issue for devices I’m moving over.

The Alexa <> SmartThings integration has not given me any issues over the same period.

lweberru commented 2 years ago

Same issue here. I already massively reduced the number of devices published to Alexa, hoping this would help. But it did not.

gregsheremeta commented 2 years ago

Not sure I called it out above. I've experienced this for a year on Alexa. I tried everything. The only thing that helped was switching to Google Home/Nest. I never see the first call slowness with Google. Google isn't perfect, but it's 95% better for me. I get a failure once every two weeks vs. several per day with Alexa.

aclock81 commented 2 years ago

Hi, for 6 months I've been facing problems with Alexa and Nabu Casa. The problems are more frequent day by day, and since last week Alexa is unusable... only errors and huge delays. Dear @balloob, on this link you can find many many users in the same frustrating situation with respective wives ready to kill them... :) :)

PLEASE PLEASE PLEASE do something to solve this once and for all. Thanks and kind regards

pergolafabio commented 2 years ago

Hey @balloob, until the Nabu Casa issues are resolved, I delinked the smart home app and used the custom Google Assistant component... and used the Nabu Casa URLs for the authorization, so no need for port forwarding... Much faster, everything instant, did about 30 tests.

Gonna leave it like this until Nabu Casa is fixed.

I also enabled state reporting in the YAML file. I also uploaded the java file in my project, according to this manual : _home-assistant._tcp.local

But I don't see that "local" stuff you made a screenshot of earlier? Is this not implemented yet maybe? Running 2021.12.9 now.

image

briodan commented 2 years ago

I did a bit of experimenting/checking yesterday on my Alexa issues and here are some preliminary points.

Going on the assumption that there is some sort of timeout in the Alexa to Nabu connection, I decided to set up a way to force a keep-alive. I did this by setting up a template lock in HA based on an input boolean. I synced the lock to Alexa and set up a routine on the Alexa side to lock the template lock whenever it's unlocked. On the HA side I set up an automation running every 5 minutes that unlocks the template lock. Here is a log snippet of that. Note that I built a failsafe into the HA automation to lock the lock if it is unlocked, as I saw misses in the lock action from Alexa, example below. image

As you can see, at 10:55am the HA automation unlocks the lock, followed by a lock command coming from Alexa via Nabu. At 11:00am the same sequence occurs. At 11:05, however, the automation triggers the unlock but there is no response back from Alexa to lock, causing the failsafe lock action to run at 11:10. If I look in the Alexa logs, it shows having executed the routine at 11:05, but the message never got to my HA instance.
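The miss described above can be picked out of an event timeline mechanically. A minimal sketch, assuming the logbook has been exported as (time, action) pairs: the action labels and the `missed_locks` helper are made up for illustration, and the sample data just restates the four times mentioned above.

```python
from datetime import datetime, timedelta


def at(hhmm: str) -> datetime:
    """Parse an HH:MM time (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


def missed_locks(events, window=timedelta(minutes=5)):
    """Return times of unlock events with no Alexa lock within `window`."""
    missed = []
    for when, action in events:
        if action != "unlock":
            continue
        answered = any(
            other == "alexa_lock" and when <= t < when + window
            for t, other in events
        )
        if not answered:
            missed.append(when)
    return missed


# Timeline from the description: the 11:05 unlock gets no reply from Alexa,
# so the failsafe locks at 11:10.
EVENTS = [
    (at("10:55"), "unlock"), (at("10:55"), "alexa_lock"),
    (at("11:00"), "unlock"), (at("11:00"), "alexa_lock"),
    (at("11:05"), "unlock"),
    (at("11:10"), "failsafe_lock"),
]

print([t.strftime("%H:%M") for t in missed_locks(EVENTS)])
```

The 5-minute window mirrors the automation interval, so an unlock counts as missed only if Alexa did not answer before the next run.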

The other piece that I figure might be interesting to share is the log/event sequence for our bedroom light, which is always the one that causes us the most aggravation when we try to turn it off. This is what happened last night. On the Alexa side, the Master Lamp and the Echo are part of the same group. The Master Lamp is a Z-Wave plug connected to HA via "Z-wave JS to MQTT".

This is what the logbook log looks like for that time period

image

I have no idea why the Turn on events are shown at 11:23:29 pm and 11:23:31 pm; the light did not flicker etc., it was just on. The keep-alive lock automation executed correctly at 11:20pm and 11:25pm, so no timeout like above was noted. It could be that the commands took longer to execute than we expected, and by triggering the command again we confused Alexa, causing the extra actions/logs. image

I have not gone into the full logs yet to pull further details for what I saw in the logbook; that's a task for later tonight if I have time.

One other quick update, this is the failure profile over the last 24 hours: image

The blue stripes are intervals where the lock stayed open. By my count that is 27 missed lock commands out of ~280, so roughly a 10% failure rate.
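As a quick check of that figure, using only the numbers quoted above (the `failure_rate` helper is just for illustration):

```python
def failure_rate(missed: int, total: int) -> float:
    """Fraction of keep-alive runs where the lock command never arrived."""
    if total <= 0:
        raise ValueError("total must be positive")
    return missed / total


# An automation that runs every 5 minutes fires this many times per day.
RUNS_PER_DAY = 24 * 60 // 5

rate = failure_rate(27, 280)
print(RUNS_PER_DAY, f"{rate:.1%}")  # 288 9.6%
```

So ~280 observed runs is consistent with a little under 24 hours of 5-minute intervals, and 27 misses works out to just under 10%.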

lweberru commented 2 years ago

I also invested time in finding workarounds. I tried to reduce the number of published entities and established a wakeup/keep-alive. But the more I do, the worse the situation gets. I think I really need to disable it and use a custom approach or so.

pergolafabio commented 2 years ago

For the time being, I have set up the custom Google Assistant integration by creating a project... I use the Nabu URL for auth, so no port forwarding is needed...

Works perfectly now

gregsheremeta commented 2 years ago

> google ... so no port forwarding needed

can you elaborate? Are you still using full Nabu Casa or something else?