NabuCasa / hass-nabucasa

Issues related to the cloud integration in Nabu Casa
GNU General Public License v3.0

Frequent Nabu Cloud timeouts with both Alexa and Google Home HA Skills #298

Closed hugalafutro closed 2 years ago

hugalafutro commented 2 years ago

As discussed at https://community.home-assistant.io/t/alexa-home-assistant-skill-slow-timing-out/321257, I am not alone in experiencing this, which should rule out some esoteric issue specific to my setup/ISP/etc. In summary: I ask the voice assistant to do something, the voice assistant says it failed, but the requested action happens anyway. My network, although not the fastest, is rock-solid latency-wise (the household partakes in online gaming and we'd notice any disconnects or high latency straight away).

With Alexa, she takes longer than usual, then says '"Device name" is not responding check its internet connectivity' or something like that. About 8 times out of 10, whatever I asked for triggers while she is still saying this; the other 2 times it just fails altogether. As mentioned in the thread, you can often see HA devices go "offline" in the Alexa app and then come back "online" when refreshed.

To me it looked like communication between Amazon and nabu.casa was timing out. I ordered and set up a Google Home Mini to check whether a non-Amazon device would have the same problem, and sadly it has exactly the same issue.

With Google Home, she takes longer than usual and then says 'Sorry, I couldn't reach the Home Assistant Cloud by Nabu Casa.' or 'Sorry, it looks like the Home Assistant Cloud by Nabu Casa is unavailable right now.' About 2 times out of 10 the action happens regardless, but it definitely fails outright more often than Alexa does.

I tried changing APs, switching from 5GHz to 2.4GHz Wi-Fi, and moving closer to the AP; nothing helps. I have 35 devices/scripts/sensors exposed to both Alexa and Google Home, is that too much? This wasn't always happening, but it's hard to pinpoint exactly when it started. It sort of crept up: it happened once a month, then once a week, then once a day, and nowadays it's a few times a day.

tomachristian commented 2 years ago

Happens to me as well with Alexa. Would be nice to have a way to debug this. Had no problems with Alexa and Nabu Casa until 1-2 months ago.

z3r0bytes commented 2 years ago

Same problem here. It started a few months ago.

acrelle commented 2 years ago

Similar issue here. The only difference I have to add is that it typically only occurs from 'cold', i.e. when I haven't used any Alexa to control Home Assistant for a period of time. When it is 'warm', lights can be turned on and off immediately.

This is for a simple use case - turning a Home Assistant light on and off by voice, exposed via the Alexa integration for Home Assistant Cloud.

I can't think of anything I can change on my Home Assistant side to find out the root cause of this. I've not noticed any local lag.

Edit: Occasionally, on loading the /config/info page, I get a timeout returned for Home Assistant Cloud (from the websocket):

{
  "id": 70,
  "type": "event",
  "event": {
    "type": "update",
    "domain": "cloud",
    "key": "can_reach_cloud",
    "success": true,
    "data": {
      "type": "failed",
      "error": "timeout"
    }
  }
}

Sometimes it times out. Sometimes it returns ok. No errors in the logs. Remote Server | eu-west-2-1.ui.nabu.casa
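
For anyone who wants to log how often this check fails, here is a minimal sketch that classifies an event like the one above. It assumes the payload always has the shape shown in this comment; the helper name `cloud_check_failed` is hypothetical and not part of Home Assistant or hass-nabucasa. Note that the outer `"success": true` does not appear to mean the cloud was reachable, the actual result sits in `data`.

```python
import json


def cloud_check_failed(message: dict) -> bool:
    """Return True if a can_reach_cloud update reports a failed check.

    Hypothetical helper; assumes the event shape pasted in this thread.
    """
    event = message.get("event", {})
    if event.get("domain") != "cloud" or event.get("key") != "can_reach_cloud":
        return False
    data = event.get("data")
    # A failed check carries a dict such as {"type": "failed", "error": "timeout"}.
    return isinstance(data, dict) and data.get("type") == "failed"


# The websocket message from the comment above, verbatim.
raw = """
{
  "id": 70,
  "type": "event",
  "event": {
    "type": "update",
    "domain": "cloud",
    "key": "can_reach_cloud",
    "success": true,
    "data": {
      "type": "failed",
      "error": "timeout"
    }
  }
}
"""

message = json.loads(raw)
print(cloud_check_failed(message))  # prints True
```

Counting how often this returns True over a day would at least give a number to put next to "sometimes it times out".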

villersfr commented 2 years ago

I confirm I have been facing the same "slow" or non-responsive behavior when using Alexa through Nabu Casa for a few weeks.

BiasF commented 2 years ago

I have the same issue....

sammy1421s commented 2 years ago

same problem here.

cunninr2 commented 2 years ago

Yep. Same problem. I have a fast network. I wonder if Amazon expects a super fast response from a third party like Home Assistant, and it is not always making the response envelope. I see similar behaviour when querying the status of Home Assistant switches in the Alexa app.

marcodutto commented 2 years ago

Same here. Gigabit fiber optic internet connection with a wired (not wireless) HA server. HA responds super fast and precisely on the LAN. When asking Alexa to control something on HA for the first time, there is a 90% chance that it times out. Usually it is only the first time; if I give a second command, it works fine. It is as if after a while the Nabu-Alexa or Nabu-MyHA connection gets lost and needs to be established again.

xZetsubou commented 2 years ago

I thought my internet was slow, so I kept changing the network in Alexa. I'm new to the HA world and I also tried the cloud (since I really like HA and I want to support them for this great work <3). The response through Alexa is so slow: when I ask it to turn off the light or shut down my PC, it sometimes takes 5-8 secs to do that. I hope it gets fixed soon, at least before the trial expires, so I can have the best experience.

obliojoe commented 2 years ago

Same issue here (using Alexa). Seems to have gotten much worse recently but can't say for sure exactly when it started.

acrelle commented 2 years ago

Has anybody turned on additional logging, or can suggest what to turn on, to get to the bottom of the responsiveness being too slow for Alexa?

FreshEire commented 2 years ago

I have the same issue. I'm guessing the IP for Nabu Casa is 99.86.125.150 maybe someone can confirm?

When I perform a voice command I can see 22-23 packets exchanged between that IP and my HA instance (running on Ubuntu on Pi4)

The exchange is rapid and occurs in just a few ms which leads me to believe the issue might be between Amazon server and Nabu Casa server.

I am in Europe and the above IP seems to be in the US. Logging into account.nabucasa.com directs me to servers in Germany. Also, the remote UI is processed through a London IP. I would expect voice commands via Alexa to be sent to a regional server somewhere in Europe and then handed off to a Nabu Casa server in Europe. Is this not the case? Maybe I am way off here, but maybe this helps.

You can observe the delay by watching the exchange from the HA server after initiating a voice command: sudo tcpdump -tttt -n -i wlan0 host 99.86.125.150

Also, debug logging for nabu casa can be enabled and monitored for inbound activity. This also shows the same delayed result as can be seen using tcpdump.

Add the following to configuration.yaml

logger:
  default: warning
  logs:
    # Nabu Casa debug logging:
    hass_nabucasa: debug

Tail the log and initiate voice command: tail -f home-assistant.log | egrep "'handler': 'alexa'"

mikeneiderhauser commented 2 years ago

+1 Noticed this issue as well. It doesn't appear to matter what type of device either. I have Hue bulbs, wifi smart outlets and a zwave network through zwave js and all exhibit the same slowness issue

hugalafutro commented 2 years ago

I have no basis for this other than gut feeling, but I do not think there is anything going on between: our ha <-> nabu (voice command would not reach nabu, nabu would not talk to alexa/gh and no voice response would be generated) nor between voice assistant <-> internet (they both have their own way of saying they have trouble connecting to internet)

I believe it's timing out between nabu <-> and google/amazon, as google will actually say it cannot contact nabu casa cloud, whereas alexa just uses some generic connection error line. So my layman understanding of what is happening is something like this:

  1. talk to alexa
  2. alexa triggers home assistant skill
  3. alexa home assistant skill: a) contacts ha to trigger device or get reading or w/e b) contacts alexa back to confirm new state of device or pass on the number etc.
  4. if all is working as intended, what was said happens and alexa chimes confirmation sound (if enabled)
  5. if 3.b) fails, the action still happens (presumably confirming ha <-> nabu connection works ok), but since alexa doesn't get back confirmation from nabu she'll say the action was not able to finish.
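
The flow in the list above can be simulated with a small sketch. This is only an illustration of the hypothesis (the action runs, but the confirmation misses the assistant's deadline); it is not how Nabu Casa or the Alexa skill is actually implemented. The function names, delays, and reply strings are all made up for the example.

```python
import asyncio


async def run_action(confirm_delay_s: float, state: dict) -> str:
    """Stand-in for step 3: trigger the device (3a), then confirm back (3b)."""
    state["light_on"] = True              # 3a: the action itself happens
    await asyncio.sleep(confirm_delay_s)  # 3b: confirmation takes this long
    return "SUCCESS"


async def voice_command(confirm_delay_s: float, deadline_s: float):
    """Return (what the assistant says, whether the action happened)."""
    state = {"light_on": False}
    try:
        reply = await asyncio.wait_for(
            run_action(confirm_delay_s, state), timeout=deadline_s
        )
    except asyncio.TimeoutError:
        # Step 5: the confirmation did not arrive in time.
        reply = "device is not responding"
    return reply, state["light_on"]


# Confirmation arrives within the deadline (step 4).
print(asyncio.run(voice_command(0.01, 0.2)))  # ('SUCCESS', True)
# Confirmation is too slow (step 5): error message, but the light is on anyway.
print(asyncio.run(voice_command(0.2, 0.01)))  # ('device is not responding', True)
```

The second case matches what people describe in this thread: the assistant reports a failure while the requested action still goes through.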

BiasF commented 2 years ago

Hi,

I don't think it's a problem with Nabu Casa Cloud. I also see the problem when using the Hue emulation together with Alexa, and also when I use the HomeKit integration with Siri.

I have only Zigbee2Mqtt devices, maybe the problem is on this side? Maybe the communication between HASS and Z2M?

It would be nice if there were a way to create a trace of an Alexa request.

Bascht74 commented 2 years ago

I have the same problem here...

majkers commented 2 years ago

Same problem here with Google Home Mini. 'Sorry, I couldn't reach the Home Assistant Cloud by Nabu Casa.' with the HA service eventually being fired in the background... Nothing in the HA logs... When it works, it seems to take ages. I have the latest HASS OS on an RPi3 with the latest HA 2021.12.

tonka3000 commented 2 years ago

I have the same problem. Google Home doesn't connect to Nabu Casa at all, so I switched to Alexa, and it is very often slow.

HA 2021.12.6

pergolafabio commented 2 years ago

Same for me... I also have the feeling that when I do it from a cold start, the error comes; if I do it again afterwards, no error...

Although when the error comes, the service is executed...

Very annoying... And I have the feeling it's getting worse than a few months ago...

mikeneiderhauser commented 2 years ago

Is there anything we can do to help grab debug logs to help move this ticket along?

pergolafabio commented 2 years ago

I just turned on debug logging, then turned a light off and on. First command => error, "HA is not available" blabla, but the command was fired... Second command => no error.

Nothing strange to see in the log:

2021-12-28 17:40:45 DEBUG (MainThread) [hass_nabucasa.iot] Received message:
{'handler': 'google_actions',
 'msgid': '9751313797810680ee77582e347b67ae',
 'payload': {'inputs': [{'context': {'locale_country': 'BE',
                                     'locale_language': 'nl'},
                         'intent': 'action.devices.EXECUTE',
                         'payload': {'commands': [{'devices': [{'customData': {'baseUrl': 'https://xxx.ui.nabu.casa',
                                                                               'httpPort': 8123,
                                                                               'httpSSL': False,
                                                                               'proxyDeviceId': '40902f43-31b8-4ccc-ae62-026ab8f3dc4d',
                                                                               'uuid': 'b436ccf322f94b788b7e094d302fbdad',
                                                                               'webhookId': '2338d36ab1868711dd75786113c6dd01b5b806a42bda73719309742a20f2fa31'},
                                                                'id': 'light.badkamer'}],
                                                   'execution': [{'command': 'action.devices.commands.OnOff',
                                                                  'params': {'on': False}}]}]}}],
             'requestId': '2738648545395519303'}}

2021-12-28 17:40:45 DEBUG (MainThread) [hass_nabucasa.iot] Publishing message:
{'msgid': '9751313797810680ee77582e347b67ae',
 'payload': {'payload': {'commands': [{'ids': ['light.badkamer'],
                                       'states': {'on': True, 'online': True},
                                       'status': 'SUCCESS'}]},
             'requestId': '2738648545395519303'}}

2021-12-28 17:40:55 DEBUG (MainThread) [hass_nabucasa.iot] Received message:
{'handler': 'google_actions',
 'msgid': '85ac4f5f77390f844344a0a2e5a553cc',
 'payload': {'inputs': [{'context': {'locale_country': 'BE',
                                     'locale_language': 'nl'},
                         'intent': 'action.devices.EXECUTE',
                         'payload': {'commands': [{'devices': [{'customData': {'baseUrl': 'https://xxxx.ui.nabu.casa',
                                                                               'httpPort': 8123,
                                                                               'httpSSL': False,
                                                                               'proxyDeviceId': '40902f43-31b8-4ccc-ae62-026ab8f3dc4d',
                                                                               'uuid': 'b436ccf322f94b788b7e094d302fbdad',
                                                                               'webhookId': '2338d36ab1868711dd75786113c6dd01b5b806a42bda73719309742a20f2fa31'},
                                                                'id': 'light.badkamer'}],
                                                   'execution': [{'command': 'action.devices.commands.OnOff',
                                                                  'params': {'on': True}}]}]}}],
             'requestId': '9451238974011229789'}}

2021-12-28 17:40:55 DEBUG (MainThread) [hass_nabucasa.iot] Publishing message:
{'msgid': '85ac4f5f77390f844344a0a2e5a553cc',
 'payload': {'payload': {'commands': [{'ids': ['light.badkamer'],
                                       'states': {'on': False, 'online': True},
                                       'status': 'SUCCESS'}]},
             'requestId': '9451238974011229789'}}
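
To put a number on "nothing strange" in a log like this, one could pair each "Received message" with the "Publishing message" that has the same msgid and compare timestamps. Below is a minimal sketch, assuming the log looks exactly like the paste above (second-resolution timestamps, each header followed by a pretty-printed Python dict, no unrelated lines mixed in). The function name `handling_times` is made up, and the sample payloads are abbreviated from the first exchange above.

```python
import ast
import re
from datetime import datetime

# Matches the header lines as they appear in the pasted debug log.
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) DEBUG \(MainThread\) "
    r"\[hass_nabucasa\.iot\] (Received|Publishing) message:\s*$"
)


def handling_times(log_text: str) -> dict:
    """Map msgid -> seconds between its 'Received' and 'Publishing' entries."""
    entries = []  # each item: (timestamp, kind, lines of the printed dict)
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, match.group(2), []))
        elif entries and line.strip():
            entries[-1][2].append(line)

    received, result = {}, {}
    for stamp, kind, body in entries:
        try:
            message = ast.literal_eval("\n".join(body))
        except (ValueError, SyntaxError):
            continue  # skip anything that is not a plain dict literal
        msgid = message.get("msgid")
        if kind == "Received":
            received[msgid] = stamp
        elif msgid in received:
            result[msgid] = (stamp - received[msgid]).total_seconds()
    return result


# First exchange from the log above, payloads abbreviated.
sample = """\
2021-12-28 17:40:45 DEBUG (MainThread) [hass_nabucasa.iot] Received message:
{'handler': 'google_actions',
 'msgid': '9751313797810680ee77582e347b67ae',
 'payload': {'requestId': '2738648545395519303'}}

2021-12-28 17:40:45 DEBUG (MainThread) [hass_nabucasa.iot] Publishing message:
{'msgid': '9751313797810680ee77582e347b67ae',
 'payload': {'requestId': '2738648545395519303'}}
"""

print(handling_times(sample))  # {'9751313797810680ee77582e347b67ae': 0.0}
```

A result of 0.0 at one-second resolution only says the instance answered within the same second. That would be consistent with the local side being fast, but it cannot show where the rest of the delay is spent.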

Bascht74 commented 2 years ago

@balloob Is this the right way to signal these problems to Nabu Casa? So far it seems that nobody from Nabu Casa responded… that is why I ask.

Maybe there is some information missing about how to address these problems or how to see the service status of the Nabu Casa HA services (like other companies do, e.g. https://status.teamviewer.com/). I don't know where to find these…

thx, Sebastian

pergolafabio commented 2 years ago

i also see these warnings when the issue occurs:

[image: screenshot of the warnings, not included]

hugalafutro commented 2 years ago

@Bascht74 you can see status @ https://status.home-assistant.io but that has been always green with no ongoing issues while this is happening.

mikeneiderhauser commented 2 years ago

I filed a Nabu Casa support ticket linking to this GitHub issue. Hopefully they have better visibility now.

pergolafabio commented 2 years ago

On what component can I turn on debugging? This one doesn't provide any error: "hass_nabucasa: debug"

mikeneiderhauser commented 2 years ago

Wow. They are fast

Hi Mike,

We are re-writing the relayer which handles the communication between your instance and Amazon/Google and this is expected to be a large improvement over the current implementation. There should be much less latency and CPU usage on the backend and should be up between now and the first month of 2022.

I could have you try deleting hidden files and recreating your exposed devices, etc, but I've found that none of this helps in the long run. It may improve at first but the "device not responding" responses will slowly creep back in. We have simply gotten to a point where we have many more users and the same old resources and need to make changes.

Thanks for your patience!

Regards, Ashton

pergolafabio commented 2 years ago

ok, good to hear

At the moment it's really frustrating; it happens to me about 50% of the time.

hugalafutro commented 2 years ago

That's some good info and news @mikeneiderhauser. I just wish it had been visible more transparently somewhere before I spent time and time again messing around with everything from my router to buying new devices while trying to debug this, but oh well, I'm still glad to hear it should improve soon-ish. My only vindication is that my gut feeling was right: the problem is at Nabu, not at our HA instances 😄

pergolafabio commented 2 years ago

how many times do you experience the issue?

hugalafutro commented 2 years ago

how many times do you experience the issue?

intermittently, usually 90% when it's first time after long time, afterwards 2-3 times out of 10 some days better some days worse

pergolafabio commented 2 years ago

OK, same behaviour as me, especially this week. Maybe because of the holidays, with more users using it?

makop79 commented 2 years ago

Same issue with me, glad to hear that they're working on it. Happens 90% of the time when using it after a longer period of time and is really frustrating.

pergolafabio commented 2 years ago

Indeed, this week it's horrible

alexruffell commented 2 years ago

Hi Mike, We are re-writing the relayer which handles the communication between your instance and Amazon/Google and this is expected to be a large improvement over the current implementation. There should be much less latency and CPU usage on the backend and should be up between now and the first month of 2022.

I could have you try deleting hidden files and recreating your exposed devices, etc, but I've found that none of this helps in the long run. It may improve at first but the "device not responding" responses will slowly creep back in. We have simply gotten to a point where we have many more users and the same old resources and need to make changes.

Thanks for your patience!

Regards, Ashton

I recently disabled the Alexa integration, the skill, and re-set it up. While I still get the errors, they are less frequent. If this is what he was suggesting, then I'd say I am having the same issue and I am glad to hear the issue will hopefully be resolved soon.

gregsheremeta commented 2 years ago

I hope the rewrite has much better observability. I'd like to see cloud side logs for myself when this incredibly frustrating problem is happening. (last 3 days has been awful)

pergolafabio commented 2 years ago

Same here, very frustrating

gregsheremeta commented 2 years ago

I switched from Alexa to Google a few months ago because I noticed Google was much more responsive. I also noticed that Alexa failed more frequently when cold - almost felt like an AWS Lambda startup hit or cache warmup or something.

But last three days Google is failing miserably.

kmil4 commented 2 years ago

Just want to echo (sorry) issues other folks are having.

Issues:

What I tried:

  1. Creating a template light to trigger the script, and exposing the light to Alexa. The voice commands to Alexa will toggle the fake light, which activates the script. No change in behavior using this method
  2. Reconfiguring Alexa and HA skill

maglat commented 2 years ago

For me it is very slow and unreliable as well. I can confirm all the issues described in this thread. Hopefully the Nabu servers will be upgraded soon. It's a bit frustrating, especially for my wife, who needs to handle our 6-month-old daughter. She freaks out when lights won't go on or off by voice command while she is carrying the baby around.

foxymichelle commented 2 years ago

I look forward to the update! I'm still in my trial month and was debating keeping the service.

My HA response time has been all over the place - from lightning fast to timing out and saying the HA device is unresponsive, yet the device responds shortly after.

Another thing I noticed is that every time I open up my Alexa app, it takes up to 30 seconds or so to "reconnect" with HA. All my devices are labeled unresponsive, then slowly they all come back online and Alexa updates their status. Will your updates fix that?

stackner commented 2 years ago

Same issues here for a few months now. nice to see something should hopefully happen.....

JuergenEhret commented 2 years ago

Me too - same issue. Looking forward to some improvements ;-)

Turbovix commented 2 years ago

I get the impression that Nabucasa's infrastructure can't handle the demand of requests.

derekoharrow commented 2 years ago

I get the impression that Nabucasa's infrastructure can't handle the demand of requests.

It doesn't sound like that to me. If that were the case then most requests would fail. This sounds more like some kind of initial timeout issue as only the first request fails and subsequent requests are fine.

Turbovix commented 2 years ago

Looking at it this way I think you are correct.

pergolafabio commented 2 years ago

Wow. They are fast

Hi Mike, We are re-writing the relayer which handles the communication between your instance and Amazon/Google and this is expected to be a large improvement over the current implementation. There should be much less latency and CPU usage on the backend and should be up between now and the first month of 2022.

I could have you try deleting hidden files and recreating your exposed devices, etc, but I've found that none of this helps in the long run. It may improve at first but the "device not responding" responses will slowly creep back in. We have simply gotten to a point where we have many more users and the same old resources and need to make changes.

Thanks for your patience!

Regards, Ashton

hey, can you maybe ask for an update? thnx

WhimsySpoon commented 2 years ago

I get the impression that Nabucasa's infrastructure can't handle the demand of requests.

It doesn't sound like that to me. If that were the case then most requests would fail. This sounds more like some kind of initial timeout issue as only the first request fails and subsequent requests are fine.

Things have become so bad recently that even subsequent requests have started timing out for me.

curt7000 commented 2 years ago

Has anyone from Nabu commented on this yet? I would really like to see the Nabu Casa Team comment. Obviously, it would be good to fix right away, but an explanation of the problem and time to fix is fine too.

I was really discouraged over the holidays: my super fancy HA system takes 5-10s to turn on lights with Alexa, while at my parents' house, their simple Kasa Smart Plug skill turns on the outlet/light almost instantaneously!!! Very jealous :(

mikeneiderhauser commented 2 years ago

hey, can you maybe ask for an update? thnx

I asked for an update on the ticket and asked if they could post info in this GitHub issue for the community.