DNS flood requests might be causing some issues on HA

estebanz01 commented 4 years ago

Hello!

I'm posting this info here, since it seems to be an issue with the tuya api and how the requests are made to tuya servers.

After some version updates, some people have started to see an increased number of DNS requests over IPv6 (as reported in https://github.com/home-assistant/core/issues/36713 and https://github.com/home-assistant/core/issues/26855). I believe this is behind some issues that have been reported in HA (https://github.com/home-assistant/core/issues/36744), where devices become unusable because of a max retries error on requests.

Jun 15 01:43:55 raspberrypi hass[4833]: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='px1.tuyaus.com', port=443): Max retries exceeded with url: /homeassistant/skill (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xa0314f70>: Failed to establish a new connection: [Errno 110] Connection timed out'))
Jun 15 01:43:55 raspberrypi hass[4833]: During handling of the above exception, another exception occurred:                                                                          
Jun 15 01:43:55 raspberrypi hass[4833]: Traceback (most recent call last):                                                                                                           
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/helpers/entity.py", line 279, in async_update_ha_state                  
Jun 15 01:43:55 raspberrypi hass[4833]:     await self.async_device_update()                                                                                                         
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/helpers/entity.py", line 472, in async_device_update                    
Jun 15 01:43:55 raspberrypi hass[4833]:     await self.hass.async_add_executor_job(self.update)                                                                                      
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run                                                                    
Jun 15 01:43:55 raspberrypi hass[4833]:     result = self.fn(*self.args, **self.kwargs)                                                                                              
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/tuya/__init__.py", line 254, in update                       
Jun 15 01:43:55 raspberrypi hass[4833]:     self._tuya.update()                                                                                                                      
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/tuyaha/devices/switch.py", line 23, in update                                         
Jun 15 01:43:55 raspberrypi hass[4833]:     devices = self.api.discovery()                                                                                                           
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/tuyaha/tuyaapi.py", line 115, in discovery                                            
Jun 15 01:43:55 raspberrypi hass[4833]:     response = self._request("Discovery", "discovery")                                                                                       
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/tuyaha/tuyaapi.py", line 161, in _request                                             
Jun 15 01:43:55 raspberrypi hass[4833]:     (TUYACLOUDURL + "/homeassistant/skill").format(SESSION.region), json=data                                                                
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/requests/api.py", line 119, in post                                                  
Jun 15 01:43:55 raspberrypi hass[4833]:     return request('post', url, data=data, json=json, **kwargs)                                                                             
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/requests/api.py", line 61, in request 
Jun 15 01:43:55 raspberrypi hass[4833]:     return session.request(method=method, url=url, **kwargs)
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
Jun 15 01:43:55 raspberrypi hass[4833]:     resp = self.send(prep, **send_kwargs)
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
Jun 15 01:43:55 raspberrypi hass[4833]:     r = adapter.send(request, **kwargs)
Jun 15 01:43:55 raspberrypi hass[4833]:   File "/srv/homeassistant/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
Jun 15 01:43:55 raspberrypi hass[4833]:     raise ConnectionError(e, request=request)
Jun 15 01:43:55 raspberrypi hass[4833]: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='px1.tuyaus.com', port=443): Max retries exceeded with url: /homeassistant/skill (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xa0314f70>: Failed to establish a new connection: [Errno 110] Connection timed out'))

So, following the suggestion made here: https://github.com/home-assistant/core/issues/26855#issuecomment-609424984 I applied the following monkey patch to /srv/homeassistant/lib/python3.7/site-packages/tuyaha/tuyaapi.py:

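import socket
import requests.packages.urllib3.util.connection as urllib3_cn

def allowed_gai_family():
    # Force urllib3 to resolve IPv4 addresses only, so no AAAA lookups are made.
    return socket.AF_INET

# Replace urllib3's address-family selector with the IPv4-only version.
urllib3_cn.allowed_gai_family = allowed_gai_family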

I placed it right after the _LOGGER definition. It seems to be working: the flood of IPv6 DNS requests stopped, and I can now control most of my tuya devices from HA without problems.
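For anyone who wants to try the same idea without leaving the override in place permanently, here is a minimal sketch that wraps it in a context manager. The name `ipv4_only` is made up for illustration, and the sketch assumes that `urllib3.util.connection` is the same module that requests uses (which holds when requests relies on the installed urllib3 rather than a vendored copy). Note that the override is still process-wide while the block is active, so other threads making requests are affected too.

```python
import contextlib
import socket

import urllib3.util.connection as urllib3_cn


@contextlib.contextmanager
def ipv4_only():
    """Temporarily make urllib3 resolve IPv4 addresses only (hypothetical helper)."""
    original = urllib3_cn.allowed_gai_family
    # urllib3 calls allowed_gai_family() each time it opens a connection,
    # so swapping the module attribute changes the address family it asks for.
    urllib3_cn.allowed_gai_family = lambda: socket.AF_INET
    try:
        yield
    finally:
        # Always restore the original selector, even if the request fails.
        urllib3_cn.allowed_gai_family = original
```

Any request made inside `with ipv4_only():` would then skip the AAAA lookup, and the default behaviour comes back once the block exits.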

I'm not sure where I should start looking for a proper fix, but I'm open to suggestions. Also, if the monkey patch is accepted as a solution, I can open a PR without any problem.

djtimca commented 4 years ago

If it solves the problem I'm all for it as a solution.

djtimca commented 4 years ago

Are you planning to open a PR with the patch?

estebanz01 commented 4 years ago

I'm waiting for @PaulAnnekov's feedback on this, but I can open it.

djtimca commented 4 years ago

Given that this issue is significant, I think the patch is worth it even if it isn't the long-term fix. It has been 8 days without feedback; just hoping @PaulAnnekov can review and push the update without us having to fork.

gomble commented 4 years ago

My Tuya integration broke today:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='px1.tuyaeu.com', port=443): Max retries exceeded with url: /homeassistant/skill (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f88e9ae3210>: Failed to establish a new connection: [Errno -3] Try again'))

PaulAnnekov commented 4 years ago

As I understand it, the large number of DNS requests comes from the following steps:

1. HA polls each tuya device class defined in this library.
2. This library makes an HTTP request to px1.tuyaeu.com for each device.
3. During each HTTP request, urllib3 makes DNS requests for both A and AAAA records.
4. Sleep 30 seconds, go to 1.

You suggest fixing this by disabling the queries for AAAA records. But that won't stop the flooding; it will only halve the number of requests. I don't think it's a good solution.
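To put rough numbers on the steps above, here is a back-of-the-envelope sketch. It assumes one uncached DNS lookup per record type for every HTTP request and a fixed 30-second polling interval; the function name and the 10-device setup are only illustrative.

```python
def dns_queries(devices: int, record_types: int = 2,
                poll_interval_s: int = 30, days: int = 30) -> int:
    """Estimate DNS queries over `days`.

    Assumes one uncached lookup per record type for each HTTP request,
    and one HTTP request per device per polling cycle.
    """
    polls = days * 24 * 60 * 60 // poll_interval_s
    return devices * record_types * polls


if __name__ == "__main__":
    print(dns_queries(10))                  # A + AAAA lookups: 1728000
    print(dns_queries(10, record_types=1))  # A lookups only:    864000
```

Under these assumptions, dropping the AAAA lookups halves the total but still leaves hundreds of thousands of queries per month.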

Solutions I see:

1. Is it really a problem for your DNS server to process, e.g., 20 requests (if you have 10 devices) every 30 seconds?
2. If it is, why not enable the OS or Docker DNS cache?
3. If that's still not enough, I see only one correct way to fix this: we should switch from polling to pushing. Make the tuya component push-based and make one HTTP request to Discovery, like here https://github.com/PaulAnnekov/tuyaha/blob/8bf61ce84ed691e9fab2c49d90257c5d931aa461/tuyaha/tuyaapi.py#L115 which gets the states of all devices in a single request. Then update all devices. That will reduce the number of requests to 2, regardless of the number of devices.
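The core of the third option, one Discovery request whose result is shared by every device, could look roughly like the sketch below. This is not the HA push mechanism and not code from this library: `DiscoveryCache`, `fetch_all`, and the `{device_id: state}` shape are all hypothetical names chosen for illustration, and the real Discovery response format may differ.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional


class DiscoveryCache:
    """Share one discovery result among all devices for `ttl` seconds."""

    def __init__(self, fetch_all: Callable[[], Dict[str, Any]],
                 ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # fetch_all is a hypothetical callable that performs ONE request
        # and returns a mapping of {device_id: state}.
        self._fetch_all = fetch_all
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()  # device updates may run in worker threads
        self._states: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get_state(self, device_id: str) -> Any:
        """Return the cached state, refreshing at most once per `ttl`."""
        with self._lock:
            now = self._clock()
            if self._states is None or now - self._fetched_at >= self._ttl:
                self._states = self._fetch_all()
                self._fetched_at = now
            return self._states.get(device_id)
```

With something like this, ten devices updating within the same 30-second window would trigger a single call to `fetch_all` instead of ten separate HTTP requests.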

Regarding the exception: I sometimes see it too, but I don't think it's related to DNS requests. I think the tuya API server is sometimes unstable or unavailable, or it refuses new connections when we make a lot of requests to it. That's why you see the connection failures. If it refuses connections because of too many requests, then maybe we can check whether keep-alive is working, or try the 3rd solution above.

djtimca commented 4 years ago

@PaulAnnekov it sounds like the push approach would work with Discovery instead - that should drastically reduce calls. Is your suggestion to pull a local copy of this repo and make the change locally or are you thinking you will push an update to this repo?

PaulAnnekov commented 4 years ago

OK, I think we should start fixing this issue by using a Session object, because when you use the module-level functions (get, post, ...) of the requests lib, each call makes a DNS request and creates a new connection. With a Session we can at least reuse the TCP connection and probably reduce the number of DNS requests. It's also an easy fix. I'm waiting for a PR, or I will do it myself in a week.
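A minimal sketch of what the Session-based approach might look like is shown below. The `TuyaClient` class and its method names are hypothetical, and the URL template is an assumption inferred from the hostnames and the `(TUYACLOUDURL + "/homeassistant/skill").format(SESSION.region)` line in the traceback above; the actual library code may be organised differently.

```python
import requests

# Assumption: template inferred from px1.tuyaus.com / px1.tuyaeu.com in the logs.
TUYA_CLOUD_URL = "https://px1.tuya{}.com"


class TuyaClient:
    """Hypothetical client that reuses one requests.Session for all calls."""

    def __init__(self, region: str, timeout: float = 10.0) -> None:
        self.region = region
        self.timeout = timeout
        # A Session keeps a connection pool, so consecutive requests to the
        # same host can reuse the TCP connection (keep-alive) instead of
        # resolving the name and connecting again on every call.
        self._http = requests.Session()

    def skill_url(self) -> str:
        return (TUYA_CLOUD_URL + "/homeassistant/skill").format(self.region)

    def post_skill(self, payload: dict) -> dict:
        response = self._http.post(
            self.skill_url(), json=payload, timeout=self.timeout
        )
        response.raise_for_status()
        return response.json()

    def close(self) -> None:
        self._http.close()
```

Whether the connection is actually reused still depends on the server honouring keep-alive, so this would reduce DNS lookups only as long as the pooled connection stays open.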

estebanz01 commented 4 years ago

> As I understand it, the large number of DNS requests comes from the following steps:
>
> 1. HA polls each tuya device class defined in this library.
> 2. This library makes an HTTP request to `px1.tuyaeu.com` for each device.
> 3. During each HTTP request, urllib3 makes DNS requests for both A and AAAA records.
> 4. Sleep 30 seconds, go to 1.
>
> You suggest fixing this by disabling the queries for AAAA records. But that won't stop the flooding; it will only halve the number of requests. I don't think it's a good solution.
>
> Solutions I see:
>
> 1. Is it really a problem for your DNS server to process, e.g., 20 requests (if you have 10 devices) every 30 seconds?
> 2. If it is, why not enable the OS or Docker DNS cache?
> 3. If that's still not enough, I see only one correct way to fix this: we should switch from polling to pushing. Make the `tuya` component push-based and make one HTTP request to `Discovery`, like here https://github.com/PaulAnnekov/tuyaha/blob/8bf61ce84ed691e9fab2c49d90257c5d931aa461/tuyaha/tuyaapi.py#L115 which gets the states of all devices in a single request. Then update all devices. That will reduce the number of requests to 2, regardless of the number of devices.
>
> Regarding the exception: I sometimes see it too, but I don't think it's related to DNS requests. I think the tuya API server is sometimes unstable or unavailable, or it refuses new connections when we make a lot of requests to it. That's why you see the connection failures. If it refuses connections because of too many requests, then maybe we can check whether keep-alive is working, or try the 3rd solution above.

Thanks for this excellent explanation! I also agree that we might want to do push instead of dealing with or relying on DNS or OS configurations. I just proposed a band-aid fix for the meantime. I'll also give using Session a shot to see what happens.

tomvghs commented 4 years ago

I have the same issue. AdGuard DNS reports over 2 million DNS requests to tuya from the Hass server in less than 30 days:

[Image: Hass Tuya]

Is there a fix for this issue?

tomvghs commented 4 years ago

Does anyone know how to fix this? Is there an update?

PaulAnnekov / tuyaha

DNS flood requests might be causing some issues on HA #36