TheAgentK / tuya-mqtt

Nodejs-Script to combine tuyaapi and openhab via mqtt
MIT License
173 stars 81 forks source link

MQTT disconnect loop if the device name contains '+' or '#' characters #48

Closed mbronk closed 3 years ago

mbronk commented 3 years ago

Description When the devices.conf contains a device name with characters not legal in a Mqtt topic name, e.g. name: 'T2 candle 5W RGB+CW' -> upon first attempt of publishing data to device's topic, the script enters endless reconnect loop it can not recover from. Broker-side (Mosquitto) this is seen as a connection-->socket error-->connection-->...

Notes:

  1. This happens when using either '+' or '#' in the device name, though if a '/' is used in the name - it also gets passed into Mqtt, creating unnecessary subtopics.
  2. The name T2 candle 5W RGB+CW is a real-world default device name of a lightbulb I have (generated by Tuya app). It was copy-pasted from tuya-cli wizard

To Reproduce Originally observed when using Mosquitto as a broker (did not check behavior of other brokers)

  1. Edit devices.json and put a device with a name containing either a '+' or a # - e.g. name: 'T2 candle 5W RGB+CW'
  2. Run DEBUG=tuya-mqtt:* node tuya-mqtt.js
  3. Observe (logs) that the MQTT connection is established first, but the moment there is an attempt to publish anything on the device's channel (technically: when [devices/tuya-device.js::626] this.mqttClient.publish(topic, message, { qos: 1 }) is hit) - the script enters an endless reconnect loop, with 'reconnect' event firing every second.
  4. Effectively - no data gets published to the MQTT server

Expected behavior

  1. Device name should be canonized, not to include any of # + / It might suffice to extend the filter here and update the readme (unfortunately don't have time today, to submit a PR for this myself, though it should be fairly trivial)
  2. Additionally - If such class of errors happens - logging should be expanded to indicate the origin (hopefully mqtt module allows for catching these errors). While it seems trivial now, without looking at source code it was impossible for me to figure out what is happening.

System info/logs

Click to expand! - **OS:** Ubuntu 20.04.1 LTS, ARM - **MQTT Broker:** mosquitto 1.6.9-1 (MQTT version 5.0/3.1.1/3.1 compatible message broker) - Packages: ``` $>npm list --depth=0 tuya-mqtt@3.0.0 ~/tuya-mqtt ├── @tuyapi/cli@1.13.4 ├── color-convert@2.0.1 ├── debug@4.2.0 ├── json5@2.1.3 ├── mathjs@7.5.1 ├── mqtt@4.2.1 ├── supports-color@7.2.0 └── tuyapi@6.0.1 ``` - Log exceprt (from `DEBUG=tuya-mqtt:* node tuya-mqtt.js` - not including output of DEBUG=* as it is massive... and frankly does not help too much other than explicitly showing when the disconnect happens :) ) ``` tuya-mqtt:info Connection established to MQTT server +0ms tuya-mqtt:tuyapi Search for device id 00000000deadbeef0000 +0ms tuya-mqtt:tuyapi Found device id 00000000deadbeef0000 +9ms tuya-mqtt:tuyapi Received JSON data from device 00000000deadbeef0000 -> (... device communication omitted for brevity ...) tuya-mqtt:discovery Home Assistant config topic: homeassistant/light/00000000deadbeef0000/config +0ms tuya-mqtt:discovery { name: 'T2 candle 5W RGB+CW', tuya-mqtt:discovery state_topic: 'tuya/t2_candle_5w_rgb+cw/state', (... cut ...) tuya-mqtt:discovery max_mireds: 400 } +2ms tuya-mqtt:info Unable to connect to MQTT server +2s tuya-mqtt:tuyapi Received JSON data from device 00000000deadbeef0000 -> {"20":true,"21":"white" ... (... a few of these follow ...) tuya-mqtt:state tuya/t2_candle_5w_rgb+cw/state ON +0ms (... rest of tuya-mqtt:state omitted ...) tuya-mqtt:info Unable to connect to MQTT server +1s tuya-mqtt:info Unable to connect to MQTT server +1s tuya-mqtt:info Unable to connect to MQTT server +1s tuya-mqtt:info Unable to connect to MQTT server +1s tuya-mqtt:info Unable to connect to MQTT server +1s tuya-mqtt:info Unable to connect to MQTT server +1s (...) ``` - From `/var/log/mosquitto/mosquitto.log`: ``` 1604644221: New connection from 127.0.0.1 on port 1883. 1604644221: New client connected from 127.0.0.1 as mqttjs_4ca70238 (p2, c1, k60). 1604644221: Socket error on client mqttjs_4ca70238, disconnecting. 1604644222: New connection from 127.0.0.1 on port 1883. 1604644222: New client connected from 127.0.0.1 as mqttjs_4ca70238 (p2, c1, k60). 1604644222: Socket error on client mqttjs_4ca70238, disconnecting. 1604644223: New connection from 127.0.0.1 on port 1883. 1604644223: New client connected from 127.0.0.1 as mqttjs_4ca70238 (p2, c1, k60). 1604644223: Socket error on client mqttjs_4ca70238, disconnecting. 1604644224: New connection from 127.0.0.1 on port 1883. 1604644224: New client connected from 127.0.0.1 as mqttjs_4ca70238 (p2, c1, k60). 1604644224: Socket error on client mqttjs_4ca70238, disconnecting. (...) ```
tsightler commented 3 years ago

Thanks for the report. I'll try to at least update the docs to tell people not to use names with + or # in them for now and maybe fix the filter.

However, regarding point #2, while you are correct that it might be nice to log for such things, there's probably 100's of those types of cases in the code where it would be nice to log some stuff but where the current logging is insufficient, however, I don't have a lot of time to maintain this project and adding usefl logging to cover all failure cases is a time consuming task, I'll probably just try to implement the minimal fix for this and move on, but PR requests are welcome. Perhaps in this case I can just log the disconnect reason returned, but about 95% of the time that information isn't terribly useful either.

mbronk commented 3 years ago

Thanks @tsightler. I fully agree the logging aspect is a nice to have (and this particular issue is unlikely to be the only one, or the important one) In fact, I just meant to highlight this as an 'area', in case there are future refactorings in this region. Consider it more of a subjective observation than anything else - i.e. when this classes of issues happen... they are not so easy to trace (*at least for me) using existing logs. It took me a good 2-3 hours to figure out and I was close to attaching a debugger, in fact :). But then again, I am not using MQTT daily, so I guess for someone seasoned with it it might have been a breeze...

mbronk commented 3 years ago

Re this:

I'll try to at least update the docs to tell people not to use names with + or # in them for now and maybe fix the filter.

Consider cautioning users about / in the name as well. While it does work... it creates unnecessary subtopics, so I could envision it breaking some more elaborate usages(?)

tsightler commented 3 years ago

Yeah, my current thought is just to filter out all invalid characters for topic names, but the main question is do I replace them with underscores or just filter them out completely. I'm not sure which is best, but I'm thinking completely remove them. I guess it probably doesn't matter too much as long as I document it.

I'm going to test to see if I can trap the reason for disconnect from MQTT and log that. I have a small bundle of minor updates and fixes at this point so hopefully I'll get a 3.0.1 published this weekend to address this.

Thanks for the report!

tsightler commented 3 years ago

Out of curiosity, did you run with enabling debugging of MQTTjs? I mean, I document how to enable debugging for mqtt-tuya, but you can also run DEBUG='mqttjs' (or even DEBUG=, but that's a LOT of debug) and the MQTTjs library will give you much more information about any MQTT specific issues. I'm guessing it would have probably caught this pretty clearly but then again I haven't tried it.

mbronk commented 3 years ago

Yes, I did do a DEBUG=* at one point... Which obviously created a ton of logs... yet as far as I recall, other than the connection mysteriously terminating.... there wasn't much actionable output there (or I missed it). What it did help with, was it pinpointed that the reconnects start once the 'publish' runs

tsightler commented 3 years ago

This is hopefully fixed in the latest push so closing this for now.