esphome / esphome-core

🚨 No longer used 🚨 - The C++ framework behind ESPHome
https://esphome.io/
GNU General Public License v3.0
545 stars 113 forks source link

MQTT Publish failed! #122

Closed ezar closed 5 years ago

ezar commented 6 years ago

Fails many times to publish :(

[22:25:58][I][mqtt.client:095]: Connecting to MQTT...
[22:25:58][I][mqtt.client:132]: MQTT Connected!
[22:25:58][C][sensor.adc:034]: Setting up ADC 'ESP32Pool Voltage Battery'...
[22:25:58][C][esphal:053]:     GPIO Pin 34 with mode INPUT
[22:25:58][C][component:239]:     Update interval: 15000ms
[22:25:58][C][component:239]:     Update interval: 15000ms
[22:25:58][C][sensor.dallas:042]: Setting up DallasComponent...
[22:25:58][C][sensor.dallas:043]:     Want device count: 1
[22:25:58][D][sensor.dallas:050]: Found sensors:
[22:25:58][D][sensor.dallas:065]:     0x000416849074FF28
[22:25:58][C][sensor.dallas:069]: Device 'ESP32Pool Temperature':
[22:25:58][C][sensor.dallas:080]:     Address: 0x000416849074FF28
[22:25:58][C][sensor.dallas:082]:     Resolution: 12
[22:25:58][C][component:239]:     Update interval: 15000ms
[22:25:58][C][sensor.mqtt:025]: Setting up MQTT Sensor 'ESP32Pool Temperature'...
[22:25:58][C][sensor.mqtt:027]:     Expire After: 675s
[22:25:58][C][sensor.mqtt:029]:     Unit of Measurement: '°C'
[22:25:58][C][sensor.mqtt:030]:     Accuracy Decimals: 1
[22:25:58][C][sensor.mqtt:031]:     Icon: ''
[22:25:58][C][sensor.mqtt:033]:     Unique ID: 'dallas-000416849074FF28'
[22:25:58][C][sensor.mqtt:025]: Setting up MQTT Sensor 'ESP32Pool Voltage Battery'...
[22:25:58][C][sensor.mqtt:027]:     Expire After: 675s
[22:25:58][C][sensor.mqtt:029]:     Unit of Measurement: 'V'
[22:25:58][C][sensor.mqtt:030]:     Accuracy Decimals: 2
[22:25:58][C][sensor.mqtt:031]:     Icon: 'mdi:flash'
[22:25:58][C][sensor.mqtt:025]: Setting up MQTT Sensor 'ESP32Pool WiFi Signal Sensor'...
[22:25:58][C][sensor.mqtt:027]:     Expire After: 675s
[22:25:58][C][sensor.mqtt:029]:     Unit of Measurement: 'dB'
[22:25:58][C][sensor.mqtt:030]:     Accuracy Decimals: 0
[22:25:58][C][sensor.mqtt:031]:     Icon: 'mdi:wifi'
[22:25:58][C][sensor.mqtt:033]:     Unique ID: '30aea41c3ff4-wifisignal'
[22:25:58][C][binary_sensor.mqtt:024]: Setting up MQTT binary sensor 'ESP32Pool Status'...
[22:25:58][C][binary_sensor.mqtt:026]:     Device Class: 'connectivity'
[22:25:58][C][switch.mqtt:027]: Setting up MQTT switch 'ESP32Pool Shutdown'
[22:25:58][C][switch.mqtt:028]:     Icon: 'mdi:power'
[22:25:58][D][mqtt.client:173]: Subscribing to topic='esp32pool/switch/esp32pool_shutdown/command' qos=0...
[22:25:58][D][switch.mqtt:071]: 'ESP32Pool Shutdown': Sending state OFF
[22:25:58][C][switch.mqtt:027]: Setting up MQTT switch 'ESP32Pool Restart'
[22:25:58][C][switch.mqtt:028]:     Icon: 'mdi:restart'
[22:25:58][D][mqtt.client:173]: Subscribing to topic='esp32pool/switch/esp32pool_restart/command' qos=0...
[22:25:58][D][switch.mqtt:071]: 'ESP32Pool Restart': Sending state OFF
[22:25:58][C][deep_sleep:022]: Setting up Deep Sleep...
[22:25:58][C][deep_sleep:024]:   Sleep Duration: 1800000 ms
[22:25:58][C][deep_sleep:026]:   Run Duration: 5000 ms
[22:25:58][I][application:090]: Running through first loop()
[22:25:58][D][sensor.adc:077]: 'ESP32Pool Voltage Battery': Got voltage=3.90V
[22:25:58][D][sensor.mqtt:091]: 'ESP32Pool Voltage Battery': Pushing out value 3.900000 with 2 decimals of accuracy
[22:25:58][W][mqtt.client:213]: Publish failed!
[22:25:58][D][sensor.mqtt:091]: 'ESP32Pool WiFi Signal Sensor': Pushing out value -68.000000 with 0 decimals of accuracy
[22:25:58][I][application:106]: First loop finished successfully!
[22:25:59][D][sensor.dallas:141]: 'ESP32Pool Temperature': Got Temperature=30.1°C
[22:25:59][D][sensor.mqtt:091]: 'ESP32Pool Temperature': Pushing out value 30.062500 with 1 decimals of accuracy
[22:25:59][W][mqtt.client:213]: Publish failed!
[22:26:03][I][deep_sleep:081]: Beginning Deep Sleep
OttoWinter commented 6 years ago

Ok, so Publish failed! usually indicates that the internal TCP buffer is filling up and so MQTT messages cannot be published until the previous messages have been dealt with. Due to that reason the error message is also quite brief because it increases the chances of it still fitting into the buffer, but also by sacrificing context.

Ok, so for fixing this warning I would recommend decreasing the log level so that fewer MQTT messages are published and thus the TCP buffer doesn't fill up as quick.

tringler commented 5 years ago

Hello @OttoWinter,

I have the same problem as described in the first posts, but for me it is not warning, because the mqtt messages are never reaching my server.

I set the loglevel to NONE, but not only one mqtt message has ever reached my mqtt server, but if I set the loglevel to VERBOSE it is working like a charm, which is the opposite of that what you mentioned.

OttoWinter commented 5 years ago

@tringler Well that would be because if you set the log level to NONE no log messages will be produced. So you will never see log messages. However, when the log level is NONE all other (more important) messages like sensors announcing their state have a higher probability of being transmitted correctly.

tringler commented 5 years ago

Maybe I understand it wrongly, but just to make sure:

Loglevel NONE: No mqtt messages are received by my mqtt server Loglevel VERBOSE: All mqtt messages are received by my mqtt server

This is expected? I understand the issue as it should exactly the opposite.

OttoWinter commented 5 years ago

Please specify what you mean by "no mqtt messages" and "all mqtt messages".

What I meant was that if you set the log level to NONE, you shouldn't see any MQTT messages on the topic <NODE_NAME>/debug, as no log messages are ever produced at that log level. However, all other MQTT messages (for example sensors publishing their state on <NODE_NAME>/sensor/.../state), should be sent correctly. Are really no MQTT messages (even sensor state messages) sent with NONE log level?

(If really no messages whatsoever are sent, then that could be because of the MQTT server aborting the connection due to inactivity, the keepalive parameter for MQTT might help in that case)

tringler commented 5 years ago

Are really no MQTT messages (even sensor state messages) sent with NONE log level?

Indeed.

I'm not talking about the debug output. At that moment I set the loglevel to NONE I see the Published failed! messages in the debug log and no messages are transferred to my mqtt server. I see the HomeAssistant state as unavailable then.

At that moment I set the loglevel to VERBOSE I see no Published failed anymore and get the right state in HomeAssistant.

OttoWinter commented 5 years ago

Ok then, so it is really the NONE log level. Unfortunately, after looking at the code again, I can't figure out what it could be. In fact, I just tested it myself using the NONE log level and it is working as expected.

At that moment I set the loglevel to NONE I see the Published failed! messages in the debug log

This is weird, when you set the log level to NONE you shouldn't see publish failedat all, as publish failed is a log message and is not transmitted with log level NONE.

tringler commented 5 years ago

You're right. Let me reproduce this again in the next days. It is a few days ago and maybe a bit wrong in my mind. I will provide a list of the behaviour of the different log levels on my mqtt server and what I get from the debug esphomelib logging.

ghost commented 5 years ago

So I'm seeing many "Publish failed" too. Lots of logs here: https://gist.github.com/thubot/d770f6a14676de5f8e51d2e2228f577b This is my yaml: https://gist.github.com/thubot/ea5c523f0ce3393a8db7c3320aef8e8e

The problem is persistent identically over reboots of the same flash, but differs per esphomeyaml run of same config (sometimes the adc config message fails, sometimes only the temperature config message, sometimes both, notably both long HA discovery messages)

In my test runs, the problem did only occur on the INFO DEBUG log level and above, not on VERBOSE or VERY_VERBOSE. I'm not sure if this is significant or just a coincidence.

The only part of the config messages that changes over reruns is the version string, so that could be the only thing that might change the length of the message, but it seems to be constant length.

Does setting VERBOSE or VERY_VERBOSE raise the buffer size somehow?

HFsi commented 5 years ago

I have the same problem....I only have 2 sensors with 1 message/sec each, so I don't think It's buffer related. And in my case, the Publish failed! msg comes after MQTT Connected, so I suppose the buffer doesn't even have time to fill up. In the meantime, I added the sensors manually in configuration.yaml to view its info, but the node doesn't read mqtt messages (eg, the one I did to no enter in deep sleep). Can I help with some test or info?

OttoWinter commented 5 years ago

@thubot The severity level does not change the allocated buffer size. It does however send out more messages which I guess could end up improving the probability that more important messages come through.

@HFsi I recently did some improvements for 1.10.0 concerning this issue. In 1.10.0, the messages will be auto-republished at a later point if something goes wrong. You can test the new system using (might not be 100% stable, but please do report bugs)

esphomeyaml:
  esphomelib_version:
    branch: custom-sensor-platform
HFsi commented 5 years ago

I think I found my problem....If I define:

mqtt: birth_message: will_message:

For retain the messages, the node doesn´t update the sensors anymore. I´m on 1.9.3, and without birth and will, works fine.

jladefoged commented 5 years ago

I get the same error on 1 sensor on 1 board. If I set logger to very_verbose I still get the publish failed error but the missing sensor shows up in HA.

markusressel commented 5 years ago

@HFsi I recently did some improvements for 1.10.0 concerning this issue. In 1.10.0, the messages will be auto-republished at a later point if something goes wrong. You can test the new system using (might not be 100% stable, but please do report bugs)

esphomeyaml:
  esphomelib_version:
    branch: custom-sensor-platform

I think the main problem is that the device tries to send a lot of messages right after boot especially if you use home assistant discovery. This might cause the problem of filling the TCP buffer. The biggest problem however is that the messages that fail most often are the discovery messages and when they are failing Home Assistant wont even show the device in the GUI and also won't listen to any following messages.

Sadly thg custom-sensor-platform branch doesn't seem to exist anymore. I tried the dev branch which prints the message "[...] will retry later.." - but how much later exactly? I did wait 10+ minutes or so and didn't see any retry message in the (DEBUG) log.

OttoWinter commented 5 years ago

@markusressel Yes that branch has been merged. will retry later is good, unless you set the log level to verbose you won't see when the message is actually published successfully. But if it would fail repeatedly, you would see a lot of continuous will retry later messages.

DavidDeSloovere commented 5 years ago

I'm also seeing the "Publish failed!", mainly on the discovery data. Spent a lot of time thinking it was something with the broker, swapping it our and changing stuff.

Biggest issue with this, like posted above, is that it's the publish of the discovery data that was failing. The status binary_sensor got through, but the light didn't. It's probably not random what messages fail, or rather which message in the queue fails.

After a short night with too little sleep: commenting status binary_sensor, status led and web server, discovery data got through.

I don't know if it's possible to correlate the "Publish failed!" and the message itself? Knowing that all logs are also going over mqtt, it might only add more bytes to the TCP buffer.

When the TCP buffer gets full, it's because of too many messages, but what is the underlying issue here?

Anyway, thanks again @OttoWinter for all your hard work. Looking forward to improvements.

[07:52:58][I][mqtt.client:115]: MQTT Connected!
[07:52:59][V][mqtt.client:239]: Publish(topic='wemos_bandit_scrappy/status' payload='online' retain=1)
[07:52:59][V][mqtt.component:067]: 'bandit_scrappy status': Sending discovery...
[07:52:59][V][mqtt.client:239]: Publish(topic='homeassistant/binary_sensor/wemos_bandit_scrappy/bandit_scrappy_status/config' payload='{"device_class":"connectivity","payload_on":"online","payload_off":"offline","name":"bandit_scrappy status","state_topic":"wemos_bandit_scrappy/status","unique_id":"ESPbinary_sensorbandit_scrappy_status","device":{"identifiers":"84f3eb9fc081","name":"wemos_bandit_scrappy","sw_version":"esphomelib v1.9.3 Dec 22 2018, 07:49:35","model":"PLATFORMIO_D1_MINI","manufacturer":"espres
[07:52:59][W][mqtt.client:249]: Publish failed!
[07:52:59][V][mqtt.client:214]: Subscribing to topic='wemos_bandit_scrappy/light/bandit_scrappy_one/command' qos=0 with JSON...
[07:52:59][W][mqtt.client:249]: Publish failed!
[07:52:59][W][mqtt.client:249]: Publish failed!
[07:52:59][V][mqtt.client:239]: Publish(topic='wemos_bandit_scrappy/light/bandit_scrappy_one/state' payload='{"effect":"None","state":"ON","brightness":255,"color":{"r":255,"g":36,"b":36}}' retain=1)
OttoWinter commented 5 years ago

@DavidDeSloovere Try with latest dev branch - that has fixed MQTT discovery publishing.

The issue is device-related. The ESP has a fixed-size TCP buffer (can be changed by some platformio commands I think). Once it's full, any data that's written in there will be lost - that's why you see publish failed.

Anyway, most of this will be solved anyway with the next release. The new native HA<->esphome API is much more efficient with data size.

markusressel commented 5 years ago

I'm now running version 1.10.1 with the native api via Home Assistant and so far it works great. Apart from some setup issues (installing the new firmware and hooking it up to home assistant) I had no issues whatsoever getting it to run on first try.

So: damn great work there @OttoWinter!