Spanni26 / pyHPSU

Python Script to read and send commands to a Rotex HPSU

re-connect at every mqtt command #50

Closed · m-reuter closed this issue 2 years ago

m-reuter commented 2 years ago

Hi,

Thanks for the great work! I noticed that my /var/log/messages file gets spammed with MQTT disconnect and new-connect messages. The reason is this function:

https://github.com/Spanni26/pyHPSU/blob/e7a2b460ef12ca509957a42c4853375211f64bb1/HPSU/plugins/mqtt.py#L91

which connects for every single command, and I am querying the HPSU every few seconds with multiple commands. Shouldn't the connection be established once at the beginning and then remain active? That is how my other devices behave.

segaura commented 2 years ago

There are two different things:

  • the opportunity to reuse the same MQTT connection over and over, which can be discussed with its pros and cons
  • the fact that this kind of event is logged while, maybe, you are not interested in it

I pushed a version which includes log-level management (https://github.com/Spanni26/pyHPSU/blob/225d61201f9521c03256f02bfee347b3a07f5036/HPSU/plugins/mqtt.py); these connection-related events are logged at INFO level, so if you set the minimum logged level to WARNING, they do not reach your log file. I personally do exactly this with the --log_level WARNING option.

Now you can find it in the testing branch (https://github.com/Spanni26/pyHPSU/tree/testing) or in my fork (https://github.com/segaura/pyHPSU) ...the problem is that I made so many changes that you have to adapt a few things if you want to use those versions.
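For reference, a minimal sketch of how a `--log_level` option like that can map onto Python's standard logging module (illustrative names only, not the actual pyHPSU code):

```python
import argparse
import logging

# Hypothetical sketch: parse a --log_level flag and apply it globally.
parser = argparse.ArgumentParser()
parser.add_argument("--log_level", default="WARNING",
                    choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
args = parser.parse_args(["--log_level", "WARNING"])

logging.basicConfig(level=getattr(logging, args.log_level))
log = logging.getLogger("pyhpsu.mqtt")

log.info("connected to MQTT broker")   # suppressed at WARNING
log.warning("publish failed")          # still reaches the log
```

With the root level set to WARNING, every INFO-level connect/disconnect message is dropped before it can reach syslog.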

Spanni26 commented 2 years ago

If somebody could test the testing branch and it works... we can merge it (or I can even try it myself)


m-reuter commented 2 years ago

Thanks. I was not worried about the pyHPSU logs. What was problematic is that the MQTT broker (mosquitto), run as a Home Assistant add-on, by default logs all those details to syslog. I figured out how to shut off most messages there, but the problem is the error message that gets generated when this client re-connects without closing the previous connection first:

    Feb 20 12:28:08 raspberrypi d44fe278ca71[844]: 1645359978: Client rotex_hpsu already connected, closing old connection.
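For anyone hitting the same syslog noise: mosquitto's log verbosity can be reduced via `log_type` lines in mosquitto.conf (a sketch; the Home Assistant add-on exposes broker options through its own configuration). Each `log_type` line enables one category, so listing only the ones below drops most per-connection chatter:

```
# mosquitto.conf sketch: keep only warnings and errors in the broker log
log_type error
log_type warning
```

This only silences the symptom, though; the duplicate-client reconnects themselves are addressed below.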

So I think we should either constantly close and re-connect or, better:

re-use the existing connection, check whether it is lost, and only then re-connect.
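A sketch of that idea (illustrative only: `_FakeClient` stands in for paho-mqtt's `Client` so the example is self-contained, and a real plugin would also clear the flag in `on_disconnect`):

```python
# Connection-reuse sketch: connect lazily, publish many times, reconnect
# only after the connection has been marked as lost.
class ReusableMqtt:
    def __init__(self, client, host, port=1883):
        self.client = client
        self.host = host
        self.port = port
        self.connected = False  # a real plugin would clear this in on_disconnect

    def _ensure_connected(self):
        if not self.connected:
            self.client.connect(self.host, port=self.port)
            self.connected = True

    def publish(self, topic, payload, qos=0):
        self._ensure_connected()
        return self.client.publish(topic, payload=payload, qos=qos)

class _FakeClient:
    """Stand-in for paho.mqtt.client.Client, counting connect calls."""
    def __init__(self):
        self.connects = 0
    def connect(self, host, port=1883):
        self.connects += 1
    def publish(self, topic, payload=None, qos=0):
        return (0, 1)

conn = ReusableMqtt(_FakeClient(), "localhost")
conn.publish("rotex/t_hs", "28.0")
conn.publish("rotex/t_dhw", "43.0")
print(conn.client.connects)  # → 1: the connection is reused across publishes
```

This keeps exactly one broker session per client ID, which is what makes the "already connected, closing old connection" message disappear.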

segaura commented 2 years ago

Oh, only now I understand, sorry. The MQTT output plugin code can easily be refactored to mitigate this issue. As pointed out, this is the problematic section:

    def pushValues(self, vars=None):

        #self.msgs=[]
        for r in vars:
            self.client.connect(self.brokerhost, port=self.brokerport)
            msgs=[]
            if self.prefix:
                ret=self.client.publish(self.prefix + "/" + r['name'],payload=r['resp'], qos=int(self.qos))
                topic=self.prefix + "/" + r['name']
            else:
                ret=self.client.publish(r['name'],payload=r['resp'], qos=int(self.qos))
                topic=r['name']
            msg={'topic':topic,'payload':r['resp'], 'qos':self.qos, 'retain':False}
            self.client.disconnect()

A first improvement is to move the connect and disconnect calls outside the loop, like this:

    def pushValues(self, vars=None):

        self.client.connect(self.brokerhost, port=self.brokerport)

        #self.msgs=[]
        for r in vars:
            msgs=[]
            if self.prefix:
                ret=self.client.publish(self.prefix + "/" + r['name'],payload=r['resp'], qos=int(self.qos))
                topic=self.prefix + "/" + r['name']
            else:
                ret=self.client.publish(r['name'],payload=r['resp'], qos=int(self.qos))
                topic=r['name']
            msg={'topic':topic,'payload':r['resp'], 'qos':self.qos, 'retain':False}

        self.client.disconnect()

If you execute, let's say, 20 read commands every 30 seconds, a single connection is now created every 30 seconds; before, 20 consecutive connections were created each cycle, and maybe some race conditions happened.

I tried this change in my fork and the log messages are gone (an example grep inside my mosquitto container):

/ $  grep "already connected, closing old connection" /mosquitto/log/mosquitto.log
/ $

and a typical log block looks like this:

0: New client connected from 192.168.1.102:58009 as pyhpsu-24468 (p2, c1, k60, u'dummy-username').
0: Sending CONNACK to pyhpsu-24468 (0, 0)
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hs', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hs_set', ... (4 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/water_pressure', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_dhw', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_dhw_set', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_return', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/flow_rate', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hc', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_hc_set', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/posmix', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/mode_01', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/tvbh2', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/mode', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/pump', ... (1 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/bpv', ... (3 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_v1', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_vbh', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_r1', ... (5 bytes))
0: Received PUBLISH from pyhpsu-24468 (d0, q0, r0, m0, 'rotex/t_flow_cooling', ... (4 bytes))
0: Client pyhpsu-24468 closed its connection.
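Incidentally, the msg dicts that pushValues assembles (but never sends) match the format accepted by paho-mqtt's publish.multiple helper, which also opens a single connection per batch. A hedged sketch of that alternative (only the topic building is executed here; the actual network call is left commented out, and the sample readings are made up):

```python
# Sketch: build the batch of message dicts that paho.mqtt.publish.multiple()
# accepts. Names and sample values are illustrative, not taken from pyHPSU.
def build_msgs(readings, prefix="rotex", qos=0):
    msgs = []
    for r in readings:
        topic = f"{prefix}/{r['name']}" if prefix else r["name"]
        msgs.append({"topic": topic, "payload": r["resp"],
                     "qos": qos, "retain": False})
    return msgs

msgs = build_msgs([{"name": "t_hs", "resp": "28.0"},
                   {"name": "t_dhw", "resp": "43.0"}])
# import paho.mqtt.publish as publish
# publish.multiple(msgs, hostname="localhost", port=1883)  # one connection per batch
```

That would give one connect/disconnect per batch without the plugin having to manage a long-lived client at all.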

m-reuter commented 2 years ago

Thanks, this should already help. I am currently pulling #44 . So would you add this to testing? When is the new MQTT stuff going to be merged from testing into master? I have been using that for a year now without problems. See https://github.com/m-reuter/ha-addons/tree/master/pyhpsu2mqtt

segaura commented 2 years ago

I can make a new pull request with the few commits I've added, including this last one we discussed here.

What I'm trying to add before creating it is compatibility of the mqttdaemon mode with systemd, correctly managing exceptions (e.g. terminating the parent process when a child thread fails, so that the systemd daemon can detect the problem): are you experiencing similar issues when leaving pyHPSU running for days?
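One way to surface a child-thread failure to systemd is Python 3.8+'s threading.excepthook (a sketch under that assumption; the worker and all names are made up, not pyHPSU code):

```python
import threading

failed = threading.Event()

def _record_thread_failure(args):
    # Invoked by the threading module whenever a thread dies with an
    # unhandled exception (Python 3.8+).
    print(f"thread {args.thread.name} failed: {args.exc_value!r}")
    failed.set()

threading.excepthook = _record_thread_failure

def worker():
    # Stand-in for a pyHPSU child thread hitting a fatal error.
    raise RuntimeError("CAN bus unavailable")

t = threading.Thread(target=worker, name="hpsu-reader")
t.start()
t.join()

# In a real main() this would be sys.exit(exit_code): a non-zero exit
# lets a systemd unit with Restart=on-failure restart the daemon.
exit_code = 1 if failed.is_set() else 0
```

The key point is that a crashing thread does not normally take down the parent process, so without a hook like this, systemd sees a healthy daemon that silently stopped working.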

m-reuter commented 2 years ago

I did not notice anything (in /var/log/messages). Should I look at another specific log file? I leave it running for months.

segaura commented 2 years ago

OK, in this case we can consolidate the current codebase. I created pull request #51, still on the testing branch.

My issues are related to the influxdb output plugin: it reports that influxdb is sometimes not available, and this makes pyHPSU crash in a way that systemd is not able to detect. I will investigate further on my own, thank you.

Spanni26 commented 2 years ago

merged :)

Thanks for your work.

Regards

Daniel


m-reuter commented 2 years ago

@segaura I upgraded to this version (#51) and see constant crashing and restarting of my Docker container. Not sure why. I am currently reverting to #44, which worked before, to see whether it is a build issue (I also switched to a newer build environment on Travis) or whether it is caused by the difference between those two versions.

m-reuter commented 2 years ago

It's also happening with the old version now, so it is probably a build or dependency issue on my side.

m-reuter commented 2 years ago

Turned out to be a dependency: python-can==4.0.0 does not seem to work (see #52). Forcing the older version 3.3.4, it is now running, so far without problems. The re-connect race conditions did not appear again. I will keep it running for a few days and, if all goes well, we can close this and also consider merging testing into the master branch, I think.
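For anyone else hitting this, the pin can live in the build's requirements file (a sketch; adjust to your own setup):

```
# requirements.txt: python-can 4.0.0 breaks pyHPSU (see #52), pin the older release
python-can==3.3.4
```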

m-reuter commented 2 years ago

It is still running stably, without any more MQTT re-connects or crashes.

segaura commented 2 years ago

Cool. After you told me that it has been running without crashing for months, I looked deeper at my logs and found that the issue is with an MQTT-unrelated plugin, the InfluxDB one, possibly one you are not using. The problem is that it tries to catch and manage exceptions, but sometimes a second exception happens while the first one is being handled. Simply removing the exception management solved the issue (probably, in case of an exception, systemd restarts the process). Other output plugins, like the MQTT one, do not manage exceptions. I'll test it for a few weeks and then contribute a new pull request with this InfluxDB plugin fix.
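The failure mode is easy to reproduce in isolation (an illustrative sketch, not the actual InfluxDB plugin code): an exception raised inside an except block escapes the handler entirely, which is what can kill the plugin thread.

```python
# Sketch of the described bug: a naive retry inside the handler raises
# again, and the second exception propagates out of the except block.
def flaky_write(points):
    raise ConnectionError("influxdb not available")

def push_with_naive_retry(points):
    try:
        flaky_write(points)
    except ConnectionError:
        # If the retry fails too, nothing here catches it.
        flaky_write(points)

try:
    push_with_naive_retry([{"measurement": "t_hs", "value": 28.0}])
    escaped = None
except ConnectionError as exc:
    escaped = exc

print(escaped)               # the second ConnectionError escaped the handler
print(escaped.__context__)   # Python chains it to the first one automatically
```

Letting the exception propagate (as the MQTT plugin does) at least turns a silent hang into a process exit that systemd can act on.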

m-reuter commented 2 years ago

@segaura great. And you are right, I am not using the InfluxDB plugin.

svde commented 2 years ago

After my server running mosquitto reboots for software updates, the Raspberry Pi running pyHPSU isn't able to send commands (for instance rotex/command/heat_slope) anymore. Reading values keeps working. After restarting the hpsu service, sending commands works again. Nothing is mentioned in the log file (LOG_LEVEL_STRING = WARNING).

m-reuter commented 2 years ago

@svde I started a new issue for you, as it seems unrelated to the re-connect race condition discussed and fixed here. Or has this happened only recently, after the fix for this issue, and worked correctly before?

svde commented 2 years ago

> @svde I started a new issue for you, as it seems unrelated to the re-connect race condition discussed and fixed here. Or has this happened only recently, after the fix for this issue, and worked correctly before?

It's probably unrelated.

Thanks for opening the new issue.