bellrichm / WeeWX-MQTTSubscribe

A WeeWX service and driver that receives data from MQTT.
GNU General Public License v3.0
52 stars 13 forks source link

MQTT problems cause WeeWX consuming all the CPU #179

Open roe-dl opened 1 year ago

roe-dl commented 1 year ago

Observations:

Mar  2 19:01:31 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779978.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:31 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779980.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779983.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779985.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779988.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779990.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779993.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779995.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779998.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780000.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780003.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:33 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780005.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:34 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780008.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:36 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780010.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:36 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780013.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:37 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780015.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:38 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780018.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:38 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780020.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:38 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780023.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780025.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780028.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780030.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780033.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780035.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780038.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780040.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780043.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780045.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780048.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780050.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780053.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780055.000000 which is prior to previous packet 1677780068.000000
Mar  2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780057.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780059.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780062.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780064.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780067.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780069.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780072.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780074.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780077.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:01:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780079.000000 which is prior to previous packet 1677780101.000000
Mar  2 19:02:07 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780082.000000 which is prior to previous packet 1677780117.000000
Mar  2 19:02:08 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780084.000000 which is prior to previous packet 1677780117.000000
Mar  2 19:02:10 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780087.000000 which is prior to previous packet 1677780117.000000
Mar  2 19:02:42 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780089.000000 which is prior to previous packet 1677780153.000000
Mar  2 19:02:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780092.000000 which is prior to previous packet 1677780153.000000
Mar  2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780094.000000 which is prior to previous packet 1677780153.000000
Mar  2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780097.000000 which is prior to previous packet 1677780153.000000
Mar  2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780099.000000 which is prior to previous packet 1677780153.000000
Mar  2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780102.000000 which is prior to previous packet 1677780153.000000
Mar  2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780104.000000 which is prior to previous packet 1677780153.000000
bellrichm commented 1 year ago

You could try setting ignore_start_time = True. This will result in MQTTSubscribe processing the incoming data that has a time ‘prior to previous packet’. Some additional information can found here, https://github.com/bellrichm/WeeWX-MQTTSubscribe/wiki/Date-Time-processing#mqttsubscribeservice-processing Because WeeWX/MQTTSubscribe id being ‘flooded’ with multiple incoming messages per second, I’m not sure this will solve the performance issue.

roe-dl commented 1 year ago

Yes, I could set ignore_start_time = True, but that is not what I meant. Indeed, in this case, the base thing to do is to fix the overload situation of the MQTT broker. I did so. So no problem at my side any more.

What I meant is this: The MQTTSubscribe service should be able to handle situations when packets come in irregularly because of external conditions.

bellrichm commented 1 year ago

Sorry, I’m not following what you are requesting. Setting the various time configuration settings is for the exact situation where the time in the MQTT payload ‘cannot be trusted’.

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has been waiting for more information for 4 weeks.

roe-dl commented 1 year ago

The time in the MQTT payload can be trusted. Because of some temporary overload situation the packets did not come in order any more or they came delayed. Such temporary situation should not result in an unrecoverable error condition.

bellrichm commented 1 year ago

The error message is because the driver is emitting loop packets out of order. What driver are you using?

I agree its not good for WeeWX to consume the CPU, as of now I don’t know of a way for MQTTSubscribe to safe guard this. I’ll keep this open for a bit and attempt to to recreate the condition.

bellrichm commented 1 year ago

Current plan. If the driver emits X ‘stale’ packets in a row OR Y total ‘stale’ packets, MQTTSubscribe will shut itself down. X and Y will be configurable.

bellrichm commented 1 year ago

Below I have reformatted the log snippet to make what is happening more apparent. Using the log time of 19:01:39 as an example. In this one second WeeWX has dispatched 13 new_loop_packet events. Each of these events has been raised because the driver has generated a loop packet. Each of these loop packet have a time difference of 2 to 3 seconds. The times are all in the past. That is a lot of CPU cycles being used outside of MQTTSubscribe. Because the loop packet is ‘in the past’, the main thread of MQTTSubscribe logs it and ignores it. This would add some additional CPU processing, but minimal. There are no errors in the snippet with errors dealing with the incoming MQTT messages. So it appears that this thread is processing as normal (yes this a guess based on the information available). And unless the broker is flooding the topic with messages, it would not be consuming an unusual amount of CPU.

I realized that I can’t gracefully shutdown MQTTSubscribe. All that MQTTSubscribe can do is raise an exception which will shutdown WeeWX. I’ll think a bit more, but right now I don’t think there is much MQTTSubscribe can do.

        Log time        Packet time             Previous packet time        
    Mar  2 19:01:31 1677779978  18:59:38    1677780068  19:01:08    
    Mar  2 19:01:31 1677779980  18:59:40    1677780068      
    Mar  2 19:01:32 1677779983  18:59:43    1677780068      
    Mar  2 19:01:32 1677779985  18:59:45    1677780068      
    Mar  2 19:01:32 1677779988  18:59:48    1677780068      
    Mar  2 19:01:32 1677779990  18:59:50    1677780068      
    Mar  2 19:01:32 1677779993  18:59:53    1677780068      
    Mar  2 19:01:32 1677779995  18:59:55    1677780068      
    Mar  2 19:01:32 1677779998  18:59:58    1677780068      
    Mar  2 19:01:32 1677780000  19:00:00    1677780068      
    Mar  2 19:01:32 1677780003  19:00:03    1677780068      
    Mar  2 19:01:33 1677780005  19:00:05    1677780068      
    Mar  2 19:01:34 1677780008  19:00:08    1677780068      
    Mar  2 19:01:36 1677780010  19:00:10    1677780068      
    Mar  2 19:01:36 1677780013  19:00:13    1677780068      
    Mar  2 19:01:37 1677780015  19:00:15    1677780068      
    Mar  2 19:01:38 1677780018  19:00:18    1677780068      
    Mar  2 19:01:38 1677780020  19:00:20    1677780068      
    Mar  2 19:01:38 1677780023  19:00:23    1677780068      
    Mar  2 19:01:39 1677780025  19:00:25    1677780068      
    Mar  2 19:01:39 1677780028  19:00:28    1677780068      
    Mar  2 19:01:39 1677780030  19:00:30    1677780068      
    Mar  2 19:01:39 1677780033  19:00:33    1677780068      
    Mar  2 19:01:39 1677780035  19:00:35    1677780068      
    Mar  2 19:01:39 1677780038  19:00:38    1677780068      
    Mar  2 19:01:39 1677780040  19:00:40    1677780068      
    Mar  2 19:01:39 1677780043  19:00:43    1677780068      
    Mar  2 19:01:39 1677780045  19:00:45    1677780068      
    Mar  2 19:01:39 1677780048  19:00:48    1677780068      
    Mar  2 19:01:39 1677780050  19:00:50    1677780068      
    Mar  2 19:01:39 1677780053  19:00:53    1677780068      
    Mar  2 19:01:39 1677780055  19:00:55    1677780068  19:01:08    
    Mar  2 19:01:43 1677780057  19:00:57    1677780101  19:01:41    
    Mar  2 19:01:43 1677780059  19:00:59    1677780101      
    Mar  2 19:01:43 1677780062  19:01:02    1677780101      
    Mar  2 19:01:43 1677780064  19:01:04    1677780101      
    Mar  2 19:01:43 1677780067  19:01:07    1677780101      
    Mar  2 19:01:43 1677780069  19:01:09    1677780101      
    Mar  2 19:01:43 1677780072  19:01:12    1677780101      
    Mar  2 19:01:44 1677780074  19:01:14    1677780101      
    Mar  2 19:01:44 1677780077  19:01:17    1677780101      
    Mar  2 19:01:44 1677780079  19:01:19    1677780101  19:01:41    
    Mar  2 19:02:07 1677780082  19:01:22    1677780117  19:01:57    
    Mar  2 19:02:08 1677780084  19:01:24    1677780117      
    Mar  2 19:02:10 1677780087  19:01:27    1677780117  19:01:57    
    Mar  2 19:02:42 1677780089  19:01:29    1677780153  19:02:33    
    Mar  2 19:02:43 1677780092  19:01:32    1677780153      
    Mar  2 19:02:44 1677780094  19:01:34    1677780153      
    Mar  2 19:02:44 1677780097  19:01:37    1677780153      
    Mar  2 19:02:44 1677780099  19:01:39    1677780153      
    Mar  2 19:02:44 1677780102  19:01:42    1677780153      
    Mar  2 19:02:44 1677780104  19:01:44    1677780153
bellrichm commented 1 year ago

What driver are you using? What did you do to workaround the problem?

roe-dl commented 1 year ago

I use the driver WeatherLinkLiveUDP. It receives LOOP messages every 2 to 3 seconds. I have a WeatherLinkLive 6100 device from Davis.

There was no workaround, there was a solution. The base reason was a problem of the RAM of the computer. It resulted in too small memory available. Changing the RAM module solved it.

The problem of too much log messages of the same kind I had myself, when I wrote some code. In that case I set a flag after creating the log message that prevents sending the same log message for some time. Something like this:

    if self.last_log_time<(time.time()-300):
        logerr(...)
        self.last_log_time = time.time()

So the error is logged at most once every archive interval. You could add a counter to log how much occurrences of the error condition took place since the last log message.

bellrichm commented 1 year ago

Thanks for the information. I think throttling the error messages is probably the way to go, but I am still thinking about shutting MQTTSubscribe down in these situations.

roe-dl commented 1 year ago

I think the driver/service should be able to recover if the error condition vanishes. Shutting it down would prevent this.

If the driver/service is shut down in case of error, this would require the user to monitor such events manually to do something about it.

bellrichm commented 1 year ago

I think the driver/service should be able to recover if the error condition vanishes. Shutting it down would prevent this.

If the driver/service is shut down in case of error, this would require the user to monitor such events manually to do something about it.

Makes sense. I will look into some form of throttling the number of messages. Thanks for your thoughts. rich