Closed pauloon closed 1 year ago
@pauloon it seems like I've resolved the issue here.
The issue was that I had some power plugs with consumption statistics that were pushing these statistics every second. After changing this to once every 10 seconds, my Z2M no longer seems to crash and the memory is at a stable level.
@XanderTenBoden, or maybe you reduced the traffic so it will just take much longer to crash?
Like on the test PC that I've configured, where memory is still building up, only much slower?
This PC only has one scene button and one smart plug configured.
If I have one or two plugs reporting every 5 seconds, shouldn't this be bearable for Z2M?
That is very frustrating.
Rgds, Paulo.
@pauloon I will watch what happens, but it seems there is no increase in memory consumption anymore. Not even a slight one. In fact, it went down by 40 MB over the last 2 hours.
I think what happened (at least here) is that the power consumption reports were too much to handle for my coordinator. So it builds up a queue in memory. That might also explain why my lights stopped working after a certain amount of time, because the queue was simply getting too long. Not 100% sure this is what happened, but it makes sense in my head.
It might also mean that in your case, maybe your network is too big for the type of coordinator you are using? But that's only a wild guess though.
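A rough way to picture the queue build-up described above, as a minimal sketch and not Zigbee2MQTT code: requests arrive at a fixed rate and are completed at a fixed rate, and anything that cannot be completed stays queued in memory. The function name `simulateBacklog` and all numbers (8 plugs, a coordinator that completes 4 requests per second) are assumptions chosen only for illustration.

```javascript
// Hypothetical model, not Zigbee2MQTT code: poll requests arrive at a fixed
// rate and the coordinator completes them at a fixed, possibly lower, rate.
function simulateBacklog({ plugs, pollIntervalSec, servicePerSec, durationSec }) {
  const arrivalsPerSec = plugs / pollIntervalSec;
  let backlog = 0;
  for (let t = 0; t < durationSec; t++) {
    backlog += arrivalsPerSec;                      // new poll requests queued this second
    backlog = Math.max(0, backlog - servicePerSec); // requests completed this second
  }
  return backlog;
}

// Assumed numbers for illustration only: 8 plugs, 4 completed requests per second.
const fast = simulateBacklog({ plugs: 8, pollIntervalSec: 1, servicePerSec: 4, durationSec: 3600 });
const slow = simulateBacklog({ plugs: 8, pollIntervalSec: 10, servicePerSec: 4, durationSec: 3600 });
console.log({ fast, slow });
```

Under these assumed rates, the one-second interval leaves a backlog that grows for as long as the process runs, while the ten-second interval lets the queue drain completely. This model would not explain a slow increase that persists at the lower rate.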
@XanderTenBoden I have the same problem as you guys. This must be some regression, as it was working fine until 1.28.0-1 and has kept running since I reverted to that version.
But 50 devices too big?
Anyway, a brand new install with only 2 devices (yes, one is reporting power consumption), and it is still building up memory. Still looks like a bug to me... @Koenkk could take a look at this for us.
Thanks, Paulo.
@pauloon maybe you can try what happens if you disable the power consumption reporting? Could be that something has changed in the handling of that since the new release?
@dunkelz I didn't have problems before that release either. However downgrading to 1.28.x didn't resolve the problem for me either 🤷♂️
What are the models of the plugs you removed from the network? There is a memory leak somewhere (and this should be fixed)
@Koenkk TS011F_plug_3
@XanderTenBoden
Could you check if the issue is fixed with the following external converter (energy measurement won't work anymore)
const fz = require('zigbee-herdsman-converters/converters/fromZigbee');
const tz = require('zigbee-herdsman-converters/converters/toZigbee');
const exposes = require('zigbee-herdsman-converters/lib/exposes');
const reporting = require('zigbee-herdsman-converters/lib/reporting');
const extend = require('zigbee-herdsman-converters/lib/extend');
const ota = require('zigbee-herdsman-converters/lib/ota');
const tuya = require('zigbee-herdsman-converters/lib/tuya');
const utils = require('zigbee-herdsman-converters/lib/utils');
const e = exposes.presets;
const ea = exposes.access;

const TS011Fplugs = ['_TZ3000_5f43h46b', '_TZ3000_cphmq0q7', '_TZ3000_dpo1ysak', '_TZ3000_ew3ldmgx', '_TZ3000_gjnozsaz',
    '_TZ3000_jvzvulen', '_TZ3000_mraovvmm', '_TZ3000_nfnmi125', '_TZ3000_ps3dmato', '_TZ3000_w0qqde0g', '_TZ3000_u5u4cakc',
    '_TZ3000_rdtixbnu', '_TZ3000_typdpbpg', '_TZ3000_kx0pris5', '_TZ3000_amdymr7l', '_TZ3000_z1pnpsdo', '_TZ3000_ksw8qtmt',
    '_TZ3000_1h2x4akh', '_TZ3000_9vo5icau', '_TZ3000_cehuw1lw', '_TZ3000_ko6v90pg', '_TZ3000_f1bapcit', '_TZ3000_cjrngdr3',
    '_TZ3000_zloso4jk', '_TZ3000_r6buo8ba', '_TZ3000_iksasdbv', '_TZ3000_idrffznf', '_TZ3000_okaz9tjs', '_TZ3210_q7oryllx',
    '_TZ3000_ss98ec5d', '_TZ3000_gznh2xla', '_TZ3000_hdopuwv6', '_TZ3000_gvn91tmx', '_TZ3000_dksbtrzs', '_TZ3000_b28wrpvx',
    '_TZ3000_aim0ztek', '_TZ3000_mlswgkc3', '_TZ3000_7dndcnnb', '_TZ3000_waho4jtj', '_TZ3000_nmsciidq', '_TZ3000_jtgxgmks',
    '_TZ3000_rdfh8cfs', '_TZ3000_yujkchbz', '_TZ3000_fgwhjm9j', '_TZ3000_qeuvnohg', '_TZ3000_rul9yxcc'];

const fzLocal = {
    metering_skip_duplicate: {
        ...fz.metering,
        convert: (model, msg, publish, options, meta) => {
            if (utils.hasAlreadyProcessedMessage(msg, model)) return;
            return fz.metering.convert(model, msg, publish, options, meta);
        },
    },
    electrical_measurement_skip_duplicate: {
        ...fz.electrical_measurement,
        convert: (model, msg, publish, options, meta) => {
            if (utils.hasAlreadyProcessedMessage(msg, model)) return;
            return fz.electrical_measurement.convert(model, msg, publish, options, meta);
        },
    },
};

const definition = {
    fingerprint: [].concat(...TS011Fplugs.map((manufacturerName) => {
        return [160, 69, 68, 65, 64].map((applicationVersion) => {
            return {modelID: 'TS011F', manufacturerName, applicationVersion};
        });
    })),
    model: 'TS011F_plug_3',
    description: 'Smart plug (with power monitoring by polling)',
    vendor: 'TuYa',
    whiteLabel: [{vendor: 'VIKEFON', model: 'TS011F'}, {vendor: 'BlitzWolf', model: 'BW-SHP15'},
        {vendor: 'Avatto', model: 'MIUCOT10Z'}, {vendor: 'Neo', model: 'NAS-WR01B'}],
    ota: ota.zigbeeOTA,
    fromZigbee: [fz.on_off, fzLocal.electrical_measurement_skip_duplicate, fzLocal.metering_skip_duplicate, fz.ignore_basic_report,
        fz.tuya_switch_power_outage_memory, fz.ts011f_plug_indicator_mode, fz.ts011f_plug_child_mode],
    toZigbee: [tz.on_off, tz.tuya_switch_power_outage_memory, tz.ts011f_plug_indicator_mode, tz.ts011f_plug_child_mode],
    configure: async (device, coordinatorEndpoint, logger) => {
        await tuya.configureMagicPacket(device, coordinatorEndpoint, logger);
        const endpoint = device.getEndpoint(1);
        endpoint.saveClusterAttributeKeyValue('haElectricalMeasurement', {acCurrentDivisor: 1000, acCurrentMultiplier: 1});
        endpoint.saveClusterAttributeKeyValue('seMetering', {divisor: 100, multiplier: 1});
        device.save();
    },
    options: [exposes.options.measurement_poll_interval()],
    exposes: [e.switch(), e.power(), e.current(), e.voltage().withAccess(ea.STATE),
        e.energy(), exposes.enum('power_outage_memory', ea.ALL, ['on', 'off', 'restore'])
            .withDescription('Recover state after power outage'),
        exposes.enum('indicator_mode', ea.ALL, ['off', 'off/on', 'on/off', 'on'])
            .withDescription('Plug LED indicator mode'), e.child_lock()],
    // onEvent: (type, data, device, options) =>
    //     tuya.onEventMeasurementPoll(type, data, device, options, true, device.applicationVersion === 160),
};

module.exports = definition;
Save this next to configuration.yaml as ext_converter.js, then add it to configuration.yaml:
external_converters:
  - ext_converter.js
Update:
Always up! Hehehe And watchdog makes it:
@Koenkk I will try this when I'm home after work and let you know!
@Koenkk ,
See that the clean install that I have running on another computer is still slowly building up memory:
It only has two devices:
Thanks for helping, Paulo.
Seems that it is indeed that power monitoring then, since you've got the exact same plug as I have mentioned earlier. Can you also try @Koenkk 's solution that he posted earlier?
Yours is a "plug 3" and mine is a "plug 1". Data is all different. Mine pushes energy monitoring through reporting, yours uses polling.
@pauloon yes, the difference is the firmware of the plug; otherwise they are the same AFAIK. I don't think Koen's solution will make a difference in your case though, because his script only affects "plug 3" 🤔
@XanderTenBoden
Could you check if the issue is fixed with the following external converter (energy measurement won't work anymore)
@Koenkk To be sure: do you mean the HA configuration.yaml, or the Z2M configuration.yaml? I guess the last one?
Let me help you. It's the Z2M configuration.
And the ext_converter file you put inside the Z2M folder too. Hope it helps.
Z2M configuration.yaml indeed.
@Koenkk I just changed the reporting of 8 of my plugs back to once a second, 15 minutes ago, to confirm that it would indeed result in a big increase in memory consumption. If that happens again I will load your file as suggested and see what happens then.
@XanderTenBoden ah, once a second can explain the issue: the polling happens faster than the network/device can handle, which causes the memory increase. Let me know if this fixes it, then I will push a fix.
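One generic way to keep polling from outrunning a slow device is to skip a poll while the previous one is still pending. The sketch below only illustrates that idea. It is not the fix that was pushed, and `makeGuardedPoll` and `skippedCount` are hypothetical names that do not exist in Zigbee2MQTT or zigbee-herdsman-converters.

```javascript
// Hypothetical helper, not part of Zigbee2MQTT or zigbee-herdsman-converters.
// Wraps an async poll function so that a new poll is skipped while the
// previous one is still pending, keeping at most one request in flight.
function makeGuardedPoll(pollFn) {
  let inFlight = false;
  let skipped = 0;

  const poll = async () => {
    if (inFlight) {
      skipped += 1; // previous poll has not finished yet, so drop this one
      return false;
    }
    inFlight = true;
    try {
      await pollFn();
      return true;
    } finally {
      inFlight = false; // always release the guard, even if pollFn throws
    }
  };

  poll.skippedCount = () => skipped;
  return poll;
}
```

With a guard like this, a timer firing every second against a device that needs several seconds to answer would drop the extra polls instead of stacking them up in memory.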
Even without the fix and the plugs set to once in 10 seconds there seems still to be a memory increase though. Just at a waaaay slower rate. This is what it looked like just before I changed them:
@Koenkk It results in an error when starting Z2M:
/app/dist/util/externally-loaded.js:13
fingerprint: [].concat(...TS011Fplugs.map((manufacturerName) => {
^
ReferenceError: TS011Fplugs is not defined
at /app/dist/util/externally-loaded.js:13:31
at Script.runInContext (node:vm:141:12)
at Script.runInNewContext (node:vm:146:17)
at Object.runInNewContext (node:vm:306:38)
at loadModuleFromText (/app/lib/util/utils.ts:148:8)
at loadModuleFromFile (/app/lib/util/utils.ts:155:12)
at Object.getExternalConvertersDefinitions (/app/lib/util/utils.ts:165:25)
at getExternalConvertersDefinitions.next (<anonymous>)
at new ExternalConverters (/app/lib/extension/externalConverters.ts:12:20)
at new Controller (/app/lib/controller.ts:84:58)
Updated #14853 (comment)
[20:38:14] INFO: Starting Zigbee2MQTT...
/app/dist/util/externally-loaded.js:33
fromZigbee: [fz.on_off, fzLocal.electrical_measurement_skip_duplicate, fzLocal.metering_skip_duplicate, fz.ignore_basic_report,
^
ReferenceError: fzLocal is not defined
at /app/dist/util/externally-loaded.js:33:29
at Script.runInContext (node:vm:141:12)
at Script.runInNewContext (node:vm:146:17)
at Object.runInNewContext (node:vm:306:38)
at loadModuleFromText (/app/lib/util/utils.ts:148:8)
at loadModuleFromFile (/app/lib/util/utils.ts:155:12)
at Object.getExternalConvertersDefinitions (/app/lib/util/utils.ts:165:25)
at getExternalConvertersDefinitions.next (<anonymous>)
at new ExternalConverters (/app/lib/extension/externalConverters.ts:12:20)
at new Controller (/app/lib/controller.ts:84:58)
Just one comment.... I'm not sure this problem is connected to the "power consumption" because I noticed that in my test PC the power reporting of my plug is disabled, and memory keeps rising...
I guess that more data makes memory rise faster, so power consumption contributes, but is not directly connected to it.
@Koenkk, many times when I have these restarts due to the memory crash, some devices seem to "lose pairing" and need to be manually re-paired. Is this expected?
Thanks, Paulo.
@pauloon same here, mainly (battery powered) movement sensors stopped working after these crashes.
https://github.com/Koenkk/zigbee2mqtt/issues/14853#issuecomment-1321196668 should be good now.
Many times when I have these restarts due to the memory crash, some devices seem to "lose pairing" and need to be manually re-paired. Is this expected?
Shouldn't be the case, but let's first fix the crash itself.
@Koenkk , Did you have any success replicating the problem, so you can analyze?
Please, let us know. I'm telling other friends to stand by, that this is being fixed. If you need help testing stuff, please tell me.
Thanks, Paulo.
@pauloon it doesn't happen in my setup, the only ones impacted seems to be you two (I haven't gotten any other reports)
But do you use specifically HassOS with the Z2M AddOn?
I participate in a group about HA with more than 2,000 people, and most of the ones that use HassOS and the Z2M Add-On are having this exact same problem. They hadn't noticed before because they are not very "technical", only have a few devices (so it takes a long time to crash) and they use watchdog, so it restarts automatically. Many people just restart HA and life goes on until, after a long time, it crashes again. And they are not familiar with this GitHub and don't know where to look to evaluate this further. The ones that are most impacted are switching to ZHA due to this.
It could be an individual problem indeed, but from the moment I got another PC here at my home, installed a brand new HassOS and Z2M from zero, without any customization, and the problem also happens, it is pretty much an indication of a bug, correct?
If you would like to take a closer look at this install that I've set up, I can provide remote access so you can investigate. But, please, do not let this go. I really like Z2M and would love to have it working fine.
Please, let me know.
Thanks, Paulo.
This is how it is affecting my HA use:
But do you use specifically HassOS with the Z2M AddOn?
I use Z2M in docker since I run an unsupervised HA.
I'm wondering if maybe the logging causes this buildup. Can you try to set the log_level to error and see if it takes longer before the crash happens?
Just did that. Do I keep the zigbee_herdsman_debug on?
Thanks, Paulo.
@Koenkk @pauloon sorry for the silence, I've had some very busy days with work and didn't have time to come back to this issue earlier.
Until now, I was still running Z2M without the changes @Koenkk provided, and with the plugs set to poll every 10 seconds instead of every second. This stopped Z2M from crashing for me. However, there is still an increase in RAM usage going on, just at a much slower pace:
I have just added the changes @Koenkk provided and rebooted Z2M, which now seems to work without errors. I also changed a couple of plugs back to polling once a second. I will keep you guys updated about what happens now :-)
I changed the log and restarted it, but it looks like it did not change anything:
Still "eating" a bunch of memory very fast...
Please, let me know. Paulo.
@Koenkk I think it's safe to assume that your change indeed solves the issue. I've rebooted Z2M yesterday when I posted my previous comment, and the RAM build-up is no longer happening now:
It also appears to be way less "spikey" now for some reason.
Update, after turning off logs:
7 hours later, the line is still just as horizontal. So I'm certain that you're heading in the right direction @Koenkk :-)
@XanderTenBoden I've pushed the fix, check if electrical measurements work and if there is no memory buildup with the latest dev.
Changes will be available in the dev branch in a few hours from now. (https://www.zigbee2mqtt.io/advanced/more/switch-to-dev-branch.html)
@pauloon there are more TuYa devices using this polling method, I've applied the fix for all now (maybe you have more TuYa devices using this, the converter I provided only fixes it for TS011F_plug_3)
Dear @Koenkk ,
Super! If I install this, does it work the same as the link you provided? I'm not familiar with Linux, so I'm not very comfortable using terminal commands.
Thanks, Paulo.
Yes that is the correct addon
Dear @Koenkk ,
It looks like it was a success! This is so great.
Question: even installing the Edge version as add-on, it does not get updated automatically?
Thanks, Paulo.
This is the other test PC:
How do I switch between these 2 add-ons (normal vs edge) without having to set up all my devices again? Does this just work by installing the additional one, disabling the normal one and enabling the edge one?
@XanderTenBoden ,
Yes, that works. That's how I did it here. But don't forget to also disable "Start on boot" for the normal one.
At first you get some "Bad gateway" errors when trying to access it, but I did a few SHIFT + F5 or CTRL + F5 presses to refresh the cache, and it worked.
Please let us know. Paulo.
@pauloon @Koenkk I've just switched to the edge repo. I will let you know what happens next :-)
Awesome, this fix will be included in the 1 December release.
@Koenkk ...
Can you confirm if the "Edge" version is updated automatically also? Or just the regular one?
@Koenkk it seems that the issue has been resolved. I've just checked RAM usage again and it has been more or less stable for the last 8 hours (it shows only a very slight increase of 20-25 MB RAM usage with 8 plugs polling every second).
@pauloon edge does not update automatically, it is not versioned so you need to uninstall -> install to update.
Great that this has been solved!
What happened?
After the last update I've noticed Z2M is stopping service with a fatal error, out of the blue.
I'm using HASS.IO, fully updated, on an i5 machine with 8 GB of memory and a 128 GB SSD.
What did you expect to happen?
No response
How to reproduce it (minimal and precise)
Just leave it running. After one or two days it stops working (service drops).
Zigbee2MQTT version
1.28.2 commit: unknown
Adapter firmware version
20220219
Adapter
SONOFF USB Dongle
Debug log
I did not have the debug log active when this happened. Posting normal log.
...
Zigbee2MQTT:info 2022-11-07 11:45:15: MQTT publish: topic 'zigbee2mqtt/Smart Plug 15', payload '{"child_lock":"UNLOCK","current":0.04,"energy":6.45,"indicator_mode":"off/on","last_seen":"2022-11-07T11:45:13-03:00","linkquality":102,"power":0,"power_outage_memory":"restore","state":"ON","update":{"state":"idle"},"update_available":false}'
<--- Last few GCs --->
[7:0x7fa21c7993c0] 61013533 ms: Mark-sweep 2044.2 (2085.3) -> 2042.2 (2085.3) MB, 2087.8 / 0.0 ms (average mu = 0.133, current mu = 0.010) allocation failure scavenge might not succeed
[7:0x7fa21c7993c0] 61015639 ms: Mark-sweep 2044.3 (2085.3) -> 2042.2 (2085.3) MB, 2082.8 / 0.0 ms (average mu = 0.074, current mu = 0.011) allocation failure scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
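The log above shows the Node.js process failing at roughly 2044 MB of heap, when garbage collection could no longer free memory. For anyone who wants to watch heap usage against the limit in their own Node.js process, here is a minimal sketch. The helper name `heapSnapshotMB` is hypothetical, and this is a generic Node.js snippet, not something Zigbee2MQTT provides.

```javascript
const v8 = require('node:v8');

// Hypothetical helper: reports current heap usage, resident set size, and
// the V8 heap size limit of the running Node.js process, in megabytes.
function heapSnapshotMB() {
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  const { heapUsed, rss } = process.memoryUsage();
  const { heap_size_limit: limit } = v8.getHeapStatistics();
  return { heapUsedMB: toMB(heapUsed), rssMB: toMB(rss), heapLimitMB: toMB(limit) };
}

console.log(heapSnapshotMB());
```

Logging this at a fixed interval gives the same kind of trend line as the RAM graphs discussed in this thread, but from inside the process.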