Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
12.12k stars 1.68k forks source link

Iris v2 motion sensors 3326-L drop from mesh #13396

Closed schasj closed 1 year ago

schasj commented 2 years ago

What happened?

I used 7 of these devices with Hubitat for over 2 years and they were rock-solid reliable there. I had zero issues with them. Since moving these devices to Zigbee2MQTT a few months ago I have needed to either power-cycle or completely reset them while in "Permit Join" mode about weekly for each device.

What did you expect to happen?

Devices remain fully operational for years without any attention.

How to reproduce it (minimal and precise)

Pair known-good devices with fresh batteries. Wait a week or so for the devices to either not report motion or go unseen for a long time

Zigbee2MQTT version

1.26.0 commit: cd23aea and 1.27.0 commit: a9b8808

Adapter firmware version

20210708

Adapter

ITead_Sonoff_Zigbee_3.0_USB_Dongle_Plus

Debug log

Nothing pertinent outside of the messages when I Permit Join and reset or power-cycle the sensors

ashokaiyar commented 2 years ago

What zigbee routers/repeaters do you have on your z2m mesh?

schasj commented 2 years ago

I have 5 of SmartThings IM6001-OTP05 and 11 KMPCIL_RES005 spread throughout my mesh. The 2 unconnected devices have been unpowered for over a week but they work fine when powered.

image

ashokaiyar commented 2 years ago

Have all of your Iris v2 motion sensors dropped off, or is it a subset?

schasj commented 2 years ago

All of them but not concurrently because I reset or power-cycle them when I notice.

schasj commented 2 years ago

Possibly related, the batteries that last for years when these sensors were joined to Hubitat don't last even a month when joined to zigbee2mqtt. Here is a plot of just 3 of them over the last 8 days. I put new batteries in all of these on 7/30. It's unfortunate that they all happen to be a shade of blue but I think it makes my point.

Screenshot_2022-08-06_07-32-56

schasj commented 2 years ago

I just discovered a likely culprit in the rapid battery drain...reporting LQI changes that happens quite often. The same thing is happening with my ST water leak sensors, 2018 models. I see no way to stop these devices from reporting LQI or even slow it down. Hubitat must be doing it somehow because, as I've said, batteries in these devices last years when in use there.

schasj commented 2 years ago

@Koenkk, I think my last comment is off the mark and LQI is a red herring.

I captured the mesh traffic from both Hubitat and Z2M during the join process with the same Iris V2 Motion sensor and compared them with Wireshark where I searched for "configure reporting". What is shown here is Hubitat on the left in the order they were sent and Z2M on the right, rearranged to match(?) the order on the left.

In addition to the vastly different Interval settings I notice that Hubitat configures Remaining Battery Percentage while Z2M does not.

I'm guessing that the duplicate configuration messages in both systems is due to a missed ACK?

I don't know how significant this is so if there are other things to look for in the packet captures let me know.

image

Koenkk commented 2 years ago

@schasj if you want to check if this is the problem, you can easily configure reporting for this device via the z2m frontend -> click on your device -> reporting. Since this is a battery powered device, it might be necessary to wakeup the device right before configuring the reporting.

schasj commented 2 years ago

@Koenkk, I know that and I have. I made this comparison because Z2M is killing batteries somehow in these devices (and others) while Hubitat does not so I'm looking for differences in the way things are done in the hope that I can find out why so I don't have to abandon Z2M.

I have also disabled temperature reporting yet the temperature is still being reported. Shouldn't it have stopped?

Koenkk commented 2 years ago

-Did you set the reporting values (min/max and change) the same as with Hubitat?

schasj commented 2 years ago

Even higher for min/max but change the same as Hubitat. Plus I disabled Temperature altogether but it's still reported.

Koenkk commented 2 years ago

To debug further, a sniff would be needed when paired both to hubitat and z2m. https://www.zigbee2mqtt.io/advanced/zigbee/04_sniff_zigbee_traffic.html#with-cc2531 . I suggest to start sniffing from pairing the device + 6 hours.

schasj commented 2 years ago

Whilst I do those captures you may want to have a look at the ones I already have. They are only a few minutes long.

PairingPcaps.zip

schasj commented 2 years ago

@Koenkk, since you didn't say I'll leave the report settings default for Z2M. It'll be 6 hours from now before I start the Z2M capture so if you want me to change the reporting to be the same as Hubitat let me know.

schasj commented 2 years ago

@Koenkk, as requested. ZigbeePcaps.zip

Koenkk commented 2 years ago
const fz = require('zigbee-herdsman-converters/converters/fromZigbee');
const tz = require('zigbee-herdsman-converters/converters/toZigbee');
const exposes = require('zigbee-herdsman-converters/lib/exposes');
const reporting = require('zigbee-herdsman-converters/lib/reporting');
const extend = require('zigbee-herdsman-converters/lib/extend');
const ota = require('zigbee-herdsman-converters/lib/ota');
const tuya = require('zigbee-herdsman-converters/lib/tuya');
const e = exposes.presets;
const ea = exposes.access;

const definition = {
    zigbeeModel: ['3326-L'],
    model: '3326-L',
    vendor: 'Iris',
    description: 'Motion and temperature sensor',
    fromZigbee: [fz.ias_occupancy_alarm_2, fz.temperature, fz.battery],
    toZigbee: [],
    meta: {battery: {voltageToPercentage: '3V_2100'}},
    configure: async (device, coordinatorEndpoint, logger) => {
        const endpoint = device.getEndpoint(1);
        device.defaultSendRequestWhen = 'immediate';
        device.save();
        await endpoint.unbind('genPollCtrl', coordinatorEndpoint);
        await reporting.bind(endpoint, coordinatorEndpoint, ['msTemperatureMeasurement', 'genPowerCfg']);
        await reporting.temperature(endpoint);
        await reporting.batteryVoltage(endpoint);
    },
    exposes: [e.occupancy(), e.battery_low(), e.tamper(), e.temperature(), e.battery()],
};

module.exports = definition;

If not; please provide me a sniff again when the sensor is paired with + 6 hours

schasj commented 2 years ago

@Koenkk, I really appreciate the time and effort you're putting into solving this/these issue(s). I have a friend that is NOT having problems with the Iris devices in his mesh and I'll let you know what is different about his environment as soon as he responds to my request for that info. EDIT3: The friend is running in podman on Linux

None of the devices, Iris sensors included, have dropped out for over 30 hours now so the jury is still out on that issue.

Since the ZCL polling was happening only once per hour I'm not clear on why you think this made a difference but it is indeed no longer happening.

Looking at the battery powered devices in MQTT Explorer shows that the only chatty devices now are a couple of ST water sensors that report LQI changes every few seconds. EDIT: Power cycle and reset stopped this EDIT2: Good grief! I moved the strike-through to the intended place

Roughly 36 minutes into this capture I disabled temperature reporting on all my Iris sensors starting with the one paired for this test which is now 0xcfe6

schasj commented 2 years ago

I couldn't add this to my last comment for some reason.

Z2MIrisPairing.zip

schasj commented 2 years ago

I've finally captured a device's "last words" but it's a ST water sensor not a Iris motion sensor. I find nothing notable in this capture snippet, just that it's the last exchange from the device other than the Data Request every 7s for the last 24 hours. I'd hoped I'd see some sort of destructive exchange.

LastSeen.pcap.zip

Koenkk commented 2 years ago

Nothing special indeed, looking back at https://github.com/Koenkk/zigbee2mqtt/issues/13396#issuecomment-1214310720 I've missed one obvious difference. With Hubitat the sensor is connected directly with the coordinator (since the "Data requests" go to 0x0000, with z2m it is connected to a router. Maybe the sensor doesn't play well with some routers. I would suggest to repair the device close to the coordinator; sniff to verify it has been connected directly and see if it stays connected.

schasj commented 2 years ago

The ST water sensor was still showing Offline in the Availability column when I moved it to within inches of the controller and reset the device with the controller permitting joins. It paired up normally.

Then I decided to look at previous captures (whsniff is creating a new file at the top of every hour) to see when the device stopped talking altogether and to my surprise I found that it was still reporting right up until I reset it. So it seems the "problem" is not that the device dropped from the mesh, rather that z2m thinks it did.

This is the capture snippet that shows the last exchange before I reset the device, other than the Data Requests.

UpToRejoin.pcap.zip

~~P.S. BTW, it would be helpful in other regards if you'd have a look at my post in https://github.com/Koenkk/zigbee2mqtt/discussions/2921~~

schasj commented 2 years ago

Since I'm watching the ST water sensor I want to show a case where it's reporting temperature even though I disabled that reading in the device's Reporting tab. As you can see, temperature is the only value that changed so either z2m hasn't really disabled it or I don't understand what clicking Disable is meant to do.

image

Koenkk commented 2 years ago

By default z2m will mark the device offline if it hasn't received any status updates for 24 hours. Note that Data Request do not count for a status update, since the coordinator doesn't forward the messages to zigbee2mqtt.

Could it be that the sensor doesn't provide any status updates (e.g. temperature) for 24 hours? (or did you modify the default timeout values? https://www.zigbee2mqtt.io/guide/configuration/device-availability.html#device-availability)

schasj commented 2 years ago

No, I haven't changed the Availability settings...it's on Simple

schasj commented 2 years ago

@Koenkk, I've finally caught an Iris v2 motion sensor failing to "get through to home" and several hours later reconnecting.

Unfortunately, it's a device that's talking through a repeater so this could be a repeater problem but I figure it's worth reporting anyway. There are notes with some details in the .zip as well as the .pcap.

G2_0500-1803.zip

I'll grab the repeater's traffic for the period from just before the Iris "went away" through just after it came back and post it here.

schasj commented 2 years ago

Here is the capture file of the repeater's traffic with the device, again with notes.

WSO_1200-1804.zip

Koenkk commented 2 years ago

I don't see anything strange; but can you try forcing a status update after it shows as offline? E.g. triggering motion, are any packets send?

schasj commented 2 years ago

From the notes in G2_0500-1803.zip: "The event that started it up again at packet #17047 was me walking into the device's detection zone. I had done the same numerous times between 1401 and 1801."

Koenkk commented 2 years ago

The motion sensor (0x5efe) actually sends a lot of messages. Things start going wrong here:

Screenshot 2022-08-21 at 10 21 41

The sensor requests the network address of the coordinator, coordinator responds (no. 337) but the message never arrives back at the motion sensor causing it to constantly keep requesting the network address. This starts at no. 334 till no. 17069, I guess during this time the sensor is blocked. I expect that one of the routers in your network is the cause of this. Therefore I suggest to pair the sensor directly to the coordinator and see if the problem still occurs. You can check if it is by sniffing and seeing that the motion sensor directly sends it messages to 0x0000, e.g.:

Screenshot 2022-08-21 at 10 26 32
schasj commented 2 years ago

@Koenkk, thanks for the analysis and explanation. It picks up exactly where my inexperience leaves me clueless.

Since that sensor was routing through an ST plug-in outlet switch I have pulled all 6 of them and I'll wait to see what happens. I understand your suggestion that I move the sensor but this way is far less disturbance to my automations while also narrowing the field of possible problem repeaters by 1/3. I'll report here what the effects are.

schasj commented 2 years ago

After having the ST plug-in outlets unpowered for at least 36 hours I plugged them back in starting with the one that had been routing for the Iris motion sensor in the above captures/discussion. At least 12 hours after the first I plugged in the rest of the ST devices and, a day later, I note three things:

  1. The size of the hour-long capture files is smaller, averaging 4.4 MB where the ones from last week averaged 6 MB, meaning there is less traffic.
  2. I see that none of the IM6001-OTP05 devices are routing for any end devices now, leaving the KMPCIL_RES005 as the only routers for end devices. image
  3. So far the end devices are all behaving normally now and seem to have longer battery life although it's too soon to be conclusive.

I suppose this gets chalked up to Zigbee device quirkiness and I guess the takeaway is that the order of adding devices matters not just as in doing mains-powered devices before battery powered devices, but also in the order of routers. Then again I did not remove the ST devices from the mesh, just removed power so I suppose that says that even if they are added haphazardly the preferred routers can be "chosen" to handle end devices by removing power from the others for some period of time.

I will report back here with my observations of battery life, hopefully in a few weeks if the life is much longer and sooner if not.

schasj commented 2 years ago

I managed to "measure" current flow rather crudely. The best meter I own only goes down to a 0-300mA range and I'd need a microAmp range to do the measurement properly.

For this quick test I used 2 Iris v2 motion sensors, 1 paired with Hubitat and the other paired with z2m. Neither of these is the unit I used in earlier reports.

Hubitat The unit is connected directly as there are no routers in the mesh. The quiescent current reads 0.01 and pops up to roughly 0.28 about every 5 seconds which roughly corresponds to Data Requests

Z2M The unit is connected via a router. The quiescent current mostly reads 0.01 but has 0.02 instead (I'm guessing) about 20% of the readings. This unit pops up to roughly 0.30 about every 5 seconds, again roughly corresponding to Data Requests

I have no idea if these devices can adjust their xmit power as needed or if connecting to a router naturally increases the current draw. My guess is that neither is true. Based upon that guess it appears that there is some increased battery drain when in the Z2M mesh but even so I don't see it accounting for the vast difference in battery life of years in the Hubitat case to weeks/months in the Z2M case.

schasj commented 2 years ago

@Koenkk do each of the Iris sensors have to be removed and re-paired for the external converter (above) to be used? Are there other events that cause the external converter to be applied, like clicking the reconfigure button or a mesh panic?

Koenkk commented 2 years ago

to apply an external converter, simply restart z2m, nothing more is needed.

schasj commented 2 years ago

@Koenkk, I hate to bother you with this but I'm not finding anything in web searches. I have not found even an image to (dis)prove my belief that the "data" at the bottom of the packet details should also be dissected into readable commands/responses. Is this as good as it gets or am I missing a key or setting?

image

schasj commented 2 years ago

I doubt this has any bearing on the main topic but I've found that the "get" doesn't work for me, and not just on the Iris but on a couple other makes/models as well. When I publish, as below, I get a popup on the z2m page that says No converter available for "occupancy" ('')

image

Koenkk commented 2 years ago

@schasj this message is encrypted with a different key it seems (given that you filled in your z2m network key wireshark should be able to decrypt it), when in the sniff does it appear?

Regarding the get, we don't have a converter defined which allows to read the occupancy value. But it won't make a difference (since we rely on the sensor sending it, we don't want to poll it)

schasj commented 2 years ago

@Koenkk, indeed I was missing a key and now I'm getting full decryption.

The get was a poor example, as it turns out. And it was an intermediate test towards a feature request (#13861) to add "refresh" to your API.

schasj commented 2 years ago

@Koenkk, update on the ST water sensors: I updated the firmware in my Sonoff Dongle Plus from v20210708 to v20220928 and within 15 minutes the 5 sensors that had been offline for days came back online thus improving my optimism overall.

All of my in-use Iris motion sensors have been paired with Hubitat for over a week now and they have been stable with no drop in battery level. After I updated the dongle firmware I joined a spare Iris motion sensor with z2m to test it's stability and battery use. I'll post here the findings when something notable happens.

I also updated z2m to v1.28 this morning.

schasj commented 2 years ago

@Koenkk, this topic in the Hubitat community could shed insight to my problems with the Iris devices. https://community.hubitat.com/t/whats-up-with-all-the-extra-zigbee-ota-traffic-on-2-3-3/102205 The talk is about bad behavior introduced with later releases and that downgrading fixes the behavior. The topic is a bit confusing but what I get from it is that even though the Iris motion sensors themselves are not requesting new images the traffic from the devices that are requesting new images somehow causes them to go offline.

The 3rd pic in post #13 shows that the response in earlier, working versions is "no image available" and, though it doesn't say what the response is in the later, problem version is, I see that z2m is responding with "abort" and the requests just keep on coming and repeat the cycle ~30 minutes later.

I've captured the z2m traffic for over an hour and I filter with (zbee_zcl_general.ota.cmd.srv_rx.id == 0x01) || (zbee_zcl_general.ota.cmd.srv_tx.id == 0x02) to get just the image update request/response frames. imageUpdates.pcapng.zip If you sort on src/dest address you'll see the repetition I'm talking about.

And this is occurring with automatic updates off and the period set to daily.

image

Other related info: I loaned a couple of my Iris v2 motion sensors to a friend who uses Home Assistant and ZHA to see how they'd behave there. He had them paired with weak batteries installed (2.9V) for over a week without any drop-outs and the battery voltage remained the same throughout. I've had the exact device with the same battery paired with z2m for 2 days and it dropped off a few hours ago and the battery just dropped to 2.8V. It could be that it would have also dropped today were it still in the ZHA mesh but I'll bet it drops to 2.7 in less than a week from now.

I hope this helps.

Koenkk commented 2 years ago

Enabling the "Disable automatic update check" option and using the following external converter should trick z2m to send a "no image available" instead of an "abort". Can you try this?

const fz = require('zigbee-herdsman-converters/converters/fromZigbee');
const tz = require('zigbee-herdsman-converters/converters/toZigbee');
const exposes = require('zigbee-herdsman-converters/lib/exposes');
const reporting = require('zigbee-herdsman-converters/lib/reporting');
const extend = require('zigbee-herdsman-converters/lib/extend');
const ota = require('zigbee-herdsman-converters/lib/ota');
const tuya = require('zigbee-herdsman-converters/lib/tuya');
const utils = require('zigbee-herdsman-converters/lib/utils');
const e = exposes.presets;
const ea = exposes.access;

const definition = {
    zigbeeModel: ['3326-L'],
    model: '3326-L',
    vendor: 'Iris',
    description: 'Motion and temperature sensor',
    fromZigbee: [fz.ias_occupancy_alarm_2, fz.temperature, fz.battery],
    toZigbee: [],
    meta: {battery: {voltageToPercentage: '3V_2100'}},
    configure: async (device, coordinatorEndpoint, logger) => {
        const endpoint = device.getEndpoint(1);
        await reporting.bind(endpoint, coordinatorEndpoint, ['msTemperatureMeasurement', 'genPowerCfg']);
        await reporting.temperature(endpoint);
        await reporting.batteryVoltage(endpoint);
    },
    ota: true,
    exposes: [e.occupancy(), e.battery_low(), e.tamper(), e.temperature(), e.battery()],
};

module.exports = definition;
schasj commented 2 years ago

Thanks, @Koenkk, but I evidently wasn't clear. It's not the Iris that are requesting updates, it's my Hue Outdoor motion and Iman Haryardi's multisensors that are requesting updates and I think might be affecting the Iris based on the HE topic I linked. I don't see the connection either but the posts in that topic provide anecdotal evidence that it was happening in HE's mesh (since fixed).

Here are the devices I had setup for automatic updates which are now disabled as shown in my last post:

Screenshot_2022-10-26_14-52-11

Notice that Iman's sensors are requesting updates though I never set them up to do that. You can identify them in the packet capture I posted by their manufacturer code of TexasIns in the MAC address.

The packet captures after applying your converter are the same as yesterday, with Abort being the response, but that is unsurprising given that the converter affects only the Iris and I never did see them make an image request and still don't.

I suppose converters are needed for the Hue motion sensors and Iman's multisensors Screenshot_2022-10-26_15-24-36

Screenshot_2022-10-26_15-26-46

I have to say that I don't understand why any device gets Abort in the response because No Image Available seems to be more appropriate.

Koenkk commented 2 years ago

Modified the code to always return NO_IMAGE_AVAILABLE

Changes will be available in the dev branch in a few hours from now. (https://www.zigbee2mqtt.io/advanced/more/switch-to-dev-branch.html)

schasj commented 2 years ago

@Koenkk did you forget to commit?

Koenkk commented 2 years ago

No https://github.com/Koenkk/zigbee2mqtt/commit/60de5b7676192a56cb02ca5335472541e3910cf9

schasj commented 2 years ago

@Koenkk, hmmm, I just did docker pull koenkk/zigbee2mqtt:latest-dev' and got Screenshot_2022-10-29_09-47-43 which is still replying with "abort"

schasj commented 2 years ago

@Koenkk, I had an error in the start command so I wasn't running the dev branch even though I pulled it. I now have image and the aborts are gone. Time will tell if this changes anything. Thanks again for addressing this.

schasj commented 2 years ago

@Koenkk, I'm sorry to say that the Iris is still chewing batteries, though the life may be a little better than when the "aborts" were the response but nowhere near what it should be.

I'm also still having devices dropping from the mesh, mostly the ST water sensors but also a couple of Hue Outdoor motion sensors.

I'm out of ideas/suggestions.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

saucedge commented 1 month ago

I know this is closed... but I think I'm struggling with the same issue. When I switched to zigbee2mqtt from Smartthings a while back these sensors went from my favorite to the highest maintenance sensors I own. I had been blaming Amazon for crappy batteries because only one of the 3 Iris sensors kept disconnecting (happens to be the sensor furthest from the hub). Whenever I replace the batteries, the sensor would work for a bit before dropping out again. I only recently got fed-up enough to pay closer attention since the sensor in question is very low impact on my automations... and I found the sensor was definitely working fine but was just not reaching the hub. Looking at the Home Assistant sensors, it seems to be reaching the hub every once in a while, but just doesn't report anything in between. image

I'm definitely a noob when it comes to debugging zigbee traffic... but I'm happy to help if it would be useful. For what it's worth, I did try another version Iris sensor (iL07_1) in the same spot and it behaved the exact same way. That combined with the fact that the problem correlates to a spot in my house makes it seem very likely to be a problem with old Iris sensors routing through repeaters. But I suppose that also means it's likely not something zigbee2mqtt can fix.