digidotcom / xbee-python

Python library to interact with Digi International's XBee radio frequency modules.
Mozilla Public License 2.0

Strange "freeze" of io_samples data from ADC. #253

Open Artyrm opened 3 years ago

Artyrm commented 3 years ago

This is a very strange and elusive problem. I understand it is hard to investigate, even for me with access to the devices. Still, I wanted to ask - maybe someone has ideas about what is happening.

It may also be a hardware problem, since I see it only with XBee SX 868 modules produced in 2020 and never saw it on earlier versions. It also happens more often since I upgraded to the latest firmware (A00A). But even with all those considerations I can't wrap my head around what is happening in the Python software. My previous issue here isn't related to this problem; it was only one of the measures I tried in order to fix it.

So, I'm gathering various data via ADC, most notably temperature. The local device is on Linux; asynchronous sleeping nodes send their samples once in a while. So I have a callback which checks whether IO samples have arrived:

def my_data_receive_callback(xbee_packet):
    # Handle only IO sample frames; every other frame type is ignored.
    frame_type = xbee_packet.get_frame_type()
    if frame_type.name == "IO_DATA_SAMPLE_RX_INDICATOR":
        # 64-bit source address of the sending node as an uppercase hex string.
        address = xbee_packet.x64bit_source_addr.address.hex().upper()  # remote.get_64bit_addr().address.hex().upper()
        sample = xbee_packet.io_sample
...

And so on. Usually this worked well. But with the new modules something strange happened.

Temperature may vary slowly, so it was not immediately apparent to me that the received data in fact just "freezes". I found out by moving a node from cold to hot conditions - the temperature didn't move. I added a received-packets counter and found that it was stuck as well. I also started watching RAM usage and soon found that, once the "freeze" occurs, RAM begins to leak slowly (approx. 0.5% of 1 GB RAM per minute, with 30 nodes sending once every half a minute). This happens several times a day, with no apparent cause or pattern in timing, so it is quite hard to catch. If I restart the software, the values immediately catch up to where they should be. I set the logging level to "DEBUG", but there is nothing suspicious or unusual in the log.
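One way to narrow down where the slowly leaking RAM is being allocated is to compare two snapshots from the standard-library tracemalloc module, one taken before the "freeze" and one a few minutes into it. The sketch below is only an illustration: top_growth is a hypothetical helper name, and the list of bytearrays merely stands in for whatever the real application allocates while frozen.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return up to `limit` source lines whose allocated size grew the most."""
    # compare_to() sorts by absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    # Keep only lines that grew; lines that shrank are not leak candidates.
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start(10)  # keep up to 10 frames of traceback per allocation
    baseline = tracemalloc.take_snapshot()

    # Stand-in for "let the application run for a few minutes while frozen".
    leaked = [bytearray(1024) for _ in range(1000)]

    current = tracemalloc.take_snapshot()
    for stat in top_growth(baseline, current):
        print(stat)
```

If the growing lines point into the library's reader or queue code rather than into the application callback, that would support the idea of packets piling up somewhere instead of being dropped.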

I decided to investigate further. To be able to inspect the state of device = XBeeDevice(PORT, BAUD_RATE), I made device a global variable. After that the memory leak seemed to stop, with only minor fluctuations. The packet counter seems to be alive too.

But after a while the actual values appear to be frozen again. I thought something might be wrong with the packet handling, but I'm logging each raw IO sample packet. When I parsed the log file, it showed that all those packets indeed carried "frozen" values; more than that, a value may change by a bit or two, but nowhere near where it should be. So, if I get packets with slowly changing values, perhaps the problem is on the node side? But as soon as I restart the software, the values jump back to normal. Moreover, if I close the serial port (device.serial_port.close()) and then reopen it (along with device._packet_listener.run(), since the listener stops automatically when the serial port is closed), the values/packets immediately catch up.

While writing this I just thought: could it be that I'm in fact dealing with deeply buffered "historic" values? Is there a way to check the state of the buffers/queues (if any) in the library?

Anyway, I'd appreciate your thoughts on:
1) What may be the reasons for such behavior?
2) How can I tell from inside the software when values/packets are "frozen"?
3) What can be done to "unfreeze", besides resetting the serial port (it is ungraceful, and some packets would apparently be lost in the process)?
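As a partial check of the "buffered historic values" idea, the number of bytes waiting in the operating system's serial receive buffer can be read through pyserial's in_waiting property. This is only a sketch under assumptions: it assumes device.serial_port (already used above) behaves like a pyserial Serial object, the helper names and the 512-byte threshold are hypothetical, and it says nothing about any queues inside the library itself.

```python
from types import SimpleNamespace


def pending_rx_bytes(device):
    """Bytes received by the OS but not yet read by the application."""
    # in_waiting is a pyserial property; it fails if the port is closed.
    return device.serial_port.in_waiting


def looks_backlogged(device, threshold=512):
    """True when more than `threshold` bytes are waiting (arbitrary limit)."""
    return pending_rx_bytes(device) > threshold


if __name__ == "__main__":
    # Stand-in object so the sketch runs without hardware attached.
    fake_device = SimpleNamespace(serial_port=SimpleNamespace(in_waiting=2048))
    print(pending_rx_bytes(fake_device), looks_backlogged(fake_device))
```

A value that keeps growing over time would suggest the reader is falling behind the radio; a value near zero during a "freeze" would point away from the OS buffer and towards the module or the library's own queues.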

Artyrm commented 3 years ago

Little update here.

I tried device.flush_queues(); unfortunately, it didn't help.

Artyrm commented 2 years ago

I can add that with an 868LP module as the receiver this seems to be a non-issue. At the same time, an SX868 receiving data from an 868LP has the same problem.

TheTripleV commented 2 years ago

Does changing

- remote = self.__try_add_remote_device(read_packet)
+ remote = None

on line 626 of reader.py fix your problem? https://github.com/digidotcom/xbee-python/blob/master/digi/xbee/reader.py#L626

Artyrm commented 2 years ago

@TheTripleV Thank you for your reply, I'll definitely give it a try.

Artyrm commented 2 years ago

@TheTripleV unfortunately, it is all the same - freeze, memory leak.

For now, even being able to detect the "freeze" state would be of great help.
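For the detection part, a minimal sketch of a per-node watchdog is shown here. It is not part of the library: FreezeWatchdog and all its parameters are hypothetical, and the 30-second default is taken from the reporting interval mentioned earlier in this thread. It covers only the case where the packet counter stops; it cannot tell that packets which do arrive carry stale values, because the callback shown in this thread has no sample time to compare against.

```python
import time


class FreezeWatchdog:
    """Track when each node was last heard from (hypothetical helper)."""

    def __init__(self, expected_period_s=30.0, tolerance=3.0,
                 clock=time.monotonic):
        # A node counts as silent after `tolerance` missed reporting periods.
        self._max_silence = expected_period_s * tolerance
        self._clock = clock
        self._last_seen = {}

    def packet_seen(self, address):
        """Call from the data-receive callback for every IO sample."""
        self._last_seen[address] = self._clock()

    def silent_nodes(self):
        """Sorted addresses not heard from for longer than allowed."""
        now = self._clock()
        return sorted(addr for addr, seen in self._last_seen.items()
                      if now - seen > self._max_silence)

    def frozen(self):
        """True when every known node has gone silent at the same time."""
        known = len(self._last_seen)
        return known > 0 and len(self.silent_nodes()) == known
```

In use, packet_seen(address) would be called inside the IO sample branch of the callback, and frozen() polled from a timer or the main loop. A single silent node more likely means a dead battery or a lost link, while all nodes going silent together matches the "freeze" described above.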