ghoti57 / evofw3

Major overhaul of evofw2 Evohome listening software to use asynchronous radio mode

ATMega328p stops responding, also weird response to "!F" #41

Open thigger opened 1 year ago

thigger commented 1 year ago

Hi,

I have evofw3 running on a 5V ATMega328P. Every now and again (~24-48h) it stops emitting packets and stops responding to commands, requiring a reset. I appreciate it may just be the board, but on connecting it to my dev PC just now I ended up in a strange state where it wasn't emitting packets but still responded to !F (I'm using that to test comms) - and then spontaneously fixed itself.

Does this point to any specific issue? I appreciate it may be unrelated to the loss of comms I've been getting (when that happens it won't respond to !F).

thanks

!F
# !F F=216572
082 RQ --- 30:071715 01:067930 --:------ 10A0 001 00
074 RP --- 01:067930 30:071715 --:------ 10A0 006 001518000384

etc etc, then board stops emitting packets, but still responds to the !F command with weird numbers, before eventually spontaneously fixing itself:

!F
# !F F=710000
!F
# !F F=442200
!F
# !F F=700000
!F
# !F F=440000
!F
# !F F=210000
!F
# !F F=220000
!F
# !F F=440000
!F
# !F F=620000
066 RP --- 01:067930 30:071715 --:------ 0006 004 00050962
081  I --- 04:146860 --:------ 04:146860 30C9 003 0007E0
089  I --- 04:146848 --:------ 01:067930 12B0 003 060000
081  I --- 04:146848 --:------ 01:067930 12B0 003 060000
!F
# !F F=216572

EDIT: this may be a power supply issue!

ghoti57 commented 1 year ago

Firstly, while the 16MHz ATMega328P is capable of capturing messages from a RAMSES based control system (e.g. EvoHome), it is not as good as an ATMega32U4 based device.

This is because the ATMega328P devices use their HW UART interface to communicate with a host device (PC/Linux/MAC) via a UART/USB converter chip. The ATMega32U4 has an internal USB interface, leaving the HW UART free to interface to the CC1101 chip, while the ATMega328P has to use a software implementation of a UART for the CC1101.

This SW UART implementation uses most of the available processor cycles while it is receiving a message, leaving little scope for sophisticated error handling.

thigger commented 1 year ago

Thanks - before looking into this I had naively thought the 328P might be superior due to not having to handle USB internally! In this case though I think it's the power supply to the CC1101 making the SPI bus misbehave (and presumably the ATMega is hanging on the while loop that waits for MISO). I've sorted out the power and it's been running fine overnight so far.

Thanks for the firmware by the way! I'm pretty sure I have a spare 32U4 somewhere and will give it a try.

ghoti57 commented 1 year ago

So now to your observations.

Firstly, the !F command reads 3 register values from the CC1101 over the SPI interface and uses them to report the configured frequency divider. The expected value is 21656A; yours is 216572, which implies you felt the need to run an auto-calibration sequence.

I have said repeatedly that this was only necessary for devices based on a particular batch of ATMega328P sticks from a German supplier (can't remember which one) that appeared to have a low quality crystal fitted to the cc1101 module. Autotune was never really intended to be a long term feature of the code and it is possible that it has bugs.

The value you have differs by only 8, a difference of about 0.0004% from the ideal value. I strongly suggest you reset the cc1101 parameters back to default with the cmd !ER followed by a power cycle of the device.
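
To put that difference into frequency terms, here is a rough Python sketch. It assumes the three registers read by !F are the CC1101 FREQ2/FREQ1/FREQ0 bytes and that the module has a 26 MHz crystal (common on CC1101 boards, but not confirmed in this thread); the formula is the one from the CC1101 datasheet, carrier = crystal / 2^16 * FREQ. The function name is made up for illustration.

```python
# Assumption: 26 MHz crystal on the CC1101 module. Check your own board.
XTAL_HZ = 26_000_000


def freq_word_to_hz(word_hex: str, xtal_hz: int = XTAL_HZ) -> float:
    """Convert the 24-bit value printed by !F into a carrier frequency in Hz."""
    return int(word_hex, 16) * xtal_hz / 2**16


default_hz = freq_word_to_hz("21656A")  # default value quoted above
tuned_hz = freq_word_to_hz("216572")    # value reported in this issue
offset_hz = tuned_hz - default_hz

print(f"default: {default_hz / 1e6:.4f} MHz")
print(f"tuned:   {tuned_hz / 1e6:.4f} MHz")
print(f"offset:  {offset_hz:.0f} Hz ({offset_hz / default_hz * 1e6:.2f} ppm)")
```

Under those assumptions the default works out to roughly 868.3 MHz and the autotuned value sits about 3.2 kHz above it.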

ghoti57 commented 1 year ago

As to why you see different values in response to !F, I can think of a couple of reasons:

  1. The internal state of the cc1101 has been corrupted - unlikely because it wouldn't self recover.
  2. The SPI interface is being used for something else so the values acquired are not those expected. I think this is the case.

It could be that you have a noisy radio environment or that the listening device is not in an ideal location compared to your control system devices. The 3 digit value at the beginning of each message line is an RSSI value; small values are good, bigger values are bad. Values above 090 are often a sign of poor reception and you have some reported values near this boundary.
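
If you capture the serial output to a file, a few lines of Python on the host can pull out that leading RSSI field and show how close you are to the 090 mark. This is only a sketch: the field layout is inferred from the lines pasted in this issue, and `parse_line`, `Packet` and `RSSI_WARN` are names invented for the example, not part of evofw3.

```python
from typing import NamedTuple, Optional

RSSI_WARN = 90  # threshold from the comment above, not a firmware constant


class Packet(NamedTuple):
    rssi: int
    verb: str
    code: str
    payload: str


def parse_line(line: str) -> Optional[Packet]:
    """Parse one packet line; return None for anything else (e.g. '# !F ...')."""
    fields = line.split()
    if len(fields) != 9 or not fields[0].isdigit():
        return None
    rssi, verb, _seq, _addr0, _addr1, _addr2, code, _length, payload = fields
    return Packet(int(rssi), verb, code, payload)


log = """\
082 RQ --- 30:071715 01:067930 --:------ 10A0 001 00
074 RP --- 01:067930 30:071715 --:------ 10A0 006 001518000384
089  I --- 04:146848 --:------ 01:067930 12B0 003 060000
# !F F=216572
"""

packets = [p for p in map(parse_line, log.splitlines()) if p]
weak = [p for p in packets if p.rssi > RSSI_WARN]
worst = max(p.rssi for p in packets)
print(f"{len(packets)} packets, worst RSSI {worst}, {len(weak)} above {RSSI_WARN}")
```

On the sample lines from this issue the worst value is 089, just under the boundary.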

You should ensure that the antenna on the device is vertical as this is the alignment used in all the control system devices. The antenna being in other planes will result in poorer reception.

I suspect that the FW is detecting what it believes is the beginning of a new message but then, before a successful ending is detected, an error in the message is detected. The default behaviour in this situation is to silently discard the message.

When the FW determines that a detected message has ended, either successfully or with an error, it reads the RSSI value from the CC1101 via the SPI interface. If it is trying to do this at the same time as your !F command this access could result in erroneous values for F being reported.
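
If you use !F as a comms check from a script, one way to cope with such erroneous readings is to treat any reply far from the expected value as suspect and simply ask again. A minimal host-side sketch, assuming the reply format shown in this issue; the tolerance is an arbitrary margin chosen for illustration and `classify` is a hypothetical helper, not an evofw3 feature.

```python
import re

EXPECTED = 0x21656A  # default value quoted in this thread
TOLERANCE = 0x100    # hypothetical margin to allow for small autotune offsets

# Matches replies such as "# !F F=216572"
F_REPLY = re.compile(r"^# !F F=([0-9A-Fa-f]{6})$")


def classify(line: str) -> str:
    """Label a line as a plausible !F reply, a suspect one, or something else."""
    match = F_REPLY.match(line.strip())
    if not match:
        return "not-an-F-reply"
    value = int(match.group(1), 16)
    return "plausible" if abs(value - EXPECTED) <= TOLERANCE else "suspect"


replies = ["# !F F=216572", "# !F F=710000", "# !F F=442200", "# !F F=216572"]
for reply in replies:
    print(reply, "->", classify(reply))
```

A suspect reply still proves the serial link is alive; it only means the reported value should not be trusted.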

You can see if the FW is detecting many badly formed messages using the !T2 command. This will cause messages with detected errors to be printed.