almusil / rfm69

A generic rust driver to support RFM69 family wireless chips.
Apache License 2.0
9 stars 9 forks source link

SPI Timing issues with RFM69HCW #36

Closed simonmicro closed 9 months ago

simonmicro commented 1 year ago

Hi.

I've recently tried to use this library with a RFM69HCW. Sadly, the communication just did not work and even the read_all_regs call resulted in all registers having the same values (sometimes depending on the SPI speed setting). Also, manually write and read to the registers wouldn't work, as their values always stayed "the same".

~6 hours later ~

There is a mistake inside here:

This library performs separate SPI operations to first write the operation + register-index and then the padding bytes to transfer the data back from the chip (or in case of write the data is just being written). Why is this fatal? Let's have a look into the datasheet, page 44...

image

The chip expects the CS to be low during the data-transfer (which is implemented correctly), but will instantly answer with the requested data back to the host - our sample chip does not even wait for the padding byte of the master. In case of the two separate write calls, this just leads to rubbish being written.

Solution?

A proof of concept to fix the single-byte communication can be found here: https://github.com/simonmicro/rfm69/tree/bug/rfm69hcwSerialCommunication. It allows to set and get a register, but will still fail on the read_all_regs call. A janky solution to fix the multi-byte communication can be found here https://github.com/simonmicro/rfm69/tree/bug/rfm69hcwSerialCommunicationWithStd - but as I'm no Rust-pro I was forced to remove the nostd limitation and to use a buffer-copy with allocation to get it working.

I hope this can be merged / ported to properly fix this, including the nostd-requirement.

Greetings!

Also I would recommend implementing something like this: https://github.com/jgillula/rpi-rfm69/blob/44ed86d9e70e7d1010d41ce6386c709d9d7d1689/RFM69/radio.py#L117. This will allow developers to more quickly discover problems with their SPI setup, as well as properly syncing with the chip.

almusil commented 1 year ago

Hi.

Hi,

thank you for reporting this issue however I'm confused by this report please see comments below.

I've recently tried to use this library with a RFM69HCW. Sadly, the communication just did not work and even the read_all_regs call resulted in all registers having the same values (sometimes depending on the SPI speed setting). Also, manually write and read to the registers wouldn't work, as their values always stayed "the same".

I don't have any HCW to test it out, however the library is working for me for a couple of years now with RFM69W variant. There doesn't seem to be a difference between those two when it comes to SPI communication.

~6 hours later ~

There is a mistake inside here:

* https://github.com/almusil/rfm69/blob/cfc53c9422232adb60878479a742c53d2ab08e21/src/rw.rs#L29

* https://github.com/almusil/rfm69/blob/cfc53c9422232adb60878479a742c53d2ab08e21/src/rw.rs#L35

* https://github.com/almusil/rfm69/blob/cfc53c9422232adb60878479a742c53d2ab08e21/src/rw.rs#L48

* https://github.com/almusil/rfm69/blob/cfc53c9422232adb60878479a742c53d2ab08e21/src/rw.rs#L54

This library performs separate SPI operations to first write the operation + register-index and then the padding bytes to transfer the data back from the chip (or in case of write the data is just being written). Why is this fatal? Let's have a look into the datasheet, page 44...

image

The chip expects the CS to be low during the data-transfer (which is implemented correctly), but will instantly answer with the requested data back to the host - our sample chip does not even wait for the padding byte of the master. In case of the two separate write calls, this just leads to rubbish being written.

According to the datasheet:

SINGLE access: an address byte followed by a data byte is sent for a write access whereas an address byte is sent and
a read byte is received for the read access. The NSS pin goes low at the begin of the frame and goes high after the data
byte.
 BURST access: the address byte is followed by several data bytes. The address is automatically incremented internally
between each data byte. This mode is available for both read and write accesses. The NSS pin goes low at the
beginning of the frame and stay low between each byte. It goes high only after the last byte transfer.

It doesn't specify what the transaction should be, and the timing diagram basically states the same. It shouldn't matter if we first write the address and then read/write in two calls or a single one. Looking at the diagram it in fact waits for MOSI to have the data for write or just empty byte for read.

Solution?

A proof of concept to fix the single-byte communication can be found here: https://github.com/simonmicro/rfm69/tree/bug/rfm69hcwSerialCommunication. It allows to set and get a register, but will still fail on the read_all_regs call. A janky solution to fix the multi-byte communication can be found here https://github.com/simonmicro/rfm69/tree/bug/rfm69hcwSerialCommunicationWithStd - but as I'm no Rust-pro I was forced to remove the nostd limitation and to use a buffer-copy with allocation to get it working.

I hope this can be merged / ported to properly fix this, including the nostd-requirement.

The solution does virtually the same thing as it was before. Which makes me wonder if you didn't use HW CS management? In that case it might lead to problems if it's done as "multiple" transactions. Which platform did you try it with? Was it RPI?

I would suggested to check it with logic analyzer (if you have one) to see if the CS indeed stays low for the whole read/write, if not then that's probably the answer why.

Greetings!

Also I would recommend implementing something like this: https://github.com/jgillula/rpi-rfm69/blob/44ed86d9e70e7d1010d41ce6386c709d9d7d1689/RFM69/radio.py#L117. This will allow developers to more quickly discover problems with their SPI setup, as well as properly syncing with the chip.

This can be done very easily and doesn't have to be part of this crate. It just pulling GPIO low for a while. Adding this is bit problematic for backward compatibility as it will for sure change the signature struct itself.

simonmicro commented 1 year ago

Hi, thanks for your extensive report back - so I'll give my best to shed some light on my diagnosis and what I mean.

CS Timing Issues

So, here is my code: https://gist.github.com/simonmicro/ac62e25dd1581d56e0eb7b596d8155ed - it should display the issue at hand. I've also attached the output, so you can see my issue. This code was written and tested on a RPI 3B(+) (multiple models) with the RFM69HCW (multiple models). I've cross-validated this solution against the Python library, which is also only using single transactions, although maybe more by accident as I did not check their history.

I would suggested to check it with logic analyzer (if you have one) to see if the CS indeed stays low for the whole read/write, if not then that's probably the answer why.

Crap. I do not have one. If really needed, I could try to organize an Oscilloscope and dive into that - but this will take some time.

Debug Extension

This can be done very easily and doesn't have to be part of this crate. It just pulling GPIO low for a while. Adding this is bit problematic for backward compatibility as it will for sure change the signature struct itself.

Yes and no. You are right, the chip reset using GPIO is probably not well suited for this crate (maybe add this snippet into the examples instead?). I'm more looking into their sync-word sync. During that stage they try to set and read two registers of the module (the sync-word ones) and try to establish a communication with this. I think this would make a great addition to this crate - and to be backward-compatible, it should be an own function (like test_module_connection) and of course also included into the examples.

I'm looking forward to your response, ~Simon

P.S. Great work, after solving my timing issues I started to explore more and more of the exposed features of this crate and overall: Good Job!

almusil commented 1 year ago

Hi, thanks for your extensive report back - so I'll give my best to shed some light on my diagnosis and what I mean.

Hi, thank you for more details.

CS Timing Issues

So, here is my code: https://gist.github.com/simonmicro/ac62e25dd1581d56e0eb7b596d8155ed - it should display the issue at hand. I've also attached the output, so you can see my issue. This code was written and tested on a RPI 3B(+) (multiple models) with the RFM69HCW (multiple models). I've cross-validated this solution against the Python library, which is also only using single transactions, although maybe more by accident as I did not check their history.

I think I see what might be the problem. Please notice the difference in the crate example:

    // Configure CS pin
    let cs = SysfsPin::new(25);
    cs.export()?;
    cs.set_direction(Direction::High)?;

and your code for the CS pin:

    // Configure CS pin
    let cs = SysfsPin::new(24); // gpio 8, pin 24
    cs.export()?;
    cs.set_direction(Direction::High)?;

I'm using GPIO 25 on purpose, because that's not the hw CS pin. Another thing to note is that GPIO 8 doesn't correspond to pin 24 (that's GPIO 24), which I think is just misconfiguration. Looking through the Spidev they do not disable hw CS and that acts according to transaction, that would explain why multiple reads/writes end up with some garbage, because the CS is set high after every transaction, thus single one works.

Would you mind trying out different pin that's also not related to the SPI0? The GPIO 25 (even the GPIO 24) should work just fine.

I would suggested to check it with logic analyzer (if you have one) to see if the CS indeed stays low for the whole read/write, if not then that's probably the answer why.

Crap. I do not have one. If really needed, I could try to organize an Oscilloscope and dive into that - but this will take some time.

If my theory is right it won't be needed probably.

Debug Extension

This can be done very easily and doesn't have to be part of this crate. It just pulling GPIO low for a while. Adding this is bit problematic for backward compatibility as it will for sure change the signature struct itself.

Yes and no. You are right, the chip reset using GPIO is probably not well suited for this crate (maybe add this snippet into the examples instead?). I'm more looking into their sync-word sync. During that stage they try to set and read two registers of the module (the sync-word ones) and try to establish a communication with this. I think this would make a great addition to this crate - and to be backward-compatible, it should be an own function (like test_module_connection) and of course also included into the examples.

Having additional function is completely fine, it can be added to the example no problem with extra pin to show how to handle resets.

I'm looking forward to your response, ~Simon

P.S. Great work, after solving my timing issues I started to explore more and more of the exposed features of this crate and overall: Good Job!

Thank you.

simonmicro commented 9 months ago

Hi, it took me a while to find time to re-test this, I'm sorry. But I have the good news - it works out-of-the-box. I simply made the mistake of using the hardware-controlled-pin as I had some pin-constraints on the dependent project. Therefore, this issue can be closed. :blush:

Regarding the additional functionality: I'm too bad at Rust with no_std, so I won't write the aforementioned enhancement myself (because I would like to use the systems time to not get stuck inside such "synchronize"-method). So, if anyone stumbles upon this - go ahead please :wink:

Thank you for your patience and extensive replies. :wave: