foldedtoad / dwm3000

Port of Qorvo/Decawave's DWM3000 Module on the DWS3000 Arduino Shield
GNU General Public License v3.0
83 stars 16 forks source link

ex_03d_tx_wait_resp_interrupts receives response with corrupted data. #2

Open foldedtoad opened 3 years ago

foldedtoad commented 3 years ago

When running the project pair ex_03a_tx_wait_resp and ex_03b_rx_send_resp, the response data is received by tx-side as expected, e.g. no data corruption of frame. In other words, this pair of programs run expected.

But when running ex_03d_tx_wait_resp_interrupt and ex_03b_rx_send_resp, the response data received by the tx-side has been partially zero-ed.

The rx-side received the tx-side message and uses it to build the response data, which is shown below.

[00:00:07.574,676] <inf> rx_send_resp: resp len 21
                                       41 8c 00 9a 60 44 45 43 41 57 41 56 45 44 57 10 |A...`DEC AWAVEDW.
                                       00 00 00 00 00                                  |.....

While the tx-side receives the response data, the data (frame) has a section zero-ed.

[00:12:51.581,726] <inf> tx_wait_resp_int: OK: len: 21
[00:12:51.581,756] <inf> tx_wait_resp_int: rx resp
                                          41 8c 00 9a 60 44 45 00 00 00 00 00 00 00 57 10 |A...`DE. ......W.
                                          00 00 00 c3 2b                                  |....+

Note that the tx-side receives interrupts without problems, it's just the data that is partially zero-ed.

Status: Currently investigating. Need to get detailed info (DWM3000 datasheet) from Qorvo/Decawave. Appears NDA is needed to receive details.

krbvroc1 commented 3 years ago

@foldedtoad I ran into this issue also ... it is very confusing. A few notes from my debugging....

It happens when the dwt_readrxdata() is called inside the rx_ok_cb ISR and when the data to read exceeds some length. I noticed it only happened to the last few bytes of the RX_FINAL message which is the longest. So maybe if length > 18.

It appears to be a Zephyr issue. I hooked up a logic analyzer to the shield and confirmed that SPI transaction is correct. However, it is not what spi_transeive() in your 'readfromspi.c' returns! If you look at memory with the Ozone, you will see that some time AFTER the ISR exits, the correct bytes show up.

My conclusion is that somehow the SPI transaction is not fully completed when the ISR is invoked. At this point I have no clue how to fix. My temporary 'hack' was to notify my app from the ISR and call dwt_readrxdata() in the app.

Hopefully this gives some you some info and maybe we can figure this out?

ps. I believe DW3000 IC 'User Manual' is available on their webite and does not need NDA.

foldedtoad commented 3 years ago

I recently received a Nucleo-F429ZI board and have made good progress on getting it to work with this project's code. (BTW, please git-pull this new code.) Anyway I just tested it with this new board and it appears to work as it should: see log below from Nucleo-F429ZI side (tx_wait_resp_interrupt).

[00:00:30.979,000] <inf> tx_wait_resp_int: OK: len: 21
[00:00:30.979,000] <inf> tx_wait_resp_int: rx resp
                                        41 8c 17 9a 60 44 45 43  41 57 41 56 45 44 57 10 |A...`DEC AWAVEDW.
                                        00 00 00 65 68                                   |...eh    

This interesting aspect is that the 'hole' is consistently in the middle of the data at same offset (for nRF52840). I will try the nRF52832, and report back.

Yes, I have a copy of the DW3000 User manual. I printed/bound it as I like to make notes in a hardcopy...just a thing I have. :-)

Thanks for passing your insights on to me.

foldedtoad commented 3 years ago

So here are my results in testing (same conditions) with the PCA10040 (nRF52832) build of this code.

[00:00:23.066,650] <inf> tx_wait_resp_int: rx resp
                                           41 8c 0a 9a 60 44 45 43  41 57 41 56 45 44 57 10 |A...`DEC AWAVEDW.
                                           00 00 00 cf b0                                   |..... 

This would indicate this problem is limited to the nRF52840 build. The nRF52840 build used spi_nrfx_spim.c, while the nRF52832 uses spy_nrfx_spi.c.
If you look at the logic analyzer traces, you can see a noticeable difference in bus operation.
I don't know what the significant of this is (yet).

I will continue to delve into this issue.

krbvroc1 commented 3 years ago

One thing I noticed is that you have not defined the port routines, port_DisableEXT_IRQ, port_EnableEXT_IRQ, etc. These are used by the decamutexon() or decamutexoff() routines to mask interrupts. Since interrupts are not being masked, I wonder if this is causing some issues in certain cases. I am not sure how to do this under Zephyr since the gpio abstraction hides the IRQ and such.

I think the nrfx spim routines might use DMA.

I am using this driver code and have found the SPI to hang on the board dependent on SPI speed and other random hangs during longevity tests.

foldedtoad commented 3 years ago

I'll take a look at the two functions you mentioned. back in a bit.

foldedtoad commented 3 years ago

I've been looking into this off and on for a week now...here is what I have found.

  1. The Response is received intact by the TX side (ex_03d_tx_resp_interrupt), as can be seen in the logic analyzer trace below. spim_read_resp_data Here is the logic analyzer trace if you're interested: tx_wait_resp_interrupts.logicdata.zip So this would indicate that the issue is more software related.

  2. Checking the nRF52840 Errata, it appears there are several issues on the nRF52840's SPI3 controller. See errata items 174, 195 and 198. NOTE: The Arduino SPI config is mapped onto the nRF52840's SPI3 controller. The nrfx spi driver for Zephyr, spa_nrfx_spim.c, calls into the Nordic HAL (../zephyrproject/modules/hal/nordic/nrfx/drivers/src/nrfx_spim.c)

  3. In nrfx_spim.c in function spim_xfer(), there is several conditional workaround options.

    • NRFX_SPIM_FLAG_HOLD_XFER
    • NRFX_SPIM_NRF52_ANOMALY_109_WORKAROUND_ENABLED These may be related to the above errata and also this problem.
  4. I have found that I inject a short delay after the nrfx_spim_xfer() in spi_nrfx_spim.c (Zephyr code), the complete response data is received. This is NOT a fix, but rather investigation probing code.

In spi_nrfx_spim.c circa line 198

             if (!error) {
                   result = nrfx_spim_xfer(&dev_config->spim, &xfer, 0);
                    if (result == NRFX_SUCCESS) {
#if 1                 // diagnostics & investigation   Delay 100 iterations
                        if (xfer.rx_length > 10) {
                            for (volatile int i=0; i < 100; i++) { /* spin */}
                        }
#endif
                       return;
                   }
                  error = -EIO;
             }

The output looks like this --

*** Booting Zephyr OS build zephyr-v2.5.0-1675-gd6567ad494a0  ***

[00:00:08.222,320] <inf> main: main_thread
[00:00:08.222,351] <inf> port: Configure WAKEUP pin on port "GPIO_1" pin 11
[00:00:08.222,381] <inf> port: Configure RESET pin on port "GPIO_1" pin 8
[00:00:08.222,381] <inf> port: Configure RX LED pin on port "GPIO_1" pin 5
[00:00:08.222,381] <inf> port: Configure TX LED pin on port "GPIO_1" pin 4
[00:00:08.222,412] <inf> port: Configure SPI Phase pin on port "GPIO_1" pin 1
[00:00:08.222,412] <inf> port: Configure SPI Polarity pin on port "GPIO_1" pin 2
[00:00:08.222,412] <inf> deca_spi: openspi bus SPI_3
[00:00:09.273,376] <inf> tx_wait_resp_int: TX W4R IRQ v1.0
[00:00:09.273,406] <inf> port: reset_DWIC
[00:00:09.278,076] <inf> deca_device: dev_id "deca0302"
[00:00:09.281,463] <inf> port: Configure IRQ on port "GPIO_1" pin 10
[00:00:09.281,463] <inf> port: Configure IRQ pin
[00:00:09.282,806] <inf> tx_wait_resp_int: OK: len: 21
[00:00:09.282,806] <inf> tx_wait_resp_int: rx resp
                                           41 8c 0f 9a 60 44 45 43  41 57 41 56 45 44 57 10 |A...`DEC AWAVEDW.
                                           00 00 00 fc e0                                   |.....            
[00:00:10.330,444] <inf> tx_wait_resp_int: OK: len: 21
[00:00:10.330,474] <inf> tx_wait_resp_int: rx resp
                                           41 8c 10 9a 60 44 45 43  41 57 41 56 45 44 57 10 |A...`DEC AWAVEDW.
                                           00 00 00 48 18                                   |...H.            
[00:00:11.369,781] <inf> tx_wait_resp_int: OK: len: 21
[00:00:11.369,781] <inf> tx_wait_resp_int: rx resp
                                           41 8c 11 9a 60 44 45 43  41 57 41 56 45 44 57 10 |A...`DEC AWAVEDW.
                                           00 00 00 47 08                                   |...G.            
[00:00:12.409,088] <inf> tx_wait_resp_int: OK: len: 21
[00:00:12.409,118] <inf> tx_wait_resp_int: rx resp
                                           41 8c 12 9a 60 44 45 43  41 57 41 56 45 44 57 10 |A...`DEC AWAVEDW.
                                           00 00 00 56 38                                   |...V8 
krbvroc1 commented 3 years ago

I do not have my board setup to test at the moment, but if you happen to have time, can you try enabling

zephyr_compile_definitions(NRFX_SPIM3_NRF52840_ANOMALY_198_WORKAROUND_ENABLED=1)

After some searching, I see that it is NOT enabled by default, yet it seems to impact current revision chips.

EDIT: I did try this on code I am using, not the ex_03d, and at least on my code it does not appear to fix anything.