adafruit / Adafruit_SPIFlash

Arduino library for external (Q)SPI flash device
MIT License
159 stars 84 forks source link

SAMD SPI Read issue #65

Closed hathach closed 4 years ago

hathach commented 4 years ago

Describe the bug When improving spi flash accessing speed, I find out that M0 can perform SPI DMA write without issue, but can only carry out exactly 1 DMA Read (still has issue).

Code that enable DMA write https://github.com/adafruit/Adafruit_SPIFlash/blob/master/src/spi/Adafruit_FlashTransport_SPI.cpp#L147

Code that eanble DMA read https://github.com/adafruit/Adafruit_SPIFlash/blob/master/src/spi/Adafruit_FlashTransport_SPI.cpp#L121

Set up (mandatory)

To Reproduce Steps to reproduce the behavior:

  1. Uncomment this #elif to enable SPI DMA read https://github.com/adafruit/Adafruit_SPIFlash/blob/master/src/spi/Adafruit_FlashTransport_SPI.cpp#L119 Code should be
    #elif defined(ARDUINO_ARCH_SAMD) && defined(_ADAFRUIT_ZERODMA_H_)
    // TODO Could only got the 1st SPI read work, 2nd will failed, maybe we
    // didn't clear thing !!!
    _spi->transfer(NULL, data, len, true);
  2. Run the flash_speedtest example
  3. See error

Context the example sketch will

Then it does the second pass with 0x55 pattern. The sketch complete successfully when DMA Read is not used, and failed when it does. In both cases DMA Write is eanbled. Below is the output log.

By my observation, the DMA Read seems to have an error, it does read several hundreds byte before failing. Afterwards, it doesn't seem to be able to execute another Read ( second pass is all zeroes).

@PaintYourDragon @ladyada Since you are expert on SPI DMA on SAMD, would you mind taking a look at the issue. It is probably some condition is not clear, or maybe I didn't understand enough of the DMA API. Thanks in advance.

Serial Log

With DMA Read eanbled

Adafruit Serial Flash Speed Test example
JEDEC ID: C84015
Flash size: 2097152
Erase chip
Write flash with 0xAA
Write 2097152 bytes in 4.63 seconds.
Speed : 452.95 KB/s.

Read flash and compare
Error: flash contents mismatched at address 0x00000000!!!
000: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
010: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
020: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
030: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
040: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
050: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
060: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
070: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
080: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
090: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
0A0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
0B0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
.......................
320: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
330: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
340: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
350: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
360: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
370: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
380: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
390: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
3A0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
3B0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
3C0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
3D0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
3E0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
3F0: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
400: AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA 
410: AA AA AA AA AA 00 00 00 00 00 00 00 00 00 00 00 
420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
.......................
FE0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
FF0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Erase chip
Write flash with 0x55
Write 2097152 bytes in 11.45 seconds.
Speed : 183.22 KB/s.

Read flash and compare
Error: flash contents mismatched at address 0x00000000!!!
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
.......................
FE0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
FF0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Speed test is completed.

Without DMA Read

Adafruit Serial Flash Speed Test example
JEDEC ID: C84015
Flash size: 2097152
Erase chip
Write flash with 0xAA
Write 2097152 bytes in 4.64 seconds.
Speed : 452.07 KB/s.

Read flash and compare
Read  2097152 bytes in 3.54 seconds.
Speed : 592.08 KB/s.

Erase chip
Write flash with 0x55
Write 2097152 bytes in 4.64 seconds.
Speed : 452.46 KB/s.

Read flash and compare
Read  2097152 bytes in 3.54 seconds.
Speed : 592.42 KB/s.

Speed test is completed.
PaintYourDragon commented 4 years ago

Been spending some time with this today and have not found the cause yet. I did find and fix an obscure bug in the SPI library (in ArduinoCore-samd) but that has no impact here. Need to set this aside for the time being due to other projects.

hathach commented 4 years ago

Thanks @PaintYourDragon , whenever you got the time. Let's me know if you need my help with testing or other part of code with SPIFlash :)

PS: I have no idea, but may this SPI DMA related PR to SAMD core could help https://github.com/adafruit/ArduinoCore-samd/pull/234

PaintYourDragon commented 4 years ago

Think it’s fixed. Change pushed to ArduinoCore-samd main branch (changes are in SPI library). Tested on Feather M0 and M4, including both default settings and overclock + dragon optimization on M4…plus Grand Central running OV7670 cameratest (to TFT shield).

Grateful for other eyes to test this, especially with other code that uses SPI DMA (GFX, SD cards, etc.). Holding off on closing the issue until others can confirm it’s solid.

BONUS: large SPI transfers can be fully non-blocking now. Previously large transfers were “mostly-blocking.” This will help things like the camera and possibly NES emulator, anything that writes full screens worth of data.

Bug was not where expected. It only manifested in flash_speedtest because of the particular order in which it called SPI.transfer() with writes followed by reads, apparently a rare sequence of events. Should handle everything really well now.

hathach commented 4 years ago

@PaintYourDragon superb !!! I have just run the test and verified that the issue is fixed with latest samd core. Read speed is almost got doubled using DMA.

without DMA (speed in KB/s)

- M0 FRAM: read (noDMA, 12Mhz) = 490
- M0 GD25Q16C: read (noDMA, 24Mhz) = 570
- M4 FRAM: read (noDMA, 24Mhz) = 1300

with DMA (speed in KB/s)

- M0 FRAM: read (DMA, 12Mhz) = 1200
- M0 GD25Q16C: read (DMA, 24Mhz) = 1200
- M4 FRAM: read (DMA, 24Mhz) = 2950

I really like the enhancement with non-blocking, this can be used to improve the SPIFlash speed even more. The read won't get much usage since most of the time the application want to use the read data immediately after the call. For WRITE, SPIFlash need to have some guard to prevent application to overwrite to the same caching buffer (before the previous DMA complete). I will try to look at this later on if user complains about the performance, since this may need quite a bit of testing since it involve racing condition.

PaintYourDragon commented 4 years ago

SPI.waitForTransfer() will wait for a non-blocking transfer to complete…call this before ANY other action on the same SPI bus. Returns immediately if transfer is already finished.

hathach commented 4 years ago

SPI.waitForTransfer() will wait for a non-blocking transfer to complete…call this before ANY other action on the same SPI bus. Returns immediately if transfer is already finished.

Thank for the tip, will try it out later on. Kind of satisfied by the result now and get bored with testing with SPIFlash :stuck_out_tongue: :stuck_out_tongue:

hathach commented 4 years ago

it is solid enough to be closed now.