earlephilhower / arduino-pico

Raspberry Pi Pico Arduino core, for all RP2040 and RP2350 boards
GNU Lesser General Public License v2.1

Help Test SDIO for RP2040/RP2350 #2562

Open greiman opened 2 weeks ago

greiman commented 2 weeks ago

Please help test this beta version of SdFat that supports fast SDIO.

A number of users have requested this feature and hope it will be included in this package for RP2040/RP2350.

Here are some of my test results for the SdFat bench example with a Lexar Silver Plus card:

Pico 2 512 byte transfers at 150 MHz

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
15526.96,38,32,32
15478.89,39,32,32

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
15526.96,449,32,32
15526.96,452,32,32

Pico 2 large transfers at 250 MHz

FILE_SIZE_MB = 100
BUF_SIZE = 32768 bytes

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
27263.48,11293,1197,1201
27256.04,11098,1197,1201

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
27013.02,2400,1210,1212
26788.63,9307,1210,1222

Pico 2 small 64 byte transfers at 150 MHz

FILE_SIZE_MB = 100
BUF_SIZE = 64 bytes

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
9980.04,11721,2,6
9984.03,8736,2,6

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
10145.07,75,1,5
10145.07,76,1,5

earlephilhower commented 1 week ago

Very nice!

Using my just-made SD->Pico adapter (RP2040 @ stock 133 MHz) on an old SanDisk Extreme 32GB "U3" "V30" card, I'm getting ~13MB/s.

Pinout and clocks

#define SPI_CLOCK SD_SCK_MHZ(50)
#define RP_CLK_GPIO 14
#define RP_CMD_GPIO 15
#define RP_DAT0_GPIO 18 

Results

Type any character to start
FreeStack: 253848
Type is FAT32
Card size: 31.91 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SE32G
Revision: 8.0
Serial number: 0X7B761DA4
Manufacturing date: 8/2016

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
13227.51,805,37,37
13333.33,781,37,37

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
12658.23,62,39,39
12658.23,69,39,39

Done

Adapter

image

I couldn't find a way of low-level erasing the card beforehand (like secure erase for SSDs) under Linux, so assume these #s are on a card that's been beaten to death in an old cell phone.

The stack free function needs a bit of work, and there look to be some functions in the SDIO routine with >500 bytes of stack needed (which is not an error but might cause weirdness in real apps with other stack users in the call chain).

Is there something specific you wanted to try out? My real high perf cards for my DSLRs are all full-size, so I can't use this adapter. But 13MB/s seems like a pretty good # even so on a random old one...

earlephilhower commented 1 week ago

There might be some CPU limitation, it seems. I bumped to 200 MHz on the same card and got the following:

FreeStack: 253848
Type is FAT32
Card size: 31.91 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SE32G
Revision: 8.0
Serial number: 0X7B761DA4
Manufacturing date: 8/2016

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
19920.32,1325,24,25
19920.32,1338,24,25

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
19011.41,50,26,26
19011.41,39,26,26

Done
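
One rough way to check the CPU-limit idea is to compare the speed-up against the clock ratio. The sketch below uses only the first row of each 133 MHz and 200 MHz run above; the helper and constant names are made up for illustration. All three ratios come out near 1.5, which is consistent with throughput tracking the CPU clock at this buffer size, though it does not prove where the bottleneck is.

```cpp
#include <cstdio>

// Ratio of two measurements (speeds in KB/s, or clocks in MHz).
constexpr double ratio(double after, double before) {
    return after / before;
}

// First result row of each bench run above (512-byte transfers).
constexpr double kWrite133 = 13227.51, kWrite200 = 19920.32;  // KB/s
constexpr double kRead133  = 12658.23, kRead200  = 19011.41;  // KB/s

// Clock change between the two runs: 133 MHz -> 200 MHz.
constexpr double kClockRatio = ratio(200.0, 133.0);

// Print the three ratios side by side for comparison.
void printScaling() {
    std::printf("clock x%.3f, write x%.3f, read x%.3f\n",
                kClockRatio,
                ratio(kWrite200, kWrite133),
                ratio(kRead200, kRead133));
}
```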

greiman commented 1 week ago

I couldn't find a way of low-level erasing the card beforehand

Use the SdFormatter example. The erase option quickly low level erases an SD.

The stack free function needs a bit of work, and there look to be some functions in the SDIO routine with >500 bytes of stack needed.

I may remove the free stack function from SdFat examples since it is over 10 years old and was for 328 boards.

Buffering to handle alignment is a problem. I have avoided allocating dynamic memory and don't like using the stack. Any suggestions?

But 13MB/s seems like a pretty good # even so on a random old one...

I suspect few apps need more.

earlephilhower commented 1 week ago

+1 for minimizing dynamic memory allocation! It's not nearly as important on the RP2040 (256K) or RP2350 (512K!!!) as on the 8266 or AVRs, but every little bit helps avoid memory fragmentation.

Not sure what you mean by buffer alignment, but using __attribute__((aligned(4))) (or whatever) should work. Stack variables should already be 4-byte aligned if I understand the ARM ABI properly.
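
As a minimal sketch of that suggestion, a statically allocated buffer with an explicit alignment request avoids both the heap and the stack. The names `sectorBuf` and `isAligned` are hypothetical and are not SdFat or arduino-pico symbols; `alignas(4)` is the standard C++ spelling of the GCC attribute mentioned above.

```cpp
#include <cstddef>
#include <cstdint>

// True when `p` sits on an `align`-byte boundary.
// `align` must be a power of two.
inline bool isAligned(const void* p, std::size_t align) {
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Statically allocated, 4-byte-aligned sector buffer (hypothetical name).
// Equivalent GCC/Clang form:
//   static uint8_t sectorBuf[512] __attribute__((aligned(4)));
alignas(4) static std::uint8_t sectorBuf[512];
```

A caller could use `isAligned(ptr, 4)` to decide whether a user buffer can be used directly or has to be staged through `sectorBuf`.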

We also have a HW DMA engine that's 2x faster for large blocks than memcpy. It only works for 4-byte aligned offset, 4-byte aligned length, though. But it's a drop-in-replacement of memcpy with rp2040.memcpyDMA (and falls back to the ROM memcpy when it can't handle things). For smallish copies (32-bytes) it's about even with ROM memcpy, so if you're moving small blocks this won't do much.
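
The eligibility rule described above (4-byte aligned addresses and 4-byte aligned length, otherwise fall back) can be sketched in portable C++. This is an illustration of the check only, not the core's implementation: `wordCopyEligible`, `fastCopy`, and `copyWithFallback` are hypothetical names, and `fastCopy` is a plain `memcpy` stand-in for the spot where `rp2040.memcpyDMA` would be called on real hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when both addresses and the length are multiples of 4 bytes.
inline bool wordCopyEligible(const void* dst, const void* src, std::size_t n) {
    const auto d = reinterpret_cast<std::uintptr_t>(dst);
    const auto s = reinterpret_cast<std::uintptr_t>(src);
    return ((d | s | n) & 3u) == 0;
}

// Hypothetical stand-in for the fast path. On the Pico this is where
// rp2040.memcpyDMA(dst, src, n) would go.
inline void fastCopy(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
}

// Copies n bytes and reports whether the fast path was eligible.
inline bool copyWithFallback(void* dst, const void* src, std::size_t n) {
    if (wordCopyEligible(dst, src, n)) {
        fastCopy(dst, src, n);
        return true;
    }
    std::memcpy(dst, src, n);  // byte-safe fallback for unaligned cases
    return false;
}
```

Since `rp2040.memcpyDMA` is described above as doing its own fallback, a wrapper like this is only useful for counting how often the aligned path is actually hit.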

--edit-- A quick sprinkling of rp2040.memcpyDMA in the spots where whole sectors were being copied didn't move the needle. So, no simple speed up there. :(

earlephilhower commented 1 week ago

SdFormatter didn't seem to change the results on the SanDisk card, so it was probably still relatively clean. I did get a different result on a generic MicroCenter-branded "U10" card, whose results follow. I suppose the 12-13MB/s read is due to a bottleneck somewhere in the MCU since it's the same as the "good" card, while the write limitation is down to the very cheap card. In any case, very consistent success using the SDIO mode even with spaghetti wiring!

Type is FAT32
Card size: 15.59 GB (GB = 1E9 bytes)

Manufacturer ID: 0X27
OEM ID: PH
Product: SD16G
Revision: 6.0
Serial number: 0XDA603B0C
Manufacturing date: 2/2019

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
8503.40,134410,37,58
8695.65,134520,37,58

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
12406.95,69,40,40
12406.95,54,40,40

Done

greiman commented 1 week ago

Not sure what you mean by buffer alignment

Read/write calls often occur with non-aligned buffers. Also, if the file position is not a multiple of four bytes, the copies to form complete sectors will not be aligned.

The bench example with 512 byte transfers will never need the copies. Try bench with 513 byte transfers to ensure there are no crashes due to alignment problems.

Performance suffers with lots of nonaligned copies.
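
To see why 513 byte transfers exercise the copy path, it helps to count how many sequential transfers start on a 4-byte file position or on a 512-byte sector boundary. The sketch below is plain arithmetic with hypothetical names (`AlignmentStats`, `countAlignedStarts`); it does not call SdFat.

```cpp
#include <cstdint>

struct AlignmentStats {
    std::uint32_t wordAlignedStarts;    // start position is a multiple of 4
    std::uint32_t sectorAlignedStarts;  // start position is a multiple of 512
};

// Walk `transfers` back-to-back transfers of `bufSize` bytes from file
// position 0 and count how many start on each kind of boundary.
inline AlignmentStats countAlignedStarts(std::uint32_t bufSize,
                                         std::uint32_t transfers) {
    AlignmentStats s{0, 0};
    std::uint64_t pos = 0;
    for (std::uint32_t i = 0; i < transfers; ++i) {
        if (pos % 4 == 0) ++s.wordAlignedStarts;
        if (pos % 512 == 0) ++s.sectorAlignedStarts;
        pos += bufSize;
    }
    return s;
}
```

With 512 byte transfers every start is sector aligned. With 513 byte transfers only one start in four lands on a 4-byte position and one in 512 on a sector boundary, so nearly every call needs partial-sector copies.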

Here is RP2040 with 512 byte transfers at 133 MHz using a low cost PNY 32GB microSD.

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
12658.23,421,37,39
12658.23,416,37,39

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
12626.26,68,39,39
12626.26,68,39,39

Here is the result with 513 byte transfers:

FILE_SIZE_MB = 5
BUF_SIZE = 513 bytes

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
8103.73,1068,38,62
8130.08,459,38,62

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
8064.52,96,44,62
8077.54,99,44,62

I suppose the 12-13MB/s read is due to a bottleneck somewhere in the MCU

Modern SD cards have huge flash pages, 32KB or more. Low end SD cards don't pipeline reads as well as high end cards.

High end cards have lots of buffering and they read ahead and pipeline the data stream for sequential reads. High end SD cards are incredibly complex; some even use pseudo-SLC cache like SSDs.

I was amazed to see high end SD cards set up read stream buffer policy based on how a file was written. SD cards expect a standard file format as specified by the SD Association. The SD expects standard locations and sizes for clusters and other file structures.

FAT areas are managed in a different way than data areas.

Some SD cards try to optimize for multiple open files or random I/O.

For best results use the Official SD Association formatter.

earlephilhower commented 1 week ago

Gotcha. Misaligned accesses (byte-wise? ugh!) are always brutal, anyway. The DMA copy won't help you there in most cases, sadly.

Your examples have several boards w/SDIO pins defined manually. I can add those defines to the board variant headers and it will "just work" without you needing to manually include the values for every example.
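
A sketch of what such variant-header defaults could look like, assuming the same macro names used in the pinout earlier in this thread. The pin values are just the ones from that hand-wired adapter and are illustrative only, not a recommendation for any real board. The `#ifndef` guards let a sketch define its own pins first.

```cpp
// Hypothetical excerpt from a board variant header.
// A sketch that defines these macros before this point keeps its own values.
#ifndef RP_CLK_GPIO
#define RP_CLK_GPIO 14   // SDIO clock pin (illustrative value)
#endif
#ifndef RP_CMD_GPIO
#define RP_CMD_GPIO 15   // SDIO command pin (illustrative value)
#endif
#ifndef RP_DAT0_GPIO
#define RP_DAT0_GPIO 18  // first SDIO data pin (illustrative value)
#endif

// Basic sanity check: the three signals must not share a pin.
static_assert(RP_CLK_GPIO != RP_CMD_GPIO &&
              RP_CMD_GPIO != RP_DAT0_GPIO &&
              RP_CLK_GPIO != RP_DAT0_GPIO,
              "SDIO pins must be distinct");
```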

In any case, is there a timeline for beta->release on the new SDFAT? My fork had minimal changes to work with our File and other minutiae, so I may need to pull there and not directly from your release. But, I'd need a release to start work. :)

greiman commented 1 week ago

Your examples have several boards w/SDIO pins defined manually. I can add those defines to the board variant headers and it will "just work" without you needing to manually include the values for every example.

About the only variant that is safe is the Adafruit Metro RP2040, which has an onboard SDIO/SPI socket. Other users of the beta select different pins than my test cases.

The current beta has lots of changes for other boards. I have done tests with some of the most popular Arduino and Adafruit boards. There are now thousands of "Arduino Compatible" boards plus custom boards that use SdFat, so I can test only a small fraction of them.

If I don't get any serious issues, I will post a release of SdFat in about a week.