espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.52k stars 7.26k forks source link

sdmmc_read_sectors_dma: sdmmc_send_cmd returned 0x107 (IDFGH-11882) #12968

Open yizi15 opened 9 months ago

yizi15 commented 9 months ago

Answers checklist.

IDF version.

release/v5.2

Espressif SoC revision.

esp32s3n8

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

PowerShell

Development Kit.

ESP32-S3-USB-OTG

Power Supply used.

USB

What is the expected behavior?

通过fopen 打开文件,通过fread 读文件可以正常的读到文件的所有内容

What is the actual behavior?

在某些sd card(32g) 中 通过fopen 打开文件,通过fread 读文件经常读失败, 几秒内重新读取也会失败,写文件正常。 在另一种sdcard (16g)可以正常读写

Steps to reproduce.

将一个大于100M 的文件放入rf card,挂载sdmmc后 通过以下代码读文件 `sdmmc_host_t host = SDMMC_HOST_DEFAULT(); host.command_timeout_ms = 3000; // host.max_freq_khz /= 2; // host.flags &= ~SDMMC_HOST_FLAG_DDR; // This initializes the slot without card detect (CD) and write protect (WP) signals. // Modify slot_config.gpio_cd and slot_config.gpio_wp if your board has these signals. sdmmc_slot_config_t slot_config = SDMMC_SLOT_CONFIG_DEFAULT();

// Set bus width to use:
slot_config.width = 4;
// On chips where the GPIOs used for SD card can be configured, set them in
// the slot_config structure:
slot_config.clk = CONFIG_EXAMPLE_PIN_CLK;
slot_config.cmd = CONFIG_EXAMPLE_PIN_CMD;
slot_config.d0 = CONFIG_EXAMPLE_PIN_D0;
slot_config.d1 = CONFIG_EXAMPLE_PIN_D1;
slot_config.d2 = CONFIG_EXAMPLE_PIN_D2;
slot_config.d3 = CONFIG_EXAMPLE_PIN_D3;

// Enable internal pullups on enabled pins. The internal pullups
// are insufficient however, please make sure 10k external pullups are
// connected on the bus. This is for debug / example purpose only.
slot_config.flags |= SDMMC_SLOT_FLAG_INTERNAL_PULLUP;

ESP_LOGI(TAG, "Mounting filesystem");
ret = esp_vfs_fat_sdmmc_mount(mount_point, &host, &slot_config, &mount_config, &card);

uint8_t buf[512]; const char dst_path = "/sdcard/tst.txt"; size_t all_len =0; struct stat statbuf = {}; stat(dst_path, &statbuf); all_len = statbuf.st_size; FILE f = fopen(dst_path, "rb"); if(f == NULL) { ESP_LOGI("", "tst err"); } else { void p = buf;//malloc(5121024); size_t need_len = all_len; while (need_len > 0) { int32_t read_len = 0; for(int i = 0 ; i < 15; i ++) { read_len = fread(p, 1, 512, f); if(read_len > 0) { break; } ESP_LOGI("", "read fail"); sleep_ms(30); } if(read_len <= 0) { break; } need_len -= read_len; } if(need_len != 0) { ESP_LOGE("","all_len != self->file_size %d, %d", all_len, need_len); } else { ESP_LOGI("","read all data %d", all_len); } fclose(f); }`

Debug Logs.

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x178c
load:0x403c9700,len:0x4
load:0x403c9704,len:0xcbc
load:0x403cc700,len:0x2d90
entry 0x403c9914
I (27) boot: ESP-IDF v5.2-beta1-263-ge49823f10c-dirt 2nd stage bootloader
I (27) boot: compile time Jan 12 2024 16:06:39
I (28) boot: Multicore bootloader
I (32) boot: chip revision: v0.1
I (36) boot.esp32s3: Boot SPI Speed : 80MHz
I (41) boot.esp32s3: SPI Mode       : DIO
I (45) boot.esp32s3: SPI Flash Size : 8MB
I (50) boot: Enabling RNG early entropy source...
I (56) boot: Partition Table:
I (59) boot: ## Label            Usage          Type ST Offset   Length
I (66) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (74) boot:  1 coredump         Unknown data     01 03 0000f000 00010000
I (81) boot:  2 otadata          OTA data         01 00 0001f000 00002000
I (89) boot:  3 phy_init         RF data          01 01 00021000 00001000
I (96) boot:  4 factory          factory app      00 00 00030000 00200000
I (104) boot:  5 ota_0            OTA app          00 10 00230000 00200000
I (111) boot:  6 ota_1            OTA app          00 11 00430000 00200000
I (119) boot: End of partition table
I (123) boot: Defaulting to factory image
I (128) esp_image: segment 0: paddr=00030020 vaddr=3c030020 size=10218h ( 66072) map
I (148) esp_image: segment 1: paddr=00040240 vaddr=3fc96300 size=06528h ( 25896) load
I (154) esp_image: segment 2: paddr=00046770 vaddr=40378000 size=098a8h ( 39080) load
I (163) esp_image: segment 3: paddr=00050020 vaddr=42000020 size=2c5ach (181676) map
I (196) esp_image: segment 4: paddr=0007c5d4 vaddr=403818a8 size=049c0h ( 18880) load
I (207) boot: Loaded app from partition at offset 0x30000
I (207) boot: Disabling RNG early entropy source...
I (218) cpu_start: Multicore app
I (228) cpu_start: Pro cpu start user code
I (228) cpu_start: cpu freq: 240000000 Hz
I (228) cpu_start: Application information:
I (231) cpu_start: Project name:     ESP32_EXAMPLE
I (236) cpu_start: App version:      2600230616000a000000FFFFFFFF
I (243) cpu_start: ELF file SHA256:  b67272f6c4123c36...
I (249) cpu_start: ESP-IDF:          v5.2-beta1-263-ge49823f10c-dirt
I (256) cpu_start: Min chip rev:     v0.0
I (261) cpu_start: Max chip rev:     v0.99 
I (266) cpu_start: Chip rev:         v0.1
I (270) heap_init: Initializing. RAM available for dynamic allocation:
I (277) heap_init: At 3FC9D5C8 len 0004C148 (304 KiB): RAM
I (284) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (290) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (296) heap_init: At 600FE010 len 00001FD8 (7 KiB): RTCRAM
I (303) spi_flash: detected chip: gd
I (306) spi_flash: flash io: dio
I (310) sleep: Configure to isolate all GPIO pins in sleep state
I (317) sleep: Enable automatic switching of GPIO sleep configuration
I (324) esp_core_dump_flash: Init core dump to flash
I (330) esp_core_dump_flash: Found partition 'coredump' @ f000 65536 bytes
I (338) main_task: Started on CPU0
I (342) main_task: Calling app_main()
I (362) example: Initializing SD card
I (362) example: Using SDMMC peripheral
I (363) example: Mounting filesystem
I (365) gpio: GPIO[36]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (374) gpio: GPIO[35]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (383) gpio: GPIO[37]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (392) gpio: GPIO[38]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (402) gpio: GPIO[33]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (411) gpio: GPIO[34]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (467) gpio: GPIO[34]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (470) example: Filesystem mounted
E (15778) sdmmc_cmd: sdmmc_read_sectors_dma: sdmmc_send_cmd returned 0x107
E (15778) diskio_sdmmc: sdmmc_read_blocks failed (263)
I (15780) : read fail
I (15813) : read fail
I (15843) : read fail
I (15873) : read fail
I (15903) : read fail
I (15933) : read fail
I (15963) : read fail
I (15993) : read fail
I (16023) : read fail
I (16053) : read fail
I (16083) : read fail
I (16113) : read fail
I (16143) : read fail
I (16173) : read fail
I (16203) : read fail
E (16233) : all_len != self->file_size 123443470, 100476686
I (16233) main_task: Returned from app_main()

More Information.

sdcard_readerr.zip

igrr commented 9 months ago

Please try changing this line: https://github.com/espressif/esp-idf/blob/e49823f10c6b6b8dadbad376002ea2bc531b7b0f/components/driver/sdmmc/sdmmc_host.c#L275 to a larger value, e.g. 500 ms.

We've seen some SD cards which are not compliant with the SD Specification, and do not reply to data transfer commands in 100ms. Perhaps that's what is happening in your case.

yizi15 commented 9 months ago

Please try changing this line:

https://github.com/espressif/esp-idf/blob/e49823f10c6b6b8dadbad376002ea2bc531b7b0f/components/driver/sdmmc/sdmmc_host.c#L275

to a larger value, e.g. 500 ms. We've seen some SD cards which are not compliant with the SD Specification, and do not reply to data transfer commands in 100ms. Perhaps that's what is happening in your case.

Thanks, modify the timeout to 500 can solve the problem. I will close the issues after more test.

igrr commented 9 months ago

Let's keep this open... Since more people are running into this, I guess it makes sense to increase the timeout in IDF by default, or to make it somehow configurable. The issue will be closed automatically when we merge the fix.

jliu83 commented 8 months ago

Hello, we have just tested this.

We have an application where we are heavily dependent on data + logging on the sd card. The device would run fine most of the time, and occasionally we would get a sdmmc_read_sectors_dma: sdmmc_send_cmd returned 0x107. These errors are annoying because the tasks inside the framework would lock up, and there was no way to detect that this error has occurred. We used a watchdog timer to detect and reboot the system. This method sort of works, unless a sector of the files that are read on boot up triggers a timeout error code.

We kept these SD cards since we started seeing the problem, as they allow us to reproduce this error. While we scoured the forums for a solution and tried several of the proposed solutions, nothing worked, until I read this post.

Changing the timeout from 100 to 1000ms allows all of the old cards to boot up with no issue. I an convinced that the default 100ms is on the threshold for a lot of these cards, and the strict time keeping of the host driver is causing a lot of these cards to fail. The problem with the 0x107 error is that is crashes the entire SDMMC driver. It's okay to timeout, but allow the driver to continue functioning, don't just let it crash. This would make for a more robust driver.

I suggest two modifications to the driver. Allow the data command timeout to be set to a user value (something other than the default of 100) on initialization/mount. Also, when the driver encounters a timeout, do not lock up the driver, allow the driver to recover and retry, and continue functioning.

Thank you for the hint to resolving this. This post has given me a lot of relief and saved me a bunch of headaches.

igrr commented 8 months ago

but allow the driver to continue functioning, don't just let it crash.

@jliu83 Could you please describe the behavior you are observing in more detail? It is not expected that the driver should crash after one of the commands has timed out. If you have the details such as: IDF version, log output, panic output including the backtrace, please open a new issue and put the details there. We'll try to fix the issue and make the driver more robust.

jliu83 commented 8 months ago

This might be difficult to do, the SD cards that had the issues all work now, and the blocking file that was causing the error has been overwritten once a successful boot has occurred.

What I suspect is happening is the following.

sdmmc_read_sectors_dma() is called by sdmmc_read_sectors() is called by ff_sdmmc_read(). Then error 0x107 (timeout) is returned from sdmmc_read_sectors_dma() it is propagated all the way up to ff_sdmmc_read(). At this point many of the tasks in the system is halted, or are blocked. We didn't investigate too much at the time. Unfortunately I cannot replicate the error anymore and get the backtrace. If we are able to reproduce I will file a bug right away.