espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

SD card in SPI mode: mounting after ESP32 restart can fail because CMD52 is issued with CRC check ON (IDFGH-13055) #14000

Open espressing opened 3 weeks ago

espressing commented 3 weeks ago

IDF version.

v5.2.1

Espressif SoC revision.

ESP32-D0WD rev3

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32-WROVER-E-N16R8

Power Supply used.

USB

What is the expected behavior?

When the ESP32 is restarted while transactions with an SD card in SPI mode are ongoing (e.g. a soft reboot or panic reset), the mounting sequence finds the card in whatever state the interrupted transaction left it in.

When power cycling the entire system, including the SD card, is not possible, re-mounting must still succeed: the library should get the card out of the garbled last transaction, perform an IO reset, and ensure the card can be mounted and used again.

When the SD card is power cycled together with the ESP32, the card is in the standard-specified idle state, so CMD52 is superfluous and can be allowed to fail, e.g. if the card reports it as not supported (the standard specifies it as an OPTIONAL class!).

The proposed change therefore has no effect on freshly powered-up cards, and it fixes the extremely detrimental outcome where a CRC error (or similar) is not tolerated after an ESP32 restart in which the SD card was not power cycled.

What is the actual behavior?

Mounting the SD card after an ESP32 restart (SD card and ESP32 not power cycled) fails on the first command (CMD52), which is meant to perform a card IO reset.

The failure reasons are:
- CRC error: this aborts the mounting sequence in sdmmc_card_init(), which calls sdmmc_io_reset() to send CMD52. CMD52 fails with a CRC error, the entire mounting attempt exits, the card remains unmounted, and the ESP32 application cannot work with it any more.
- Not supported: this can happen when the card is in a state in which it returns this response to the IO reset (CMD52).

In both cases, tolerating the CMD52 failure (CRC error or 'not supported') lets all remaining steps proceed successfully, and the card mounts and works fine.

The SD card physical layer specification explicitly states that CRC must not be ON in SPI mode for anything other than CMD0 and CMD8. In the above case, however, only the ESP32 master has restarted; the card is already fully initialised and set up, so it still has CRC turned ON by the mounting sequence from the previous ESP32 boot! This failure is not tolerated by the mentioned function(s).
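For context, the check the card applies here is the 7-bit CRC carried in the last byte of every 6-byte command frame. Below is a minimal sketch of that calculation, assuming the standard SD command CRC7 polynomial x^7 + x^3 + 1. The names `sd_crc7` and `sd_cmd_crc_byte` are made up for this illustration and are not ESP-IDF APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC7 over the first 5 bytes of an SD command frame (command index plus
 * 32-bit argument). Polynomial x^7 + x^3 + 1; the register is kept
 * left-aligned in a byte, so the polynomial 0x09 appears shifted as 0x12.
 * Hypothetical helper for illustration only. */
uint8_t sd_crc7(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= frame[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80) {
                crc = (uint8_t)((crc << 1) ^ 0x12);
            } else {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return (uint8_t)(crc >> 1); /* 7-bit result */
}

/* Sixth byte of the frame: CRC7 in bits 7..1, end bit (always 1) in bit 0. */
uint8_t sd_cmd_crc_byte(const uint8_t *frame, size_t len)
{
    return (uint8_t)((sd_crc7(frame, len) << 1) | 0x01);
}
```

For the CMD0 frame `40 00 00 00 00` this yields the familiar trailing byte `0x95`, and for CMD8 with argument `0x000001AA` it yields `0x87`. A card that still has CRC checking enabled and is partway through an interrupted transaction will not see the incoming bits aligned to such a frame, which is consistent with the CRC error reported on CMD52.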

Therefore: either allow these error codes and continue the mounting operations (later steps may still fail if things are truly bad with the card), or ensure that CRC is turned off. The latter cannot really be ensured, because the card could be in any state with CRC already turned ON, so some commands may well fail CRC checks anyway.

The mentioned function is unforgiving: it aborts everything as soon as the very first command (CMD52) fails in the above way. Any system that cannot power cycle the card is therefore severely impacted. In our case we lose access to the card completely and need human / customer interaction to reboot the entire system (the card is not directly accessible).

A CRC error is almost guaranteed to occur in such a case, because the card still holds internal leftovers that contribute to a (bad) CRC on this command, which is the first one as far as the ESP32 is concerned after a soft reboot. Not tolerating this and other failures means the user completely loses access to an SD card that is otherwise specified and proven to work fine once the mounting sequence completes.

NB: there may be other steps that, for some cards, fail harmlessly because the card reports that it is already in that mode, or that it does not support the step in its already set-up state. These should be tolerated too.

Steps to reproduce.

Any application that has data transfers ongoing when the ESP32 e.g. panic resets or is told to soft restart. In any such case the SD card, stuck at random in whatever transaction was last ongoing, will fail at least the CRC check on the very first command (CMD52) issued to it when the ESP32 mounting API starts its card setup sequence.

Debug Logs.

When mount fails purely due to CMD52 response not being tolerated:

I (4876) sdio.c: Using SPI peripheral
I (4886) gpio: GPIO[13]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (4896) sdspi_transaction: cmd=52, R1 response: command CRC error
E (4896) sdmmc_io: sdmmc_io_reset: unexpected return: 0x109
I (4906) sdmmc_init: sdmmc_card_init: sdmmc_io_reset returned 0x109
E (4916) vfs_fat_sdmmc: sdmmc_card_init failed (0x109).
HINT: Please verify if there is an SD card inserted into the SD slot. Then, try rebooting the board.
I (4916) gpio: GPIO[13]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
E (4926) sdio.c: Failed to initialize the card (ESP_ERR_INVALID_CRC). Make sure SD card lines have pull-up resistors in place.

More Information.

If we tolerate the CRC error (and also the 'not supported' outcome), the card mounts and works perfectly fine. This has been repeated many times to compare failing and working cases.

The change was to call sdmmc_io_reset() directly instead of through the SDMMC_INIT_STEP macro, and to allow it to fail, especially with a CRC error.
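The idea can be sketched on a host machine as follows. This is only an illustration under stated assumptions, not the actual patch: the error constants are local stand-ins that mirror the values in `esp_err.h` (0x109 matches the log above), and `io_reset_tolerant`, `init_step_fn` and the `fake_reset_*` functions are hypothetical names. In a real build the callback would be `sdmmc_io_reset()` and the types would come from the ESP-IDF headers.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring esp_err.h values (assumption for a host build). */
typedef int esp_err_t;
#define ESP_OK                0
#define ESP_ERR_NOT_FOUND     0x105
#define ESP_ERR_NOT_SUPPORTED 0x106
#define ESP_ERR_INVALID_CRC   0x109

/* Same shape as an init step: takes the card handle, returns an error code. */
typedef esp_err_t (*init_step_fn)(void *card);

/* Hypothetical wrapper: run the IO reset step, but do not abort the mount
 * when the card answers with a CRC error or "not supported". Any other
 * error is still propagated to the caller. */
esp_err_t io_reset_tolerant(init_step_fn reset_fn, void *card)
{
    assert(reset_fn != NULL);
    esp_err_t err = reset_fn(card);
    if (err == ESP_ERR_INVALID_CRC || err == ESP_ERR_NOT_SUPPORTED) {
        return ESP_OK; /* tolerated: later init steps decide the outcome */
    }
    return err;
}

/* Fake reset steps standing in for sdmmc_io_reset() outcomes. */
esp_err_t fake_reset_ok(void *card)            { (void)card; return ESP_OK; }
esp_err_t fake_reset_crc_error(void *card)     { (void)card; return ESP_ERR_INVALID_CRC; }
esp_err_t fake_reset_not_supported(void *card) { (void)card; return ESP_ERR_NOT_SUPPORTED; }
esp_err_t fake_reset_no_card(void *card)       { (void)card; return ESP_ERR_NOT_FOUND; }
```

With this shape, a missing card (`ESP_ERR_NOT_FOUND`) still stops the mount immediately, while the two outcomes described in this report fall through to the remaining init steps, which will fail on their own if the card is truly in a bad state.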

espressing commented 3 weeks ago

Would add that this dates back to IDF 3.x or even earlier: the same init sequence is there, and the card can "fail" (it doesn't actually fail; it is an expected and definite CRC error in these cases) on the first command it sees (CMD52).

chipweinberger commented 2 weeks ago

Add a MOSFET. I recommend power cycling your SD card after every boot.

espressing commented 2 weeks ago

Indeed, but that needs additions to the product electronics, too. A software (library) issue should not have a hardware change as its default workaround. The key issue is that a typical scenario (a CRC error after an aborted transaction) is not handled by the library: it abandons any attempt to mount the card after the first CMD52, and it also contravenes the standard. In this scenario a CRC error is normal, yet the CRC check must not be enabled for anything other than the 2 commands specified in the standard. Because the ESP32 enabled CRC in the later stages of mounting during the previous boot and the card was not reset, the initial CRC error can be tolerated, especially as it clears the situation on the card and all subsequent steps succeed (and the card is usable again). There is no provision in the library to handle this, so the user loses the card completely when the library exits on the first CMD52.

chipweinberger commented 2 weeks ago

I recommend opening a PR. It's the best way to have issues resolved.

espressing commented 2 weeks ago

Thank you. Admittedly, I wasn't sure whether the bug route or something else would get the best activity for something like this. I'll poke around.