espressif / idf-extra-components

Additional components for ESP-IDF, maintained by Espressif

MT29 filesystem mount problems (IEC-169) #385

Open mttcarbone opened 1 month ago

mttcarbone commented 1 month ago


General issue report

I am using this library in a project with an MT29F2G01ABAGD, and I am having problems when the ESP32-S3 starts: it returns the warning vfs_fat_nand: f_mount failed (13), the flash is formatted, and it then proceeds to write and read the hello.txt file.

What can I do to avoid formatting the flash every boot?

In other tests, if I unmount and remount the flash, the hello.txt file is read incorrectly: its contents come back as the truncated string "HELLO TXT". What can I do to avoid this?

Screenshot_20240926_180456

igrr commented 1 month ago

Could you please try to verify that basic read and write operations to flash work correctly? FatFS error 13 usually indicates that the filesystem is corrupted. The most common case is that the underlying Flash driver isn't transferring some data correctly in one of the directions. You can try calling spi_nand_flash_write_sector and spi_nand_flash_read_sector to write some test data and then verify that the read back result is the same.
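For reference when reading such logs: in FatFS, result code 13 is FR_NO_FILESYSTEM. The lookup below is a minimal sketch for decoding the number printed in "f_mount failed (13)". The table mirrors the FRESULT enum in FatFS's ff.h; fresult_name is a hypothetical helper, not a FatFS or ESP-IDF function.

```c
#include <stddef.h>

/* Local mirror of the FRESULT codes declared in FatFS's ff.h.
 * Kept as a plain string table so the sketch builds without FatFS. */
static const char *const fresult_names[] = {
    "FR_OK", "FR_DISK_ERR", "FR_INT_ERR", "FR_NOT_READY", "FR_NO_FILE",
    "FR_NO_PATH", "FR_INVALID_NAME", "FR_DENIED", "FR_EXIST",
    "FR_INVALID_OBJECT", "FR_WRITE_PROTECTED", "FR_INVALID_DRIVE",
    "FR_NOT_ENABLED", "FR_NO_FILESYSTEM", "FR_MKFS_ABORTED", "FR_TIMEOUT",
    "FR_LOCKED", "FR_NOT_ENOUGH_CORE", "FR_TOO_MANY_OPEN_FILES",
    "FR_INVALID_PARAMETER",
};

/* Hypothetical helper: map a numeric FatFS result to its symbolic name.
 * Returns "unknown" for values outside the table. */
const char *fresult_name(int code)
{
    size_t count = sizeof(fresult_names) / sizeof(fresult_names[0]);
    if (code < 0 || (size_t)code >= count) {
        return "unknown";
    }
    return fresult_names[code];
}
```

FR_NO_FILESYSTEM means FatFS did not find a valid FAT volume, which is consistent with either a never-formatted device or data that does not read back as written.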

By the way, since this flash chip is not on the list of supported ones, it might make sense to check its datasheet and see if there is any difference related to timing, compared to other supported chips.

Teesmo commented 1 month ago

@igrr Which Micron chips do you support? None are listed here, but provision has been made for Micron chips in this header file. There's also an implementation here that caters for multiple vendors, including Micron.

igrr commented 1 month ago

AFAIK we haven't tested this component on any of the Micron chips. Micron support was contributed by @UnTraDe in https://github.com/espressif/idf-extra-components/pull/327. I guess @UnTraDe used it successfully, maybe there is something slightly different in your hardware setup. As I mentioned above, I would recommend doing a basic write/readback sanity check to make sure that the data is being written correctly.

By the way, do you folks work on the same project or it just so happened that you both need Micron NAND flash support? If it's the latter, we can try to get some samples of that chip and test them...

mttcarbone commented 1 month ago

@igrr Thank you for the support! We are on different teams. I chose Micron for cost reasons: compared with other NAND flash on the market, it is well priced. It would be great to have your help extending support to Micron chips like the MT29F2G01ABAGDWB I have, so it can be shared with the community and help other teams!

igrr commented 1 month ago

Okay, we'll order some boards with MT29 chips, might take some time (worst case, 2 weeks) to get them.

In the meantime perhaps you could do the test I have suggested above and share the result you get.

mttcarbone commented 1 month ago

Great! Thank you! I'm following your directions and hope to share useful findings and an update on my results within the next few days.

UnTraDe commented 1 month ago

Hi, we are currently using this driver with the MT29F4G01ABAFDWB successfully, and it should work on the rest of the MT29F series. As far as I understand, the differences between the chips in this series are the size and organization of the memory itself (page size, block size, etc.), so the only thing needed to add support for them is the ID of the model and the different sizes. Take a look at the changes here: https://github.com/espressif/idf-extra-components/pull/327/files

mttcarbone commented 1 month ago

Here is the part of the code that I added to the library; I have several doubts about it:

static esp_err_t spi_nand_micron_init(spi_nand_flash_device_t *dev)
{
    uint8_t device_id;
    spi_nand_transaction_t t = {
        .command = CMD_READ_ID,
        .dummy_bits = 16,
        .miso_len = 1,
        .miso_data = &device_id
    };
    spi_nand_execute_transaction(dev->config.device_handle, &t);
    dev->read_page_delay_us = 115;
    dev->erase_block_delay_us = 2000;
    dev->program_page_delay_us = 240;
    switch (device_id) {
    case MICRON_DI_34:
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 12; // 4096 bytes per page
        break;
    case MICRON_DI_24:
        dev->dhara_nand.num_blocks = 1024;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 11; // 2048 bytes per page
        break;
    default:
        return ESP_ERR_INVALID_RESPONSE;
    }
    return ESP_OK;
}

Looking at the flash structure, is there anything wrong in the configuration? @UnTraDe Thanks for your input and for helping me!
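One quick sanity check on a geometry like the one above is to multiply it out and compare the result with the nominal density in the part name. The sketch below does only that arithmetic; nand_geometry_t and both functions are hypothetical names for illustration and are not part of the spi_nand_flash component.

```c
#include <stdint.h>

/* Hypothetical container for the three geometry fields set in the init code. */
typedef struct {
    uint32_t num_blocks;      /* number of erase blocks */
    uint8_t log2_ppb;         /* log2(pages per block) */
    uint8_t log2_page_size;   /* log2(main-area bytes per page) */
} nand_geometry_t;

/* Main-array capacity in bytes (the spare/OOB area is not counted). */
uint64_t nand_capacity_bytes(const nand_geometry_t *g)
{
    return (uint64_t)g->num_blocks << (g->log2_ppb + g->log2_page_size);
}

/* Same capacity in Mbit, the unit family used in part names (1024 Mbit = 1 Gbit). */
uint64_t nand_capacity_mbit(const nand_geometry_t *g)
{
    return nand_capacity_bytes(g) * 8u / (1024u * 1024u);
}
```

With 1024 blocks, 64 pages per block, and 2048-byte pages, this comes to 1024 Mbit (1 Gbit). A nominal 2 Gbit part with the same page and block layout would need num_blocks = 2048, assuming MICRON_DI_24 identifies the 2 Gbit chip discussed in this thread.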

Screenshot_20240927_211225 Screenshot_20240927_211220

mttcarbone commented 1 month ago

Hi @igrr , I did the tests you mentioned, I used the spi_nand_flash_read_sector function to see if it wrote correctly to the sectors. The thing does not return any errors.

After increasing .allocation_unit_size from 16 * 1024 to 32 * 1024, the file comes back correct when I re-read it. The remaining problem is that at each reboot of the ESP32-S3 the memory is formatted, because mounting the file system fails.

Analyzing SPI, I saw that QSPI WP and HD pins are not working, is this normal? My spi configuration is this:

#define HOST_ID SPI3_HOST
#define PIN_MOSI (16) 
#define PIN_MISO (14) 
#define PIN_CLK (15) 
#define PIN_CS (13) 
#define PIN_WP (40) 
#define PIN_HD (41) 
#define SPI_DMA_CHAN SPI_DMA_CH_AUTO 

I chose SPI3_HOST, because in the future SPI2_HOST will be used for a display

Screenshot_20241001_160842

igrr commented 1 month ago

> I did the tests you mentioned, I used the spi_nand_flash_read_sector function to see if it wrote correctly to the sectors. The thing does not return any errors.

Could you please clarify how you tested this? spi_nand_flash_read_sector doesn't write sectors, you need to call spi_nand_flash_write_sector in order to do that. To check if the data has been written correctly, compare the data you read back to the one you originally wrote.

You can check this code from the test app: https://github.com/espressif/idf-extra-components/blob/0603d10e0ee06cdd63ad78d3960238891c49db70/spi_nand_flash/test_app/main/test_spi_nand_flash.c#L129-L153
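Independently of the linked test code, the write-then-compare idea can be sketched as follows. This is a host-side illustration under stated assumptions: the callback types, verify_sectors, and the RAM-backed stand-in are all hypothetical. On the target, the callbacks would wrap spi_nand_flash_write_sector and spi_nand_flash_read_sector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sector I/O callbacks; both return 0 on success. */
typedef int (*sector_write_fn)(void *ctx, const uint8_t *data, uint32_t sector);
typedef int (*sector_read_fn)(void *ctx, uint8_t *data, uint32_t sector);

/* Pattern depends on the sector number, so data that lands in the wrong
 * sector tends to show up as a mismatch as well. */
static void fill_pattern(uint8_t *buf, size_t len, uint32_t sector)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(sector * 31u + i * 7u + 1u);
    }
}

/* Write `count` sectors from `start`, then read each back and compare.
 * Returns true only if every sector reads back exactly as written. */
bool verify_sectors(void *ctx, sector_write_fn wr, sector_read_fn rd,
                    uint32_t start, uint32_t count, size_t sector_size)
{
    uint8_t *expected = malloc(sector_size);
    uint8_t *actual = malloc(sector_size);
    bool ok = (expected != NULL && actual != NULL);

    for (uint32_t s = start; ok && s < start + count; s++) {
        fill_pattern(expected, sector_size, s);
        ok = (wr(ctx, expected, s) == 0);
    }
    for (uint32_t s = start; ok && s < start + count; s++) {
        fill_pattern(expected, sector_size, s);
        memset(actual, 0, sector_size);
        ok = (rd(ctx, actual, s) == 0) &&
             (memcmp(expected, actual, sector_size) == 0);
    }
    free(expected);
    free(actual);
    return ok;
}

/* RAM-backed stand-in for the flash, used only to exercise the check. */
typedef struct {
    uint8_t *mem;
    size_t sector_size;
    uint32_t num_sectors;
} ram_disk_t;

int ram_write(void *ctx, const uint8_t *data, uint32_t sector)
{
    ram_disk_t *d = ctx;
    if (sector >= d->num_sectors) {
        return -1;
    }
    memcpy(d->mem + (size_t)sector * d->sector_size, data, d->sector_size);
    return 0;
}

int ram_read(void *ctx, uint8_t *data, uint32_t sector)
{
    ram_disk_t *d = ctx;
    if (sector >= d->num_sectors) {
        return -1;
    }
    memcpy(data, d->mem + (size_t)sector * d->sector_size, d->sector_size);
    return 0;
}
```

Writing all sectors first and reading them back afterwards, rather than checking each one immediately, also catches the case where a later write overwrites an earlier sector.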

> Analyzing SPI, I saw that QSPI WP and HD pins are not working, is this normal?

Yes, that's normal, currently the library doesn't make use of DIO or QIO related commands, there is a discussion about that in another issue: https://github.com/espressif/idf-extra-components/issues/375#issuecomment-2379644911.

mttcarbone commented 1 month ago

Hi @igrr , I ran the following code in the main:

    uint32_t sector_num, sector_size;
    spi_nand_flash_device_t *nand_flash_device_handle;
    spi_device_handle_t spi;
    setup_nand_flash(&nand_flash_device_handle, &spi);

    TEST_ESP_OK(spi_nand_flash_get_capacity(nand_flash_device_handle, &sector_num));
    TEST_ESP_OK(spi_nand_flash_get_sector_size(nand_flash_device_handle, &sector_size));
    printf("Number of sectors: %" PRIu32 ", Sector size: %" PRIu32 "\n", sector_num, sector_size);

    do_single_write_test(nand_flash_device_handle, 1, 16);
    do_single_write_test(nand_flash_device_handle, 16, 32);
    do_single_write_test(nand_flash_device_handle, 32, 64);
    do_single_write_test(nand_flash_device_handle, 64, 128);
    do_single_write_test(nand_flash_device_handle, sector_num / 2, 32);
    do_single_write_test(nand_flash_device_handle, sector_num / 2, 256);
    do_single_write_test(nand_flash_device_handle, sector_num - 20, 16);

    deinit_nand_flash(nand_flash_device_handle, spi);

With the following flash configuration:

static esp_err_t spi_nand_micron_init(spi_nand_flash_device_t *dev)
{
    uint8_t device_id;
    spi_nand_transaction_t t = {
        .command = CMD_READ_ID,
        .dummy_bits = 16,
        .miso_len = 1,
        .miso_data = &device_id
    };
    spi_nand_execute_transaction(dev->config.device_handle, &t);
    switch (device_id) {
    case MICRON_DI_34:
        dev->read_page_delay_us = 115;
        dev->erase_block_delay_us = 2000;
        dev->program_page_delay_us = 240;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 12; // 4096 bytes per page
        break;
    case MICRON_DI_24:
        dev->read_page_delay_us = 55;
        dev->erase_block_delay_us = 2000;
        dev->program_page_delay_us = 220;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 11; // 2048 bytes per page
        break;
    default:
        return ESP_ERR_INVALID_RESPONSE;
    }
    return ESP_OK;
}

I get this error back: Screenshot_20241002_175209

Reading the datasheet, I saw this part here: Screenshot_20241002_160314 Screenshot_20241002_161541

As you can see, bit 12 of the address carries the plane select for this flash, unlike the 4Gb model, where it is not needed.
Looking at the library, I can't figure out where to pass the plane bit. Could you point me in the right direction?

igrr commented 1 month ago

I don't think any special handling is needed for the plane index. The driver is sending a 16-bit block address to the flash chip:

The only thing which looks odd is that according to figure 7, the plane index should be the LSB of the address, not MSB. However this only changes the mapping of addresses to the physical blocks in Flash, aside from some performance difference it should still work either way.

The exception you got seems unrelated to this problem, but I can't tell what specifically is wrong since the exception output is cut off in your screenshot. Please check https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/fatal-errors.html for instructions on how to interpret exception output. If you post the logs, please do post them in text format rather than as images. It might also help to decrease the log level from "verbose" to "info" or "debug" since there seems to be a lot of unrelated messages in the log.

mttcarbone commented 1 month ago

Sorry @igrr , I'll fix this right away, I'll send you a file with monitor output, with debug functions turned on. I hope it can give you enough information.

debug_output.txt

igrr commented 1 month ago

Hi @mttcarbone, thank you for the logs you provided. I was able to reproduce the issue with MT29F2G chip.

As you have suspected, the dual-plane memory organization does require additional handling. For the "read cache" and "program load" commands, the plane index must be added as the MSB of the column address. For example, if p is the page number, the column address ca should be modified to ca + (((p / 64) % 2) << 12), where 64 is the number of pages per block, 2 is the number of planes, and 12 is the address bit which sets the plane index.

After modifying dhara_nand_is_bad, dhara_nand_prog, dhara_nand_is_free, and dhara_nand_read this way, the tests are passing.
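The column-address adjustment described above can be sketched as a small helper. This is only an illustration using the constants quoted in the comment (64 pages per block, 2 planes, plane select in bit 12); column_with_plane and the macro names are hypothetical and are not taken from the driver or from the actual patch.

```c
#include <stdint.h>

/* Constants as quoted in the discussion for the dual-plane MT29F2G. */
#define PAGES_PER_BLOCK   64u
#define NUM_PLANES        2u
#define PLANE_SELECT_BIT  12u

/* Hypothetical helper: given a page number and a column within that page,
 * return the column address with the plane index placed in bit 12.
 * The plane is chosen by the parity of the block the page belongs to. */
uint16_t column_with_plane(uint32_t page, uint16_t column)
{
    uint32_t block = page / PAGES_PER_BLOCK;
    uint32_t plane = block % NUM_PLANES;
    return (uint16_t)(column + (plane << PLANE_SELECT_BIT));
}
```

For example, pages 0 to 63 (block 0) leave the column unchanged, while pages 64 to 127 (block 1) get 0x1000 added, so a marker at column 2048 + 2 in page 64 is addressed as 0x1802.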

I will check whether other SPI NAND flash chips handle interleaved addressing in a similar way, and will try to generalize my patch into something that isn't specific to the MT29F2G.

mttcarbone commented 1 month ago

Hi @igrr , Yes, over the last few days I had tried modifying this code in dhara_nand_prog:

ESP_GOTO_ON_ERROR(spi_nand_program_load(dev->config.device_handle, (uint8_t *)&used_marker,
                                        (dev->page_size | ((p/64)%2 << 12) ) + 2, 2),
                                        //dev->page_size + 2, 2),
                  fail, TAG, "");

and also this code in dhara_nand_is_free:

ESP_GOTO_ON_ERROR(spi_nand_read(dev->config.device_handle, (uint8_t *)&used_marker,
                                (dev->page_size | ((p/64)%2 << 12) ) + 2, 2),
                                //dev->page_size + 2, 2),
                  fail, TAG, "");

but it kept giving me an error. I will do some tests with the dhara_nand_is_bad function modified as well.

Would you share with me your tests on the functions in the file dhara_glue.h so I can help you test?

igrr commented 1 month ago

Yeah, you need to modify all 4 functions (also dhara_nand_is_bad and dhara_nand_read). You can check my draft changes over here: https://github.com/espressif/idf-extra-components/pull/397/.

mttcarbone commented 1 month ago

Hi @igrr , I tried your code on my flash, and it works! Great job.

I'll share my test output so you can compare it with yours: debug_output.txt

In my tests, I made a small modification which I will report here:

static esp_err_t spi_nand_micron_init(spi_nand_flash_device_t *dev)
{
    uint8_t device_id;
    spi_nand_transaction_t t = {
        .command = CMD_READ_ID,
        .dummy_bits = 16,
        .miso_len = 1,
        .miso_data = &device_id
    };
    spi_nand_execute_transaction(dev->config.device_handle, &t);
    dev->erase_block_delay_us = 2000;
    switch (device_id) {
    case MICRON_DI_34:
        dev->read_page_delay_us = 115;
        dev->program_page_delay_us = 240;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 12; // 4096 bytes per page
        break;
    case MICRON_DI_24:
        dev->read_page_delay_us = 55;
        dev->program_page_delay_us = 220;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 11; // 2048 bytes per page
        break;
    default:
        return ESP_ERR_INVALID_RESPONSE;
    }
    return ESP_OK;
}

It is related to the dev->read_page_delay_us time: the datasheet of the 2G version gives this specification:

Screenshot_20241008_081727

I ran some tests with both your timing configuration and mine, and the results appear identical. But seeing the good work @UnTraDe did in sticking to the 4G datasheet, I wanted to do the same with the 2G datasheet.

mttcarbone commented 1 month ago

@igrr One more thing: while testing with the code in the spi_nand_flash/examples/nand_flash example, I found that when I configure .format_if_mount_failed = false, it works on the first boot, but from the third or fourth reboot onward it reports this error:

W (4616) vfs_fat_nand: f_mount failed (13)
E (4616) example: Failed to mount filesystem. If you want the flash memory to be formatted, set the CONFIG_EXAMPLE_FORMAT_IF_MOUNT_FAILED menuconfig option.
I (4636) main_task: Returned from app_main()

I made an output file of what the monitor shows me. output_monitor-2.txt

Mind you, this problem occurred after creating a new project without the test functions and the Unity library used for debugging, which is why it seems like a very unusual problem.

igrr commented 4 weeks ago

@mttcarbone After looking at the code again, I have realized there are still at least two issues remaining:

I have pushed the latest version to the same PR. (Totally unverified! I didn't have this flash chip at hand today.)

Seeing that some of MXIC flash chips also use this dual-plane architecture, we will probably have to support this, however I might suggest picking a different flash chip as a possibly simpler solution, if your project allows for this.

mttcarbone commented 4 weeks ago

Hi @igrr , Thank you for your feedback. For the cost and space it offers, the 2G version is very advantageous. If it is okay with you, I will continue development on the 2G. I certainly won't be as quick as you at finding a solution, but I would like to take it forward. Let me know when you get the chip; I hope to arrive at a definitive version of the code as soon as possible.

I will keep you updated!