atmelcorp / atmel-software-package

Atmel Software Package
Other
108 stars 76 forks source link

Limit in 16MiB for QSPI Flash n25q512ax3 #64

Open ramok opened 6 years ago

ramok commented 6 years ago

I trying to flash on RoadRunner with 64MiB n25q512ax3 QSPI Flash. I get strange behavior when address become more then 16MiB. After trying sam-ba with trace level 4, I get such output:

Applet 'QSPI Flash' from softpack 2.13 (v2.13-31-g65146d0e6256). Initializing QSPI0 IOSet3 at 66000000Hz -I- SF: Got Manufacturer and Device ID: 20ba20 -I- SF: WARNING: can't read above 16MiB Found Device n25q512ax3
Size: 67108864 bytes
Page Size: 256 bytes
Buffer Address: 0x00229e60
Buffer Size: 90368 bytes QSPI applet initialized successfully. `` This fix helps me:

--- a/drivers/nvm/spi-nor/spi-nor-ids.c
+++ b/drivers/nvm/spi-nor/spi-nor-ids.c
@@ -68,7 +68,7 @@
        .sector_size = 65536U,                  \
        .n_sectors = (_n_sectors),              \
        .page_size = 256,                       \
-       .flags = SNOR_HAS_FSR | SNOR_SECT_4K | SNOR_NO_4BAIS
+       .flags = SNOR_HAS_FSR | SNOR_SECT_4K

 #define AT25(_name, _jedec_id, _n_sectors)     \
        .name = _name,                          \

But I think there must be proper solution, which do not brake other flash chips.

And I think default trace level must show this warning WARNING: can't read above 16MiB by default.

CyrillePitchen commented 6 years ago

Hi ramok,

if you look at the n25q512 datasheet from Micron website, you notice that the 12h op code is used for the 4-byte page program on some memory parts whereas this very same op code is used for the extended quad input fast program.

4-byte page program is Page Program 1-1-1 with a 4-byte address. In the 4-byte address instruction set (4BAIS), the associated op code is 12h. On the other hand, extended quad input fast program is Page Program x-4-4 with 3-btyte address.

So some memory parts of the n25q512* family actually do support the 4-byte address instruction set while other memory parts don't due to the 12h op code quirk and the missing op code for 4-byte address Page Program 1-1-1.

Both variants of the n25q512* family may share the same JEDEC ID. Actually there are 2 different JEDEC ID for n25q512:

However, the actual use of the 12h op code is not related to the voltage but linked to other criteria. Hence, there is no easy mean to make the difference between n25q512 memory parts supporting the 4-byte address instruction set from those which do not support it.

In order to provide a common support for both flavors, we have decided not to used the 4-byte address instruction set at all on Micron N25Q* memory parts. So we limit read/write/erase operations to the first 128Mib.

Sorry but we can't accept this patch.

Best regards,

Cyrille

ramok commented 6 years ago

Thank you for detailed answer!

So we limit read/write/erase operations to the first 128Mib.

But in linux kernel everything working fine, may be it's good idea to take a look and do the same in applets?

Sorry but we can't accept this patch.

Sure, patch is bad :) I just want to show what helps me solve problem. What about this one?

--- a/drivers/nvm/spi-nor/spi-nor.c
+++ b/drivers/nvm/spi-nor/spi-nor.c
@@ -347,7 +347,7 @@ init_params:
            if (!info || !(info->flags & SNOR_NO_4BAIS))
                spi_nor_set_4bais(flash);
            else
-               trace_info("SF: WARNING: can't read above 16MiB\r\n");
+               trace_warning("SF: WARNING: can't read above 16MiB\r\n");
        }
    }

With this warning showed on default sam-ba trace level, I will not spend whole day debug strange sam-ba flashing behavior.

CyrillePitchen commented 6 years ago

Hi ramok,

In drivers/mtd/spi-nor/spi-nor.c from the Linux source tree, you will find: { "n25q512a", INFO(0x20bb20, 0, 64 1024, 1024, SECT_4K | USE_FSR | SPI_NOR_QUAD_READ) }, { "n25q512ax3", INFO(0x20ba20, 0, 64 1024, 1024, SECT_4K | USE_FSR | SPI_NOR_QUAD_READ) },

So you can notice that the SPI_NOR_4B_OPCODES is not set because some memory parts don't support the whole 4-byte address instruction set as explained earlier (different usages of the 12h op code).

Using the 4-byte address instruction set is stateless but when not available, or more precisely when the SPI_NOR_4B_OPCODES flag is not set, Linux relies on another mechanism: it makes the SPI NOR flash memory enter its 4-byte address mode. Instead of using dedicated op codes for 4-byte address operations, enabling the 4-byte address mode changes the internal state of the SPI NOR flash memory so the memory now expects the very same op codes previously used for 3-byte address operations but now followed by a 4-byte address.

About your patch proposal, I agree with you about replacing trace_info() by trace_warning().

Best regards,

Cyrille This means that as opposed to using the 4-byte address instruction set, which is stateless, enabling the 4-byte address mode is stateful. This comes with a huge drawback as most bootloaders, such as u-boot, don't know how to deal with this mode as they expect the SPI NOR flash memory to be in its power on reset state, hence in 3-byte address mode. So if enabled at boot time, the 4-byte address mode prevents from booting on many many board designs.

Actually, most board designs fall into the trap of not resetting properly the SPI NOR flash memory when the CPU resets. This is the reason why the SPI_NOR_4B_OPCODES flag was introduced in Linux: http://lists.infradead.org/pipermail/linux-mtd/2016-November/070417.html

Entering the 4-byte address mode then performing the operation and finally exiting the 4-byte address mode could be a work-around but IMHO, for now its now the priority for SAM-BA 'qspiflash' applet since it would introduce a real mess in the source code only to handle one memory manufacturer quirk.

About you patch proposal, I agree with you, it would be better to replace trace_info() by trace_warning().

Best regards,

Cyrille

rphedman commented 6 years ago

I recently ran into this too! It was a good thing I had tracing turned on by default otherwise I would never have noticed and likely wasted days.

As long as access past 16MiB is not supported by the norflash applet I strongly suggest that the applet raises an error when asked to write or read above 16MiB. Currently it appears to succeed but in fact the operation occurs at an address modulo 16MiB.

For example if you try writing to location 0x1800000 it silently writes to 0x800000 instead. Very nasty!

CyrillePitchen commented 6 years ago

We could create a new flag, mainly used inside the N25Q() macro, so we force the use of Page Program 1-1-4 instead of Page Program 1-1-1, when the SPI controller supports the SPI 1-1-4 protocol, like the QSPI controllers do.

Indeed, both 3-byte address Page Program 1-1-4 (32h) and 4-byte address Page Program 1-1-4 (34h) op codes are already supported by spi_nor_convert_3to4_erase().

Hence when the SPI controller supports the SPI 1-1-4 protocol and this new flag is set, we just have to overwrite flash->write_proto to SFLASH_PROTO_1_1_4 and flash->write_inst to SFLASH_INST_PAGE_PROGRAM_1_1_4.

This should fix the issue with the QSPI controller and is less intrusive than entering the 4-byte address mode, performing the operation then exiting the 4-byte address mode for each single operation.

seqlabs commented 4 years ago

This is nonsense.

Do not confuse 4-byte address with 4-bit (quad mode).

Uboot, spi driver, which I believe ATMEL/MCHP also maintains, works properly when the definition for the larger Micron devices is corrected. The uboot driver works correctly in both quad and dumb-spi (1-wire IO) mode for devices > 16MB.

It is not enough to rely on datasheets because the quality of those is generally terrible and they are often incorrect. Try testing with real hardware. Only then does one experience the full bustedness of the software and discover that the datasheets are vague or incorrect.

Further, the sam-ba applet neglects lock control for Micron. This too is in uboot and appears to work correctly.

====

We are going to raise with JC+PV. It is crazy that this bug is unresolved after almost 18 months.

CyrillePitchen commented 4 years ago

I've just had a look at the fix in the tag v2020.04-rc2 of u-boot: 8651593a8ce0 ("spi-nor: spi-nor-ids: Add entries for mt25q variants") from Vignesh in October 2019.

I also read datasheet for MT25QL512A for instance. It shows that the latest memory parts from Micron now use an extended JEDEC ID that encodes device generations. Especially, bit 6 in the 1-byte Extended divice ID ("5th byte of the READ ID response") is set to 1 for the 2nd device generation.

Therefore:

1 - On the 1st generation, there are 2 versions: 1a: the 0x12 op code is used for 4-byte address Page Program 1-1-1 (single I/O) 1b: the 0x12 op code is used for 3-byte address Page Program 1-1-4 (Quad I/O)

The patch I did for Linux, later adapted in u-boot, uses the 4-byte address instruction set, hence the 0x12 op code for Page Program operations. Setting the SPI_NOR_4B_OPCODES for a given memory part >= 128Mbit (16Mbyte) tells the spi-nor sub-system that the 4-byte address instruction set is supported by the memory part and should be used instead of the classic 3-byte address instruction set. This works just fine with version 1a but would break Page Program operations, even for the memory range [0 - 16MByte[, with version 1b therefore creating a regression for those memory parts. This is not theoretical, this has been tested and verified with some Micron memory parts.

Since both 1a and 1b use the very same JEDEC ID, AFAIK there is no mean for the software to make the difference between the 2 versions only by probing the memory to dynamically guess its exact model. So, fixing the issue for 1a as suggested here would break the support of 1b.

2 - On the 2nd generation, according to the Micron datasheet, it looks like the 0x12 op code is now always used for 4-byte address Page Program 1-1-1 (Extended SPI), as expected, so the SPI_NOR_4B_OPCODES flag be set safely to allow access above 128Mbit. This is the assumption made by Vignesh in his patch for u-boot.

This patch seems suitable but it fixes only the 2nd generation, not the 1st one: the 1st generation is left unchanged. On the other hand, the patch proposed here tried to fix the 1st generation but it would have introduced a regression with 1b.