Separate flash write API into raw and quirk-handling versions

devyte commented 3 years ago

Basic Infos

[X] This issue complies with the issue POLICY doc.
[X] I have read the documentation at readthedocs and the issue is not addressed there.
[X] I have tested that the issue is present in current master branch (aka latest git).
[X] I have searched the issue tracker for a similar issue.
[X] If there is a stack dump, I have decoded it.
[X] I have filled out all fields below.

Platform

Hardware: [ESP-12|ESP-01|ESP-07|ESP8285 device|other]
Core Version: [latest git hash or date]
Development Env: [Arduino IDE|Platformio|Make|other]
Operating System: [Windows|Ubuntu|MacOS]

Settings in IDE

Module: [Generic ESP8266 Module|Wemos D1 mini r2|Nodemcu|other]
Flash Mode: [qio|dio|other]
Flash Size: [4MB/1MB]
lwip Variant: [v1.4|v2 Lower Memory|Higher Bandwidth]
Reset Method: [ck|nodemcu]
Flash Frequency: [40Mhz]
CPU Frequency: [80Mhz|160MHz]
Upload Using: [OTA|SERIAL]
Upload Speed: [115200|other] (serial upload only)

Problem Description

CC @drzony @earlephilhower @TD-er

To better understand https://github.com/esp8266/Arduino/pull/7514, I had some (lengthy) discussions with @drzony. Per his explanation (please bear with me here, I'm not an expert here and my understanding may be flawed):

99% of flash brands allow writing multiple times to the same address, as long as bits always flip 1->0 and no bits flip 0->1
This is an unsupported feature exploit. Datasheets explicitly state that writing must be done only to an area that has been previously erased.
Erasing is done on an entire block (4KB in our case). Writes can be done in bulks of 1 to pageSize bytes (256 bytes in our case). Multiple writes can be done within a previously erase block, as long as subsequent writes are always done in offsets that haven't been previously written to.
SPIFFS seems to exploit point 1 above as a design choice. We extended this exploit to our core.
In the case of PUYA chips the exploit doesn't work, so we had to implement the infamous PUYA patch to allow SPIFFS to work.
For normal flash operations within our core, i.e. EEPROM, Updater, eboot, etc, the PUYA patch isn't necessary. It's only SPIFFS that needs it.
For flash operations done elsewhere, i.e. 3rd party libs, the PUYA patch may be needed.

It seems that SPIFFS assumes NOR type flash, while other code like LittleFS assumes NAND type (@earlephilhower), and the PUYA chips don't like the SPIFFS assumption.

The proposal here is:

implement a formal raw flash read/write api. Our core would use this api for all of its internals, except SPIFFS. Behavior could be brand-specific, i.e.: quirks wouldn't be handled, whether in-spec (XMC) or exploit (99% of chip brands).
implement a formal quirk-handling flash read/write api on top of the previous. Special cases such as XMC and PUYA would be handled here. From the user's pov, all flash brands would behave the same, although there could be some cases with performance impacts (PUYA as it's being used now)
Up for discussion: SPIFFS apparently has its own handling for the PUYA case. It could be enabled and its behavior become isolated, or the underlying flash hal could be replaced with the quirk-handling api above for a generalized solution.

The above would mean that the current performance penalty for the PUYA case could be reduced or even eliminated for the non-SPIFFS operations.

TD-er commented 3 years ago

Isn't it simply a matter of reserving 1 byte, initialize it with "0x55" (or some other pattern) and try to set one of the 0's to 1. If the byte you read is anything other than the initial pattern, you know you're dealing with "Puya-like" chips. If you still see the initial pattern after 1 write attempt, then it is "normal" flash and just set one of the 1's to 0 to mark you completed the test.

This way you know what type of flash it is every time at boot without the need to check it using a write. I think it may be a bit tricky to assume some expected behavior from a library, so use the existing patch strategy by default, unless you know for sure the library can work with "Puya" like flash and then explicitly mark it so.

devyte commented 3 years ago

That's an auto detection approach, which I believe has been discussed elsewhere. It may or may not be viable, similar to the flash size detection approach mentioned somewhere, but the goal here is a different one. Let's not hijack the thread.

drzony commented 3 years ago

@devyte I don't think that SPIFFS has a workaround for PUYA, according to https://github.com/pellepl/spiffs/issues/172 it only fixed relying on 0->1 changes to be ignored by flash, but it still does 1->0 changes on the same offset.

drzony commented 3 years ago

As for the change in API, I don't think we need "no-quirk" API. It would mean that everyone who wants to use it would need to copy-paste the code of "with-quirks" API to his own code.

My proposal with regards to PUYA/non-PUYA mess is to create Esp::flashWriteNoErase function that can be used by people that want to use the exploit mentioned in point 1. We could then remove PUYA workaround from normal flashWrite. We would then have always-works-datasheet-correct flashWrite and maybe-works-non-datasheet-corect flashWriteNoErase.

drzony commented 3 years ago

@TD-er I think that the first line of your comment is incorrect I think you are confusing https://github.com/pellepl/spiffs/issues/172 with PUYA. Write without erase cases:

Changing 0->1 and/or 1->0 on PUYA with the same offset, will result in garbage
Changing 0->1 on some TI flashes will cause an error (they have an interrupt and a special register for that INVDRIS - Invalid Data Raw Interrupt Status), on other flashes it's ignored (write succeeds, but 0 remains 0 on flash)
Changing 1->0 on most of the flashes will result in proper write (including TI flashes)

@devyte Currently there is no workaround for 2 in any of ESP core code, since nobody encountered such a flash on ESP

TD-er commented 3 years ago

@drzony Your second point about checking a special register sounds interesting. Could that be used to 'test' if other libraries rely on specific flash behavior? That would probably simplify testing libraries to see if they would cause issues if the flash write behavior would be changed. The 'normal' way for testing would be to include such testing in some debug mode of the write call, similar to what is done now for the Puya patch.

@devyte How do you imagine to switch between flash write behavior per library? For example add a define in the .cpp of a library to force using a different flash write strategy? (that's what I was thinking of when I mentioned to only allow libraries to use the "optimal" write strategy for libraries known to work well with it)

drzony commented 3 years ago

@TD-er It cannot be used that way, since it only prevents 0->1 changes, which is ignored on other flashes (0 stays 0, but write succeeds)

TD-er commented 3 years ago

I meant if it marks such an operation as error in a special register, then such a chip could be used to help testing existing libraries without the penalty of needing extra reads.

drzony commented 3 years ago

Not really, you need to know beforehand that such chip has this register, otherwise you can't tell whether this is exactly this error (and if you know the chip, then you're done with detection). Anyway, let's not hijack API change thread with autodetection which is a whole large topic on it's own.

TD-er commented 3 years ago

It was by no means intended to switch back to "autodetect" but more to address my worries on decisions to make what a "default" flash write strategy should be. Either way it is about knowingly render existing libraries incompatible (but not knowing which are incompatible as it is impossible to test it all) or keep the existing strategy as default, and only allow libraries known to work with the new strategy to use the new calls.

Or am I missing the point here?

drzony commented 3 years ago

Since we are allowing API breaking changes in v3.0 we can rename flashWrite to flashWriteNormal or something like that and add flashWriteNoErase, then the libraries that use flashWrite would need to make counscious decision about what strategy to use. (As an alternative we could add strategy parameter to flashWrite)

earlephilhower commented 3 years ago

My $0.02, for what it's worth: Flash != EEPROM, and I'm not really sure it's worth the effort in the end.

For EEPROMs there is expectation of byte-level (and even bit-level) update and everything should "just work." The current EEPROM.cpp library does this pretty well for apps, as far as I can see. It emulates using plain RAM and then flushes on final requests.

NAND Flash, however, has significant limitations in all incarnations I've run into. You have things like erase blocks, partial page programming, and a host of other ugliness. So, apps writing to flash need to take heed of these. Most basic would be flash word alignment and sizing. Current code requires 32b aligned, 32b sizes writes (due to a combination of the HW engine that does the writes and the blob/ROM which have minimal protections). That seems like a fine requirement because the core itself only writes using the blob (already 4K aligned, 4K sized AFAIK), the EEPROM (aligned properly AFAIK), LittleFS (aligned properly AFAIK), and SPIFFS (AFAIK may be aligned properly but (ab)uses some mode which is fine for NOR flash but isn't a property of the NAND flash people are shipping on the 8266 boards today).

SPIFFS is deprecated and not supported upstream, so I really would not sweat that too much.

If a user decides to use the raw flash outside of the API the Arduino core provides, they're welcome to but need to follow the restrictions. It might be worthwhile to throw in some assert()s on the flash write params so users would get warning they were doing something bad at the exact line they do it, as opposed to later on when some bit of flash has been changed in an unpredictable way.

esp8266 / Arduino