MTD is confusing - Githubissues

chrysn commented 2 years ago

Walking between MTD and flashpage, MTD is, for lack of a clearer word, confusing (at least to me); I think that's mostly due to terminology not sufficiently explained, motivated or common.

For comparison, flash is pretty clear in that a page is something that can be erased, and writes happen in alignments and sizes of FLASHPAGE_WRITE_BLOCK_ALIGNMENT / FLASHPAGE_WRITE_BLOCK_SIZE (which I tend to mentally simplify to the maximum thereof -- that is safe, and while they do diverge on some architectures, I don't quite see how I'd use a 2-byte write in a 4-byte alignment in practice, like STM32 seems to allow for some chips).

But MTD introduces sectors and pages, and while sectors are implicitly the erasure units, pages seem to have no inherent semantics at all other than allowing addressing based on them in what vaguely reminds me of the CHS convention of the early ages. This is exacerbated by disconnect from MTD terminology as used on Linux (where I'd assume the term is borrowed from), which deliberately does not talk of pages and sectors but of eraseblocks.

Also odd (I'm leaning toward "missing feature" but with the above I'm not sure) is that while mtd_write says that some devices might enforce alignments, I don't see how the application can get that information. (It's tempting to make a conservative estimate of 8, but with LPC23xx having a FLASHPAGE_WRITE_BLOCK_SIZE of 256, conservative may need very conservative).

I'd like to make a doc PR to enhance things, but right now I don't have a sufficiently consistent mental image to do that. Could you help clarify this?

Pinging @vincent-d and and @AurelienGONCE as the original authors, @jnohlgard as reviewer and @benpicco who added the page addressed operations.

benpicco commented 2 years ago

page_size is the largest block that can be written by one transfer, writing across that boundary causes a wrap-around. sector_size is the erase block size.

The inconsistencies in mtd_write() are mostly historic as before the pagewise functions the user had to take care of splitting the writes if necessary. #15380 should do away with the old functions, but there is still mtd_flashpage which is not so easily converted into uniform pages.

chrysn commented 2 years ago

So the concept of "in one transfer" for the page is purely one of implementation (splitting up write operations), and of performance in some backends, but not of atomicity. The performance can be relevant to implementations, but is it actionable? (I'd assume that if an application needs some bytes written, it will call the MTD layer, but the application won't gain anything from splitting the write across two writes in a run-time decision, especially because RIOT is multithreaded, and a good MTD backend will put the thread to sleep during a write).

benpicco commented 2 years ago

Ideally the application should not have to care about MTD internals to begin with. It's true that page_size only makes sense for some MTD backends and might as well be only present in the implementation specific MTD extension (e.g. mtd_spi_nor_t) , from an API perspective the sector <-> page distinction is rather arbitrary.

chrysn commented 2 years ago

The application will need to care about the erase size to do proper journaling or atomic updates. (That is, unless it is happy with read-erase-write operations -- I personally think that "Do not shut off the device while saving is in progress" is something that died well after the Gamecube generation of electronics). After all, that is what sets it apart from a block device.

Working on some initial text based on the above to improve the docs...

chrysn commented 2 years ago

https://github.com/RIOT-OS/RIOT/pull/17666 now summarizes how I understand things to be.

I think the MTD layer simply lacks some concepts:

A size indicating size-and-alignment of writes. (The driver can emulate unaligned writes, but only if the flash memory does support word rewrites, which not all backends, esp. mtd_flashpage, do.)
A flag indicating whether rewrites are supported.

I'd like to add these in later, then storage like the one required for OSCORE can be built on MTD rather than on flash (where at least information like FLASHPAGE_WRITE_BLOCK_SIZE is available).

There are some more details discussed in the embedde-storage crate, or even present in the flashpage driver, but I think they can be simplified:

Alignment and length of writes may be different (STM has a few combinations) -- but taking the larger of these as write size is not wrong (and I can't imagine it harming any practical application).
Allowing overwrites is not binary but actually a count over some region that may actually be the current page size (but may also be a different concept): "overwrite at will" and "overwrite never" seem to be useful extremes to cover in a flag. (Even "overwrite at will", in practice, means only up to 8 * minimal-write-size times, for then all bits are clear anyway)
Read alignment can be handled by the driver.
IIRC there are some distinctions between flashes that, on overwrite, prefer previously cleared bits to be 0, or prefer previously cleared bits to be 1. If that distinction matters to the backend, the driver can take care of it.

chrysn commented 2 years ago

Looking through some data sheets I found that at least for flash memories it's rather atypical to support arbitrary many rewrites; for example, EFM32GG (of which I thought I remembered they'd support that -- and have used that in code (AFAIK nothing blew yet...). But actually the data sheet says "write up to 2 times".

The embedded-storage classification is way more detailed than I think makes sense to store at run time. (It makes sense there as the user can adjust its data structure at compile time to work with the limitations). But what is an abstraction level that works for us?

Suggestion:

Introduce write granularity.
Writes are only allowed once after erases, and only in the granularity indicated by the former parameter.
Devices with DIRECT_WRITE flag ignore the "only once" limitation. (They could also ignore the granularity, but then there'd be implicit read-modify-write cycles, and the application can't build journals from that).
Some flash memories' "write N times" property is not exposed to the application, they'll have to make do with the pessimistic simplification.
(Note that all flash memories could also be expressed as DIRECT_WRITE by upping their write granularity to full erase units, and that memories that allow writing a cell twice can emulate a device with better granularity that does not support overwrites at all).

Laczen commented 2 years ago

@chrysn, also take a look at some datasheets for nand flashes as these are also mtd. The terminology might not be so strange. As mtd devices should also support nand it does not seem like a good idea to add some nor-flash specifics (rewritability) to the mtd api.

chrysn commented 2 years ago

Can you point me to a good example? I didn't find any in the supported drivers list. (I've skimmed the micron MT29F2G08AABWP data sheet but that was just the first hit on duckduckgo and may not be representative).

Generally I don't think this is becoming NOR centric, it just allows capturing the different characteristics. How would you rather have MTDs characterized?

Sure the MTD API would be more consistent if we'd allow arbitrary overwrites (as appears to be common on NAND memories if I've picked a good example and understood it right) or disallow any overwrites, but that way we'd be either kicking out devices (that don't allow overwrites) or applications (that need overwrites), and by declaring them in flags we can have a single API that works where it is technically practical, without ruling out "underusing" the device (eg. when an application that always erases and writes page-wise is given an SD card as a backend).

As for confusing terminology, it may not be confusing coming from a NAND datasheet, but it should (and I by now think is) be explained in explicit terms that apply independently of the technology.

-- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

RIOT-OS / RIOT

MTD is confusing #17663