FlashStorage & EEPROM libraries

algernon commented 2 years ago

This is a work-in-progress version of my attempt at a pair of FlashStorage & EEPROM libraries.

FlashStorage is a lower-level library that abstracts away (most of) the burden of storing and retrieving user data from flash. It simply provides a read() and a write() method, that handle all the pageing, erasing and updating and whatnot behind the scenes, so the end user doesn't have to care about the low level details.

EEPROM builds on top of that, and provides an Arduino-esque EEPROMClass, with an API similar to the AVR EEPROM. It's more limited than that, because only the basic functionality required for Kaleidoscope is implemented at this point (no C+11 iterators or array access at this point).

There are a few things I need to work out still, namely:

[x] Sane defaults. At the moment, I copied some defaults from 05_USBD_MSC_internal_flash. Ideally, we'd have those defaults come from boards.txt. The major things we need is the size of the storage, where it starts (or where it ends, which is more practical to store in boards.txt in our case), and the page size.
[x] Make small operations more efficient. At the moment, all reads and writes require a full page read/write, even if we could do with less. In case of Kaleidoscope, we quite often do much smaller reads and writes.

For reads, we can be much more efficient, because we don't need any paging there. We just read N byes of data from an address, and done.

For writes, the most efficient way would be to copy the whole storage to RAM, and only write back on commit(). That'd make writes a lot more efficient, at the cost of ram. Another option would be to have a prepareWrite() function, that tells the library how much data we'll write, and where, so it'll prep that area in RAM, and write it all back in commit(). This, however, would require dynamic memory allocation. There's fmc_word_reprogram(), which should - judging by its comments - reprogram a word at an address without erase. We could use that for more efficient small writes, as we won't have to do full page erase + writes. However, any of these require more thought, and I'll postpone efficient writes 'till a follow-up PR instead. Lets get the very basic functionality working in this one first.

Meanwhile, I'm submitting this as a draft PR for comments, and perhaps ideas. The code has not been tested on hardware yet, I've only done compile tests with this iteration. Once I got to the "test on hardware" stage, I'll report back too.

algernon commented 2 years ago

According to the GD32F30x user manual: "The flash page size is 2KB for bank0, 4KB for bank1", with bank0 being the first 512kb. If we use both, we'd need to keep track of two page sizes, an additional complexity. For this reason, I'd restrict the libraries for bank0 only, and hard-code 2k as the page size. If and when we need more than 512kb for this kind of storage, we'll figure something out then.

algernon commented 2 years ago

Okay... maybe 2k is not the right choice, and it does need to be set in boards.txt. If we use a chip that supports more than 512kb of flash, we ideally want this storage at the end, meaning we'll be in the 4kb-page bank. If we use a chip with <= 512kb, we'll be in bank0 with 2kb page sizes.

maxgerhardt commented 2 years ago

Just some general thoughts on this:

From the GD32F303xx datasheet:

grafik

So for the GD32F303xx series, the higher the total flash amount, the usable size for code stays the same, just additional space for data storage is added. If the library could only handle Bank0 (up to 512kB total flash), everything starting at CG couldn't be taken proper advantage of. The library should be able to quickly discern whether an address is in a 2kB or 4Kb page because it can subtract the address from the 0x800000 start (to get the offset from flash start), if the offset is less or equal to 512*1024 it's in the Bank0 region and thus 2Kb, else 4Kb.

We must also note that the linker scripts need to be adapted for this, so for the e.g. CC device we can reserve some (user-selectable at best) last sectors for data flash, that must then be subtracted from the available flash for code in the linker script. If a dynamic user selection happens via menus, that must also be mirrored to the PlatformIO builder scripts.

algernon commented 2 years ago

For a brief moment, I pondered about not caring about pages, and just using fmc_word_reprogram(), 'cos we don't need to erase pages then. But we'd still need to page-align writes, so that's out.

So for the GD32F303xx series, the higher the total flash amount, the usable size for code stays the same, just additional space for data storage is added.

AHA! Thanks, this makes a few things easier.

The library should be able to quickly discern whether an address is in a 2kB or 4Kb page because it can subtract the address from the 0x800000 start (to get the offset from flash start), if the offset is less or equal to 512*1024 it's in the Bank0 region and thus 2Kb, else 4Kb.

Mhm. I think I found a reasonably nice way to do just that. Thanks!

We must also note that the linker scripts need to be adapted for this, so for the e.g. CC device we can reserve some (user-selectable at best) last sectors for data flash, that must then be subtracted from the available flash for code in the linker script.

For now, I don't think we need dynamic user-selection of storage size, just hard-coding a value for each device in boards.txt should suffice, and we push that down to the linker. Makes things simpler in the beginning, and we can always iterate from that at a future date.

algernon commented 2 years ago

Pushed another variant. This assumes that all data storage will be used for FlashStorage (and consequently EEPROM), and relies on the system parts to define FMC_BANK0_SIZE to figure out where the bank0/bank1 split is. It doesn't do any sanity checking whether there's even a bank1, or if we're out of bounds (either in bank0 or bank1).

This is probably wrong, I'll be doing some more reading & refactoring.

obra commented 2 years ago

As a note, @howitzer74 confirmed that the datasheet is misleading about 'program' and 'data' storage. The extended storage area is a hair slower than the main storage area, but there is no actual restriction on what can go where. You can fill flash with a program.

algernon commented 2 years ago

A'ight. I'll take a different approach then: the storage area will end at flash end, and we'll use a custom size for it (via a template argument). Will still support both banks, but the library will prefer bank1, 'cos it's nearer the end.

There will be no default, global EEPROM object instantiated, users of the library will have to set up the size, and adjust their build scripts and whatnot appropriately, if needed. This allows the greatest flexibility for the library.

obra commented 2 years ago

On Wed, Aug 25, 2021, at 12:00 PM, Gergely Nagy wrote:

A'ight. I'll take a different approach then: the storage area will end at flash end, and we'll use a custom size for it (via a template argument). Will still support both banks, but the library will prefer bank1, 'cos it's nearer the end.

I suspect we should default to a fixed size that's comparable to other arduino implementations, rather than sized to the entire bank....what do other implementations do?

There will be no default, global EEPROM object instantiated, users of the library will have to set up the size, and adjust their build scripts and whatnot appropriately, if needed. This allows the greatest flexibility for the library.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CommunityGD32Cores/ArduinoCore-GD32/pull/35#issuecomment-905794000, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAALC2FWJORWZVT3IGKEKZDT6U4UNANCNFSM5CZHRFKQ.

algernon commented 2 years ago

I suspect we should default to a fixed size that's comparable to other arduino implementations, rather than sized to the entire bank....

Oh, I didn't mean to use the entire bank going forward. I only used the entire data area because the docs suggested it can't be used for program code, so made sense to use it all for storage. (At least as a first iteration)

Now that I know program code can go there too, not going to use an entire bank, no.

what do other implementations do?

FlashStorage uses an 1k default for the EEPROM object, and requires the library user to declare the flash area when using the library without the EEPROM emulation. To override that 1k, one needs to either modify the source, or compile with -DEEPROM_EMULATION_SIZE=x. I'd rather have the size modifiable without having to pass compiler flags.

To do that, I see two approaches: one is to not have an EEPROM object pre-defined in the library, but require library users to initialize one, with the desired size set at compile time via template arguments. In this case, there simply is no default. The other approach is to have a default (I'd suppose something bigger than 1k, though, perhaps 10k?), overrideable with -DEEPROM_EMULATION_SIZE=x, but in addition to that, the EEPROM object does not compile if -DWITHOUT_DEFAULT_EEPROM is set, so devices that wish to allow setting the size in other ways (but still at compile time), can do so. This'd make it easier to use Kaleidoscope's DeviceProps to set the size, rather than a less discoverable boards.txt snippet.

algernon commented 2 years ago

On second thought: I can just go with EEPROM being always present, and defaulting to 1k/10k/whatever, and I can build Kaleidoscope's driver around FlashStorage, rather than EEPROM (the latter being a very thin wrapper anyway).

algernon commented 2 years ago

Another update, this is much closer to what I think I'd be happy with, and is considerably simpler and more concise than the previous versions.

It positions the data at the end of flash, growing back from there, to a custom size. The FlashStorage class is template-able, and requires the custom size as template argument. EEPROM is not templated, and used EEPROM_EMULATION_SIZE for the storage size, which defaults to 10k. The storage space may cross the bank0/bank1 boundary, the library doesn't care, will figure out the correct page sizes appropriately.

The size of flash is determined by the upload.maximum_size parameter in boards.txt, it is passed to the build process via platform.txt's build.info.flags. I'm not familiar with platformio, so help in getting the platformio build script updated would be most appreciated.

This way, we have sensible - but overrideable - defaults for EEPROM, and it works in an Arduino-esque way. We also have FlashStorage for cases where EEPROM is not the most practical way to implement the storage.

There are still no sanity checking involved, the library will happily try to read and write from strange places, it is up to the library user to make sure addresses are valid.

obra commented 2 years ago

There are still no sanity checking involved, the library will happily try to read and write from strange places, it is up to the library user to make sure addresses are valid.

I'm sure you intend to address this before we merge, but especially given that this could mean the bootloader could get clobbered, we should probably make sure the EEPROM library has good protections on writes.

algernon commented 2 years ago

I can - and intend to, indeed - make it handle out of bounds operations, both read and write. It has all the data it needs to do that, and is easy to check. Question is: what shall we do in out of bounds cases? Should read and write return an error value? Should they just turn no-op?

For EEPROM, we can't always return an error value, not with EEPROM.get() or EEPROM.put(), which return templated stuff. In the EEPROM case, I think the best we can do is no-op on out-of-bounds operations.

obra commented 2 years ago

On Wed, Aug 25, 2021, at 3:28 PM, Gergely Nagy wrote:

I can - and intend to, indeed - make it handle out of bounds operations, both read and write. It has all the data it needs to do that, and is easy to check. Question is: what shall we do in out of bounds cases? Should read and write return an error value? Should they just turn no-op?

What do other cores' EEPROM implementations do? For EEPROM, we can't always return an error value, not with EEPROM.get() or EEPROM.put(), which return templated stuff. In the EEPROM case, I think the best we can do is no-op on out-of-bounds operations.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CommunityGD32Cores/ArduinoCore-GD32/pull/35#issuecomment-905918656, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAALC2BIGEP747XVFAUPGZDT6VVBXANCNFSM5CZHRFKQ.

algernon commented 2 years ago

What do other cores' EEPROM implementations do?

SAMD51's FlashStorage doesn't do bounds checking. It uses a RAM copy of the EEPROM, and will happily read and write out of bounds. However, that only corrupts data in memory, and will not overwrite anything in flash, because it only writes there on commit(), and that's always within bounds.

For AVR... the library itself does not appear to do any bounds checking, and relies on avr-libc. I've looked at avr-libc, and... these particular functions, eeprom_read_byte() and eeprom_write_byte() are in assembly, and I'm not familiar with that. But at a first glance, it doesn't seem to do any bounds checking, either.

In any case, if we follow the existing EEPROM API from Arduino, we can't return errors. We can only no-op, which is what my latest push did.

algernon commented 2 years ago

There's one major issue I want to address: write speed of small data. Right now, we write full pages only, even if we change only a couple of bytes. This isn't a big deal if we do this rarely. However, in Kaleidoscope, most of our writes are between 1-4 bytes a time. Even for larger amounts of data, we write 1-4 bytes at a time in a tight loop.

For example, for a 3-layer keymap, we write 3*64*2=384 bytes, continuously, one byte at a time. With the current FlashStorage implementation, that's likely going to result in reading a 4kb page 384 times, changing a single byte, and writing the full 4kb page back. Even on a fast chip, that's going to be very inefficient.

I can imagine a couple of ways to improve it, all with their own drawbacks:

We could augment write() to use fmc_word_reprogram() if the data we write is small (say, 4 bytes or less). This has the downside of not really helping for cases where the data is between 5 bytes and 4k. We can further tune it to use reprogram for, say, 256 bytes or less, and full page for anything bigger. This adds a bit of complexity to the code, but is reasonably straightforward.

Another option is to use an in-ram cache, say, 2kb or so. If a subsequent write is within those 2kb, we don't write to flash, but to memory only. If it falls outside of that chunk, we commit it to flash, and then start afresh with a moved cache. An explicit commit() will, of course, write it all out. This has the advantage of not needing much tuning at all, being faster than the previous method, but it uses 2kb extra ram, and feels a little more complex. To reduce complexity, we can cache a page, and auto-commit whenever a subsequent write would require another page, this makes bookeeping simpler.

A third option is to defer the decision to the library user: offer a 4-byte write, and a full-page write as bulkWrite(). This keeps the code super simple, but doesn't help EEPROM, which will always use bulkWrite for put() due to having to adhere to a particular API. Mind you, for the Kaleidoscope use case, this is perfectly fine, as we use .update() in parts where the write speed problems would manifest. There are still a few cases where we use put() for tiny (~4-16 bytes) of data, but we can change those to use .update() instead.

I'm leaning towards the third option, because that's the simplest. We still have a few corner cases, because we have to write 4-bytes, even if our data is smaller. As such, we need to handle the case where we'd span bank boundaries, and split the write in that case. That's still less complexity, and certainly less bookkeeping than the other options.

algernon commented 2 years ago

Pushed another update, it adds FlashStorage.writeSmall(), to write at most 4 bytes. When crossing bank boundaries, it falls back to writeBulk for simplicity, as it should be a very rare corner case not worth the extra logic.

EEPROM has been updated to use .writeSmall() for its write() function, and variants for 16 and 32 bit uints have been added too, to go along with the existing uint8_t variant. Similarly, EEPROM.put() has 3 new specializations, for 8/16/32-bit uints that use writeSmall instead of writeBulk.

This should result in much better speed in most cases Kaleidoscope cares about. Still haven't tested on hardware yet, mind you.

algernon commented 2 years ago

Having looked at how SAMD51's FlashStorage does things, there's another approach: we can just use a variable to reserve space.

Doing so, has a couple of advantages: we don't need to specially augment the linker in any way to make sure there's enough free space. It also allows us to have multiple storage objects at the same time, in different places, with different characteristics and whatnot.

The downside, however, is that it will be overwritten when flashing, so any flashing process must also include dumping all storages and writing them back.

Our current implementation allows one storage only, at the end of the flash area, and requires compiler augmenting to make sure there's space. But it does not get overwritten during flash, not unless our new program reaches into the old storage space, something which should happen under common circumstances.

The AVR EEPROM library allows one storage only, and so does the SAMD FlashStorage library's EEPROM emulation (but it's FlashStorage is more flexible). So far, my implementation of FlashStorage was made for the EEPROM (or EEPROM-like) use-case, where a single storage is good enough. I'd stick with that. But I thought it worthwhile to point out that there's a different way to implement it, with different trade-offs.

algernon commented 2 years ago

Another small update: I moved EEPROMClass (the EEPROM emulation layer) from the EEPROM library to FlashStorage, as FlashAsEEPROM.h. This way the EEPROM library is a few-line header that simply instantiates EEPROMClass to the default size.

This allows us to reuse EEPROMClass in other libraries, without having an instantiated EEPROM object around, and still have everything function in an Arduino-esque way when including EEPROM.h.

algernon commented 2 years ago

Collapsed into a single commit, and added a short README. Leaving it as draft still, 'cos I've yet to test this on hardware still.

maxgerhardt commented 2 years ago

I see you've added -DARDUINO_UPLOAD_MAXIMUM_SIZE={upload.maximum_size} to the Arduino IDE build flags, you can mirror that in the tools/platformio/platformio-build.py by expanding the CPPDEFINES array (line 214) with the same element:

    CPPDEFINES=[
        series,
        ("ARDUINO", 10808),
        ("F_CPU", "$BOARD_F_CPU"), # for compatiblity
        "ARDUINO_ARCH_GD32",
        "ARDUINO_%s" % board_id,
        ("BOARD_NAME", '\\"%s\\"' % board_id),
        ("ARDUINO_UPLOAD_MAXIMUM_SIZE", board_config.get("upload.maximum_size"))
    ],

algernon commented 2 years ago

Had a good night's sleep, and it dawned on me that we do not need to always position the storage at the end of flash, restricting us to one instance of the class. We can default to that, but still allow setting a position, with a template argument. We can have multiple storages this way, with reasonable defaults.

The advantage of not overwriting it (at least under normal circumstances) when flashing new firmware still remains. But the disadvantage of having to augment the linker to reserve space also remains. I still believe that's a fair compromise.

algernon commented 2 years ago

Pushed the latest version. As it turns out, we can't use fmc_word_reprogram to speed up writes, and using full page read/update/write cycles for small data is prohibitively slow. For this reason, we use a buffer of the size of the full storage: writes update this, read reads from this, and we only write it back to flash when calling .commit(). We also fill up the buffer with flash storage contents during begin().

I've tested this on a starter board, and everything seems to work fine, commit speed (at 10k storage) is perfectly fine, too. I still need to update the docs, however.

algernon commented 2 years ago

Squashed it all together and force pushed, everything should be up to date now, so I'm marking this as ready for review.

For reference, committing 1k to storage is about 6.8ms, perfectly good speed, considering commits should be reasonably rare.

obra commented 2 years ago

Looks good to me. @maxgerhardt - anything in particular you want to check or review before merge?

maxgerhardt commented 2 years ago

I was going to code review + test on real HW this weekend more in-depth. Especially taking into account the correctness of both libraries, possibly needed changes to the linkerscript to reserve space, possible menu selection on how much space is reserved in the Arduino IDE (and platformio.ini option for PlatformIO), how much sense it makes to have the FlashStorageLibrary along with the EEPROM library due to how they interwork in a (hopefully?) compatible way, how efficient the FlashStorageLibrary is, ...

algernon commented 2 years ago

Apologies for the late reply!

how much sense it makes to have the FlashStorageLibrary along with the EEPROM library due to how they interwork in a (hopefully?) compatible way, how efficient the FlashStorageLibrary is, ...

I'd say none, mostly because EEPROM creates a storage instance when included. If they were in the same library, that library would either be FlashStorage or EEPROM. If it's the former, then to use eeprom, one would have to include both headers for Arduino to find EEPROM, which isn't what happens on most platforms. If it was in the latter, then one would have to include EEPROM.h, thus creating a storage instance implicitly, which is not always the desired behaviour. So having them separate is the most practical solution, in my opinion.

possibly needed changes to the linkerscript to reserve space, possible menu selection on how much space is reserved in the Arduino IDE (and platformio.ini option for PlatformIO)

Not sure about PIO, but for Arduino, we might not even need a linker script. We can just create menu items that override the maximum upload size, to be the real storage size - reserved, and arduino will error out if that's reached. So we do the reserve a few levels higher than the linker, with a possibly more descriptive error message.

maxgerhardt commented 2 years ago

I see. I have to yet look over this, but hopefully I can get it done this weekend.

maxgerhardt commented 2 years ago

@algernon Alright I took a look at the stuff:

I had to correct multiple compiler errors and warnings to get it to compile, predominantly FlashStorage::write() with const uint8_t*, otherwise due to a -Werror const type conversion it would fail compilation
Added EEPROM::commit() function exposing the underlying storage_.commit(). Without that, users of only the EEPROM.h library could never actually "flush" the changes to the actual Flash, only write into the RAM buffer
Added a erase + write + read example in the EEPROM library

The library works fine on my GD32F303CC. However, I see the following issues:

The FlashStorage class and EEPROM class are architecturally weird. In the reference library (https://github.com/cmaglie/FlashStorage/tree/master/src), the FlashStorage class's write() function directly writes to the flash, has no bufferung, and the EEPROM class has the logic to only write into a buffer and flush / .commit() changes to the flash. The implementation here is exactly reversed: The FlashStorage's write() function only writes to a buffer it manages and has a .commit() function, the EEPROM class inherits all this without having to do its own buffering logic. Do we want to keep this design or equalize it with the other library? Or keep it for convience, but then always impose the RAM buffer overhead on users only using the FlashStorage class?
Code like static constexpr uint32_t bank0_end = fmc_base_address + 512 * 1024 - 1; probably only works for the GD32F30x series, which is the only series for which this core has full support -- for future series support, we need to keep in mind that this may change. Seems okay for now.
There are no safe-guards in the linker scripts that would prevent an application to be written that fills up the entire flash and still uses EEPROM which would then overwrite its own application code or data -- the maximum upload size can e.g. be restricted or reserved statically

I'm merging as of now since this is working, the rest can be addressed in later PRs or tracked in issues.

algernon commented 2 years ago

@algernon Alright I took a look at the stuff:

I had to correct multiple compiler errors and warnings to get it to compile, predominantly FlashStorage::write() with const uint8_t*, otherwise due to a -Werror const type conversion it would fail compilation

Sorry about that, I forgot to enable -Werror, so didn't see those.

The FlashStorage class and EEPROM class are architecturally weird. In the reference library (https://github.com/cmaglie/FlashStorage/tree/master/src), the FlashStorage class's write() function directly writes to the flash, has no bufferung, and the EEPROM class has the logic to only write into a buffer and flush / .commit() changes to the flash. The implementation here is exactly reversed: The FlashStorage's write() function only writes to a buffer it manages and has a .commit() function, the EEPROM class inherits all this without having to do its own buffering logic. Do we want to keep this design or equalize it with the other library? Or keep it for convience, but then always impose the RAM buffer overhead on users only using the FlashStorage class?

I'd rather keep the current architecture, because it allows users of the FlashStorage class to have it be efficient, without having to reimplement the buffering EEPROM does. On the other hand, I do see value in having a raw class without the buffering. So how about a compromise? FlashStorage would become a class that has no buffering, does direct writes and reads. We'd build BufferedFlashStorage on top of that, that does the buffering, and adjust EEPROM to use BufferedFlashStorage.

This way we'd satisfy all three scenarios: flash storage without buffering, with buffering, and EEPROM emulation on top of the latter.

Code like static constexpr uint32_t bank0_end = fmc_base _address + 512 * 1024 - 1; probably only works for the GD32F30x series, which is the only series for which this core has full support -- for future series support, we need to keep in mind that this may change. Seems okay for now.

Suppose we can add a TODO comment above it?

maxgerhardt commented 2 years ago

FlashStorage would become a class that has no buffering, does direct writes and reads. We'd build BufferedFlashStorage on top of that, that does the buffering, and adjust EEPROM to use BufferedFlashStorage.

But wouldn't the EEPROM class then be just a duplication of BufferedFlashStorage? Users that want this can directly use the global EEPROM object. Otherwise, I'm all of for having FlashStorage do direct read/writes and EEPROM doing buffering, just like the reference EEPROM and FlashStorage libraries, just out of principle. But it works fine now too.

Suppose we can add a TODO comment above it?

Yes that would be good

algernon commented 2 years ago

FlashStorage would become a class that has no buffering, does direct writes and reads. We'd build BufferedFlashStorage on top of that, that does the buffering, and adjust EEPROM to use BufferedFlashStorage.

But wouldn't the EEPROM class then be just a duplication of BufferedFlashStorage?

The EEPROMClass class would wrap BufferedFlashStorage in this case. Not a duplication, but a thin wrap, exactly like how EEPROMClass in the current state is a thin wrap around FlashStorage. The only change to EEPROMClass in the proposed scenario would be s/FlashStorage/BufferedFlashStorage/.

Users that want this can directly use the global EEPROM object.

The reason I don't like the buffering being in EEPROMClass is that would make it harder to have a storage object with buffering that does not use the EEPROM APIs, nor the global object (because they want the storage at a different place, or they want multiple storage spaces, etc).

Having a FlashStorage -> BufferedFlashStorage -> EEPROMClass chain allows more flexibility, imo. We'd have direct read/write with FlashStorage, the current buffered behaviour with BufferedFlashStorage, and an EEPROM API wrapper with EEPROMClass (+ the EEPROM global object via the EEPROM library). Everyone can pick whichever class fits their goal best. Could even make EEPROMClass templateable, so one can create an instance backed by FlashStorage, or one backed by BufferedFlashStorage (with the global EEPROM object using the latter).

While this is different than the reference implementation, I think the increased flexibility is worth being different.

maxgerhardt commented 2 years ago

I see. Indeed, this would a good further improvement then.

CommunityGD32Cores / ArduinoCore-GD32

FlashStorage & EEPROM libraries #35