Closed algernon closed 2 years ago
According to the GD32F30x user manual: "The flash page size is 2KB for bank0, 4KB for bank1", with bank0 being the first 512kb. If we use both, we'd need to keep track of two page sizes, an additional complexity. For this reason, I'd restrict the libraries for bank0 only, and hard-code 2k as the page size. If and when we need more than 512kb for this kind of storage, we'll figure something out then.
Okay... maybe 2k is not the right choice, and it does need to be set in boards.txt
. If we use a chip that supports more than 512kb of flash, we ideally want this storage at the end, meaning we'll be in the 4kb-page bank. If we use a chip with <= 512kb, we'll be in bank0 with 2kb page sizes.
Just some general thoughts on this:
From the GD32F303xx datasheet:
So for the GD32F303xx series, the higher the total flash amount, the usable size for code stays the same, just additional space for data storage is added. If the library could only handle Bank0 (up to 512kB total flash), everything starting at CG couldn't be taken proper advantage of. The library should be able to quickly discern whether an address is in a 2kB or 4Kb page because it can subtract the address from the 0x800000 start (to get the offset from flash start), if the offset is less or equal to 512*1024 it's in the Bank0 region and thus 2Kb, else 4Kb.
We must also note that the linker scripts need to be adapted for this, so for the e.g. CC device we can reserve some (user-selectable at best) last sectors for data flash, that must then be subtracted from the available flash for code in the linker script. If a dynamic user selection happens via menus, that must also be mirrored to the PlatformIO builder scripts.
For a brief moment, I pondered about not caring about pages, and just using fmc_word_reprogram()
, 'cos we don't need to erase pages then. But we'd still need to page-align writes, so that's out.
So for the GD32F303xx series, the higher the total flash amount, the usable size for code stays the same, just additional space for data storage is added.
AHA! Thanks, this makes a few things easier.
The library should be able to quickly discern whether an address is in a 2kB or 4Kb page because it can subtract the address from the 0x800000 start (to get the offset from flash start), if the offset is less or equal to 512*1024 it's in the Bank0 region and thus 2Kb, else 4Kb.
Mhm. I think I found a reasonably nice way to do just that. Thanks!
We must also note that the linker scripts need to be adapted for this, so for the e.g. CC device we can reserve some (user-selectable at best) last sectors for data flash, that must then be subtracted from the available flash for code in the linker script.
For now, I don't think we need dynamic user-selection of storage size, just hard-coding a value for each device in boards.txt should suffice, and we push that down to the linker. Makes things simpler in the beginning, and we can always iterate from that at a future date.
Pushed another variant. This assumes that all data storage will be used for FlashStorage (and consequently EEPROM), and relies on the system parts to define FMC_BANK0_SIZE
to figure out where the bank0/bank1 split is. It doesn't do any sanity checking whether there's even a bank1, or if we're out of bounds (either in bank0 or bank1).
This is probably wrong, I'll be doing some more reading & refactoring.
As a note, @howitzer74 confirmed that the datasheet is misleading about 'program' and 'data' storage. The extended storage area is a hair slower than the main storage area, but there is no actual restriction on what can go where. You can fill flash with a program.
A'ight. I'll take a different approach then: the storage area will end at flash end, and we'll use a custom size for it (via a template argument). Will still support both banks, but the library will prefer bank1, 'cos it's nearer the end.
There will be no default, global EEPROM
object instantiated, users of the library will have to set up the size, and adjust their build scripts and whatnot appropriately, if needed. This allows the greatest flexibility for the library.
On Wed, Aug 25, 2021, at 12:00 PM, Gergely Nagy wrote:
A'ight. I'll take a different approach then: the storage area will end at flash end, and we'll use a custom size for it (via a template argument). Will still support both banks, but the library will prefer bank1, 'cos it's nearer the end.
I suspect we should default to a fixed size that's comparable to other arduino implementations, rather than sized to the entire bank....what do other implementations do?
There will be no default, global
EEPROM
object instantiated, users of the library will have to set up the size, and adjust their build scripts and whatnot appropriately, if needed. This allows the greatest flexibility for the library.— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CommunityGD32Cores/ArduinoCore-GD32/pull/35#issuecomment-905794000, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAALC2FWJORWZVT3IGKEKZDT6U4UNANCNFSM5CZHRFKQ.
I suspect we should default to a fixed size that's comparable to other arduino implementations, rather than sized to the entire bank....
Oh, I didn't mean to use the entire bank going forward. I only used the entire data area because the docs suggested it can't be used for program code, so made sense to use it all for storage. (At least as a first iteration)
Now that I know program code can go there too, not going to use an entire bank, no.
what do other implementations do?
FlashStorage uses an 1k default for the EEPROM
object, and requires the library user to declare the flash area when using the library without the EEPROM emulation. To override that 1k, one needs to either modify the source, or compile with -DEEPROM_EMULATION_SIZE=x
. I'd rather have the size modifiable without having to pass compiler flags.
To do that, I see two approaches: one is to not have an EEPROM
object pre-defined in the library, but require library users to initialize one, with the desired size set at compile time via template arguments. In this case, there simply is no default. The other approach is to have a default (I'd suppose something bigger than 1k, though, perhaps 10k?), overrideable with -DEEPROM_EMULATION_SIZE=x
, but in addition to that, the EEPROM
object does not compile if -DWITHOUT_DEFAULT_EEPROM
is set, so devices that wish to allow setting the size in other ways (but still at compile time), can do so. This'd make it easier to use Kaleidoscope's DeviceProps
to set the size, rather than a less discoverable boards.txt
snippet.
On second thought: I can just go with EEPROM
being always present, and defaulting to 1k/10k/whatever, and I can build Kaleidoscope's driver around FlashStorage
, rather than EEPROM
(the latter being a very thin wrapper anyway).
Another update, this is much closer to what I think I'd be happy with, and is considerably simpler and more concise than the previous versions.
It positions the data at the end of flash, growing back from there, to a custom size. The FlashStorage
class is template-able, and requires the custom size as template argument. EEPROM
is not templated, and used EEPROM_EMULATION_SIZE
for the storage size, which defaults to 10k. The storage space may cross the bank0/bank1 boundary, the library doesn't care, will figure out the correct page sizes appropriately.
The size of flash is determined by the upload.maximum_size
parameter in boards.txt
, it is passed to the build process via platform.txt
's build.info.flags
. I'm not familiar with platformio, so help in getting the platformio build script updated would be most appreciated.
This way, we have sensible - but overrideable - defaults for EEPROM
, and it works in an Arduino-esque way. We also have FlashStorage
for cases where EEPROM
is not the most practical way to implement the storage.
There are still no sanity checking involved, the library will happily try to read and write from strange places, it is up to the library user to make sure addresses are valid.
There are still no sanity checking involved, the library will happily try to read and write from strange places, it is up to the library user to make sure addresses are valid.
I'm sure you intend to address this before we merge, but especially given that this could mean the bootloader could get clobbered, we should probably make sure the EEPROM library has good protections on writes.
I can - and intend to, indeed - make it handle out of bounds operations, both read and write. It has all the data it needs to do that, and is easy to check. Question is: what shall we do in out of bounds cases? Should read
and write
return an error value? Should they just turn no-op?
For EEPROM
, we can't always return an error value, not with EEPROM.get()
or EEPROM.put()
, which return templated stuff. In the EEPROM
case, I think the best we can do is no-op on out-of-bounds operations.
On Wed, Aug 25, 2021, at 3:28 PM, Gergely Nagy wrote:
I can - and intend to, indeed - make it handle out of bounds operations, both read and write. It has all the data it needs to do that, and is easy to check. Question is: what shall we do in out of bounds cases? Should
read
andwrite
return an error value? Should they just turn no-op?What do other cores' EEPROM implementations do? For
EEPROM
, we can't always return an error value, not withEEPROM.get()
orEEPROM.put()
, which return templated stuff. In theEEPROM
case, I think the best we can do is no-op on out-of-bounds operations.— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CommunityGD32Cores/ArduinoCore-GD32/pull/35#issuecomment-905918656, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAALC2BIGEP747XVFAUPGZDT6VVBXANCNFSM5CZHRFKQ.
What do other cores' EEPROM implementations do?
SAMD51's FlashStorage doesn't do bounds checking. It uses a RAM copy of the EEPROM, and will happily read and write out of bounds. However, that only corrupts data in memory, and will not overwrite anything in flash, because it only writes there on commit()
, and that's always within bounds.
For AVR... the library itself does not appear to do any bounds checking, and relies on avr-libc. I've looked at avr-libc, and... these particular functions, eeprom_read_byte()
and eeprom_write_byte()
are in assembly, and I'm not familiar with that. But at a first glance, it doesn't seem to do any bounds checking, either.
In any case, if we follow the existing EEPROM
API from Arduino, we can't return errors. We can only no-op, which is what my latest push did.
There's one major issue I want to address: write speed of small data. Right now, we write full pages only, even if we change only a couple of bytes. This isn't a big deal if we do this rarely. However, in Kaleidoscope, most of our writes are between 1-4 bytes a time. Even for larger amounts of data, we write 1-4 bytes at a time in a tight loop.
For example, for a 3-layer keymap, we write 3*64*2=384 bytes, continuously, one byte at a time. With the current FlashStorage
implementation, that's likely going to result in reading a 4kb page 384 times, changing a single byte, and writing the full 4kb page back. Even on a fast chip, that's going to be very inefficient.
I can imagine a couple of ways to improve it, all with their own drawbacks:
We could augment write()
to use fmc_word_reprogram()
if the data we write is small (say, 4 bytes or less). This has the downside of not really helping for cases where the data is between 5 bytes and 4k. We can further tune it to use reprogram for, say, 256 bytes or less, and full page for anything bigger. This adds a bit of complexity to the code, but is reasonably straightforward.
Another option is to use an in-ram cache, say, 2kb or so. If a subsequent write is within those 2kb, we don't write to flash, but to memory only. If it falls outside of that chunk, we commit it to flash, and then start afresh with a moved cache. An explicit commit()
will, of course, write it all out. This has the advantage of not needing much tuning at all, being faster than the previous method, but it uses 2kb extra ram, and feels a little more complex. To reduce complexity, we can cache a page, and auto-commit whenever a subsequent write would require another page, this makes bookeeping simpler.
A third option is to defer the decision to the library user: offer a 4-byte write, and a full-page write as bulkWrite()
. This keeps the code super simple, but doesn't help EEPROM
, which will always use bulkWrite
for put()
due to having to adhere to a particular API. Mind you, for the Kaleidoscope use case, this is perfectly fine, as we use .update()
in parts where the write speed problems would manifest. There are still a few cases where we use put()
for tiny (~4-16 bytes) of data, but we can change those to use .update()
instead.
I'm leaning towards the third option, because that's the simplest. We still have a few corner cases, because we have to write 4-bytes, even if our data is smaller. As such, we need to handle the case where we'd span bank boundaries, and split the write in that case. That's still less complexity, and certainly less bookkeeping than the other options.
Pushed another update, it adds FlashStorage.writeSmall()
, to write at most 4 bytes. When crossing bank boundaries, it falls back to writeBulk
for simplicity, as it should be a very rare corner case not worth the extra logic.
EEPROM
has been updated to use .writeSmall()
for its write()
function, and variants for 16 and 32 bit uints have been added too, to go along with the existing uint8_t variant. Similarly, EEPROM.put()
has 3 new specializations, for 8/16/32-bit uints that use writeSmall
instead of writeBulk
.
This should result in much better speed in most cases Kaleidoscope cares about. Still haven't tested on hardware yet, mind you.
Having looked at how SAMD51's FlashStorage does things, there's another approach: we can just use a variable to reserve space.
Doing so, has a couple of advantages: we don't need to specially augment the linker in any way to make sure there's enough free space. It also allows us to have multiple storage objects at the same time, in different places, with different characteristics and whatnot.
The downside, however, is that it will be overwritten when flashing, so any flashing process must also include dumping all storages and writing them back.
Our current implementation allows one storage only, at the end of the flash area, and requires compiler augmenting to make sure there's space. But it does not get overwritten during flash, not unless our new program reaches into the old storage space, something which should happen under common circumstances.
The AVR EEPROM library allows one storage only, and so does the SAMD FlashStorage library's EEPROM emulation (but it's FlashStorage is more flexible). So far, my implementation of FlashStorage was made for the EEPROM (or EEPROM-like) use-case, where a single storage is good enough. I'd stick with that. But I thought it worthwhile to point out that there's a different way to implement it, with different trade-offs.
Another small update: I moved EEPROMClass
(the EEPROM emulation layer) from the EEPROM
library to FlashStorage
, as FlashAsEEPROM.h
. This way the EEPROM
library is a few-line header that simply instantiates EEPROMClass
to the default size.
This allows us to reuse EEPROMClass
in other libraries, without having an instantiated EEPROM
object around, and still have everything function in an Arduino-esque way when including EEPROM.h
.
Collapsed into a single commit, and added a short README. Leaving it as draft still, 'cos I've yet to test this on hardware still.
I see you've added -DARDUINO_UPLOAD_MAXIMUM_SIZE={upload.maximum_size}
to the Arduino IDE build flags, you can mirror that in the tools/platformio/platformio-build.py
by expanding the CPPDEFINES
array (line 214) with the same element:
CPPDEFINES=[
series,
("ARDUINO", 10808),
("F_CPU", "$BOARD_F_CPU"), # for compatiblity
"ARDUINO_ARCH_GD32",
"ARDUINO_%s" % board_id,
("BOARD_NAME", '\\"%s\\"' % board_id),
("ARDUINO_UPLOAD_MAXIMUM_SIZE", board_config.get("upload.maximum_size"))
],
Had a good night's sleep, and it dawned on me that we do not need to always position the storage at the end of flash, restricting us to one instance of the class. We can default to that, but still allow setting a position, with a template argument. We can have multiple storages this way, with reasonable defaults.
The advantage of not overwriting it (at least under normal circumstances) when flashing new firmware still remains. But the disadvantage of having to augment the linker to reserve space also remains. I still believe that's a fair compromise.
Pushed the latest version. As it turns out, we can't use fmc_word_reprogram
to speed up writes, and using full page read/update/write cycles for small data is prohibitively slow. For this reason, we use a buffer of the size of the full storage: writes update this, read reads from this, and we only write it back to flash when calling .commit()
. We also fill up the buffer with flash storage contents during begin()
.
I've tested this on a starter board, and everything seems to work fine, commit speed (at 10k storage) is perfectly fine, too. I still need to update the docs, however.
Squashed it all together and force pushed, everything should be up to date now, so I'm marking this as ready for review.
For reference, committing 1k to storage is about 6.8ms, perfectly good speed, considering commits should be reasonably rare.
Looks good to me. @maxgerhardt - anything in particular you want to check or review before merge?
I was going to code review + test on real HW this weekend more in-depth. Especially taking into account the correctness of both libraries, possibly needed changes to the linkerscript to reserve space, possible menu selection on how much space is reserved in the Arduino IDE (and platformio.ini
option for PlatformIO), how much sense it makes to have the FlashStorageLibrary along with the EEPROM library due to how they interwork in a (hopefully?) compatible way, how efficient the FlashStorageLibrary is, ...
Apologies for the late reply!
how much sense it makes to have the FlashStorageLibrary along with the EEPROM library due to how they interwork in a (hopefully?) compatible way, how efficient the FlashStorageLibrary is, ...
I'd say none, mostly because EEPROM
creates a storage instance when included. If they were in the same library, that library would either be FlashStorage
or EEPROM
. If it's the former, then to use eeprom, one would have to include both headers for Arduino to find EEPROM
, which isn't what happens on most platforms. If it was in the latter, then one would have to include EEPROM.h
, thus creating a storage instance implicitly, which is not always the desired behaviour. So having them separate is the most practical solution, in my opinion.
possibly needed changes to the linkerscript to reserve space, possible menu selection on how much space is reserved in the Arduino IDE (and platformio.ini option for PlatformIO)
Not sure about PIO, but for Arduino, we might not even need a linker script. We can just create menu items that override the maximum upload size, to be the real storage size - reserved, and arduino will error out if that's reached. So we do the reserve a few levels higher than the linker, with a possibly more descriptive error message.
I see. I have to yet look over this, but hopefully I can get it done this weekend.
@algernon Alright I took a look at the stuff:
FlashStorage::write()
with const uint8_t*, otherwise due to a -Werror
const type conversion it would fail compilationEEPROM::commit()
function exposing the underlying storage_.commit()
. Without that, users of only the EEPROM.h
library could never actually "flush" the changes to the actual Flash, only write into the RAM bufferThe library works fine on my GD32F303CC. However, I see the following issues:
FlashStorage
class and EEPROM
class are architecturally weird. In the reference library (https://github.com/cmaglie/FlashStorage/tree/master/src), the FlashStorage
class's write()
function directly writes to the flash, has no bufferung, and the EEPROM
class has the logic to only write into a buffer and flush / .commit()
changes to the flash. The implementation here is exactly reversed: The FlashStorage
's write()
function only writes to a buffer it manages and has a .commit()
function, the EEPROM class inherits all this without having to do its own buffering logic. Do we want to keep this design or equalize it with the other library? Or keep it for convience, but then always impose the RAM buffer overhead on users only using the FlashStorage
class?static constexpr uint32_t bank0_end = fmc_base_address + 512 * 1024 - 1;
probably only works for the GD32F30x series, which is the only series for which this core has full support -- for future series support, we need to keep in mind that this may change. Seems okay for now.I'm merging as of now since this is working, the rest can be addressed in later PRs or tracked in issues.
@algernon Alright I took a look at the stuff:
- I had to correct multiple compiler errors and warnings to get it to compile, predominantly
FlashStorage::write()
with const uint8_t*, otherwise due to a-Werror
const type conversion it would fail compilation
Sorry about that, I forgot to enable -Werror, so didn't see those.
- The
FlashStorage
class andEEPROM
class are architecturally weird. In the reference library (https://github.com/cmaglie/FlashStorage/tree/master/src), theFlashStorage
class'swrite()
function directly writes to the flash, has no bufferung, and theEEPROM
class has the logic to only write into a buffer and flush /.commit()
changes to the flash. The implementation here is exactly reversed: TheFlashStorage
'swrite()
function only writes to a buffer it manages and has a.commit()
function, the EEPROM class inherits all this without having to do its own buffering logic. Do we want to keep this design or equalize it with the other library? Or keep it for convience, but then always impose the RAM buffer overhead on users only using theFlashStorage
class?
I'd rather keep the current architecture, because it allows users of the FlashStorage
class to have it be efficient, without having to reimplement the buffering EEPROM does. On the other hand, I do see value in having a raw class without the buffering. So how about a compromise? FlashStorage
would become a class that has no buffering, does direct writes and reads. We'd build BufferedFlashStorage
on top of that, that does the buffering, and adjust EEPROM
to use BufferedFlashStorage
.
This way we'd satisfy all three scenarios: flash storage without buffering, with buffering, and EEPROM emulation on top of the latter.
- Code like
static constexpr uint32_t bank0_end = fmc_base _address + 512 * 1024 - 1;
probably only works for the GD32F30x series, which is the only series for which this core has full support -- for future series support, we need to keep in mind that this may change. Seems okay for now.
Suppose we can add a TODO
comment above it?
FlashStorage would become a class that has no buffering, does direct writes and reads. We'd build BufferedFlashStorage on top of that, that does the buffering, and adjust EEPROM to use BufferedFlashStorage.
But wouldn't the EEPROM class then be just a duplication of BufferedFlashStorage
? Users that want this can directly use the global EEPROM
object. Otherwise, I'm all of for having FlashStorage
do direct read/writes and EEPROM doing buffering, just like the reference EEPROM and FlashStorage libraries, just out of principle. But it works fine now too.
Suppose we can add a TODO comment above it?
Yes that would be good
FlashStorage would become a class that has no buffering, does direct writes and reads. We'd build BufferedFlashStorage on top of that, that does the buffering, and adjust EEPROM to use BufferedFlashStorage.
But wouldn't the EEPROM class then be just a duplication of
BufferedFlashStorage
?
The EEPROMClass
class would wrap BufferedFlashStorage
in this case. Not a duplication, but a thin wrap, exactly like how EEPROMClass
in the current state is a thin wrap around FlashStorage
. The only change to EEPROMClass
in the proposed scenario would be s/FlashStorage/BufferedFlashStorage/
.
Users that want this can directly use the global
EEPROM
object.
The reason I don't like the buffering being in EEPROMClass
is that would make it harder to have a storage object with buffering that does not use the EEPROM
APIs, nor the global object (because they want the storage at a different place, or they want multiple storage spaces, etc).
Having a FlashStorage -> BufferedFlashStorage -> EEPROMClass chain allows more flexibility, imo. We'd have direct read/write with FlashStorage
, the current buffered behaviour with BufferedFlashStorage
, and an EEPROM API wrapper with EEPROMClass
(+ the EEPROM
global object via the EEPROM library). Everyone can pick whichever class fits their goal best. Could even make EEPROMClass
templateable, so one can create an instance backed by FlashStorage
, or one backed by BufferedFlashStorage
(with the global EEPROM object using the latter).
While this is different than the reference implementation, I think the increased flexibility is worth being different.
I see. Indeed, this would a good further improvement then.
This is a work-in-progress version of my attempt at a pair of FlashStorage & EEPROM libraries.
FlashStorage
is a lower-level library that abstracts away (most of) the burden of storing and retrieving user data from flash. It simply provides aread()
and awrite()
method, that handle all the pageing, erasing and updating and whatnot behind the scenes, so the end user doesn't have to care about the low level details.EEPROM
builds on top of that, and provides an Arduino-esqueEEPROMClass
, with an API similar to the AVREEPROM
. It's more limited than that, because only the basic functionality required for Kaleidoscope is implemented at this point (no C+11 iterators or array access at this point).There are a few things I need to work out still, namely:
05_USBD_MSC_internal_flash
. Ideally, we'd have those defaults come fromboards.txt
. The major things we need is the size of the storage, where it starts (or where it ends, which is more practical to store in boards.txt in our case), and the page size.For reads, we can be much more efficient, because we don't need any paging there. We just read N byes of data from an address, and done.
For writes, the most efficient way would be to copy the whole storage to RAM, and only write back on
commit()
. That'd make writes a lot more efficient, at the cost of ram. Another option would be to have aprepareWrite()
function, that tells the library how much data we'll write, and where, so it'll prep that area in RAM, and write it all back incommit()
. This, however, would require dynamic memory allocation. There'sfmc_word_reprogram()
, which should - judging by its comments - reprogram a word at an address without erase. We could use that for more efficient small writes, as we won't have to do full page erase + writes. However, any of these require more thought, and I'll postpone efficient writes 'till a follow-up PR instead. Lets get the very basic functionality working in this one first.Meanwhile, I'm submitting this as a draft PR for comments, and perhaps ideas. The code has not been tested on hardware yet, I've only done compile tests with this iteration. Once I got to the "test on hardware" stage, I'll report back too.