esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

SPIFFS file access slow on 16/14M flash config #5932

Closed TD-er closed 4 years ago

TD-er commented 5 years ago

Problem Description

Open file from SPIFFS is quite a bit slower when using 16M/14M flash layout, compared to the same code running on the same node with 4M/1M as flash layout.

I am running a test setup on ESPeasy using the full 16M flash and thus 14M SPIFFS (want to keep using OTA). Serving any web page is notably slower (1.7 sec compared to 300 msec). After some debugging, it became clear that opening files from the SPIFFS was taking a lot longer.

| Description | Function | #calls | call/sec | min (ms) | Avg (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Load File 4M | | 16712 | 2.40 | 1.139 | 3.167 | 23.190 |
| Load File 16M | | 8 | 0.01 | 1.329 | 93.900 | 248.736 |

And indeed, I do some checks for files on the SPIFFS while serving the web pages. These checks are not really that efficient (try open instead of check first if it exists), so I will change that.

But still the access times to the SPIFFS are significantly slower when using 16M flash. In PlatformIO I use:

board                     = esp12e
build_flags               = -Wl,-Tesp8266.flash.16m14m.ld

This behaves the same when using board = d1_mini_pro as suggested here

These are the detected flash settings:

Param value
Flash Chip ID: Vendor: 0xEF Device: 0x4018
Flash Chip Real Size: 16384 kB
Flash IDE Size: 16384 kB
Flash IDE Speed: 40 MHz
Flash IDE Mode: DIO
Flash Writes: 6 daily / 6 boot
Sketch Size: 946 kB (1100 kB free)
SPIFFS Size: 13579 kB (13504 kB free)
Page size: 256
Block size: 8192
Number of blocks: 1697
Maximum open files: 5
Maximum path length: 32

It is apparently using these settings: https://github.com/esp8266/Arduino/blob/38779149d07aecefc0e23d1452d38ee01d2437f8/tools/sdk/ld/eagle.flash.16m14m.ld#L17-L20

I have not yet tested to see if performance improves when increasing the block size. Increasing SPIFFS_MAX_OPEN_FILES to 20 doesn't seem to make any difference. (was hoping the FD cache would also increase)

Or maybe any other tweak suggestions here?

d-a-v commented 5 years ago

It may be time for LittleFS ?

TD-er commented 5 years ago

@d-a-v I guess so, but for the project I meant to use it for I have to make a choice on file system in the next few weeks. (2 months at most) I see that it is still in development for this framework. What is its current status? And how do I build it myself?

TD-er commented 5 years ago

Does anyone have a suggestion how I can increase the block size of this partition layout? I've been messing with this for way too long already and it keeps on resetting when I try to reduce the number of blocks.

In eagle.flash.16m14m.ld I changed the following lines:

PROVIDE ( _SPIFFS_end = 0x411F8000 );  /* lowered to get an integer multiple of the block size */
PROVIDE ( _SPIFFS_block = 0x8000 );

Original is: https://github.com/esp8266/Arduino/blob/68c0a1cc9e9427be10d1babdb994c43fed9c2271/tools/sdk/ld/eagle.flash.16m14m.ld#L17-L20
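The alignment constraint mentioned above (the SPIFFS region must hold a whole number of blocks) can be checked on a host machine before touching the linker script. The following is only a sketch: `alignSpiffsEnd` and `blockCount` are hypothetical helper names, and the start address `0x40400000` is an assumption, since the thread does not state `_SPIFFS_start`.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: start of the SPIFFS region. The thread only quotes
// _SPIFFS_end and _SPIFFS_block, so this value is illustrative.
const uint32_t kAssumedSpiffsStart = 0x40400000u;

// Round `end` down so that (end - start) is a whole number of blocks.
inline uint32_t alignSpiffsEnd(uint32_t start, uint32_t end, uint32_t blockSize) {
  assert(blockSize != 0 && end >= start);
  return start + ((end - start) / blockSize) * blockSize;
}

// Number of whole blocks that fit in [start, end).
inline uint32_t blockCount(uint32_t start, uint32_t end, uint32_t blockSize) {
  assert(blockSize != 0 && end >= start);
  return (end - start) / blockSize;
}
```

Under the assumed start address, `alignSpiffsEnd(kAssumedSpiffsStart, 0x411F8000u, 0x8000u)` returns the end address unchanged, which is what one would expect if the lowered value is already a multiple of the 32 kB block size. This says nothing about why the node still reboots; it only rules out a misaligned region.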

devyte commented 5 years ago

The slowness you describe is known and was reported in #2581. However, I'll keep this open to address the specifics of performance, given that the other thread has grown too long.

I think you're on the right track: the slowness is most likely due to the huge size without adjusted page and block sizes. The underlying search algorithms are naive and, I think, do a linear search over all blocks. Investigating and optimizing these parameters for the bigger SPIFFS sizes is one of the pending items. So no, no suggestions, because what you're doing is exactly what needs to be investigated.

The parameters should allow sane values, but I'm not sure of the implications of having e.g. a block size bigger than a flash sector size. There might be changes required to how a SPIFFS block is written to flash, i.e. writing multiple 4KB sectors instead of just a single write attempt. I don't really know, that's just a WAG. One thing you could test is to go smaller to check whether it crashes, and test whether 4KB is a threshold. SPIFFS is widely used in many projects at its lower API level, so there are a lot of examples out there. If you figure this out, it will be a huge help in finalizing the support for pro boards.

About LittleFS: I don't think it's a direct replacement, at least not in its current form, and not for all cases. At the very least, its wear leveling is not up to par with SPIFFS. It does seem to perform better, though, but again, the SPIFFS parameters have never been fine-tuned for performance.

TD-er commented 5 years ago

I have tried several builds myself but keep running into crash/reboot loops when trying to set the block size to 32kB. So I will stop trying for now unless someone can point me in the right direction, to some documentation on how it should be done.

For my own project, I will try to use the left over 12 MB of flash (using the 4M/1M config) as a single circular buffer and thus also avoid any other SPIFFS issue I may run into.

devyte commented 5 years ago

How about testing with bigger page size in the meantime? The impact of page size is also not known.

devyte commented 5 years ago

The underlying SPIFFS lib isn't ours, it comes from here: https://github.com/pellepl/spiffs

Docs are in their wiki. For our integration code, there are no docs, just some code comments.

TD-er commented 5 years ago

The .ld files are also from the spiffs project?

bill-orange commented 5 years ago

@d-a-v Can you use FTP with Little FS in the usual fashion?

d-a-v commented 5 years ago

@bill-orange This is a question for @earlephilhower. The answer is yes. Thanks to his recent work, every library using FS:: or SPIFFS. can be used on SPIFFS, SDFS and also the faster, upcoming LittleFS with little-to-no changes.

earlephilhower commented 5 years ago

@bill-orange you can probably run the same FTP server library, but if it hardcodes SPIFFS.open you'll need to replace it with LittleFS.open to ensure the right FS is used. You'd also need to upload your files to the chip with the LittleFS uploader if it's an FTP server.

earlephilhower commented 5 years ago

@TD-er, I don't have a 16M board but would be interested in how LittleFS V2 works with it and the settings used. Would you be able to run your testing w/LittleFS instead and report some results? We kind of assume it'll be better than SPIFFS, but hard data is way better...

TD-er commented 5 years ago

Sure, if you show me how to build it, I can test it. Or a pre-built test binary is also fine :)

earlephilhower commented 5 years ago

In your git tree it's pretty simple to do a branch (change upstream to origin if you're working off a clone of the master git and not a private fork):

git fetch upstream pull/5511/head:pr5511
git checkout pr5511
git submodule init
git submodule update

From then on, just search-and-replace SPIFFS with LittleFS (headers, SPIFFS.open, most likely only places).

earlephilhower commented 5 years ago

Oh, and of course you lose all data in flash since it's a different FS. If there's something critical on the board, don't do this...

TD-er commented 5 years ago

Of course, tests will be done on non-remote nodes that do not need stairs and screwdrivers to reach :) Learned that the hard way ;)

TD-er commented 5 years ago

I prepared the framework like you explained and did a query replace on the SPIFFS to LittleFS like this:

grep -R SPIFFS *|grep -v ReleaseNotes|cut -d ':' -f1|sort -n|uniq|xargs -n1 -I{} sed -i 's/SPIFFS/LittleFS/g' {}

But I get errors like these:

/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasy.ino: In function 'void loop()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasy.ino:580:5: error: 'LittleFS' was not declared in this scope
LittleFS.end();
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRTC.ino: In member function 'String RTC_cache_handler_struct::getReadCacheFileName(int&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRTC.ino:292:11: error: 'LittleFS' was not declared in this scope
if (LittleFS.exists(fname)) {
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRTC.ino: In member function 'String RTC_cache_handler_struct::getPeekCacheFileName(bool&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRTC.ino:321:9: error: 'LittleFS' was not declared in this scope
if (LittleFS.exists(fname)) {
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRules.ino: In function 'void checkRuleSets()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRules.ino:43:9: error: 'LittleFS' was not declared in this scope
if (LittleFS.exists(fileName))
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRules.ino: In function 'void rulesProcessing(String&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyRules.ino:91:9: error: 'LittleFS' was not declared in this scope
if (LittleFS.exists(fileName))
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'bool fileExists(const String&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:57:10: error: 'LittleFS' was not declared in this scope
return LittleFS.exists(fname);
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'fs::File tryOpenFile(const String&, const String&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:67:7: error: 'LittleFS' was not declared in this scope
f = LittleFS.open(fname, mode.c_str());
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'bool tryDeleteFile(const String&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:75:16: error: 'LittleFS' was not declared in this scope
bool res = LittleFS.remove(fname);
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'void fileSystemCheck()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:147:7: error: 'LittleFS' was not declared in this scope
if (LittleFS.begin())
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'size_t SpiffsUsedBytes()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:875:3: error: 'LittleFS' was not declared in this scope
LittleFS.info(fs_info);
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'size_t SpiffsTotalBytes()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:888:3: error: 'LittleFS' was not declared in this scope
LittleFS.info(fs_info);
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'size_t SpiffsBlocksize()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:901:3: error: 'LittleFS' was not declared in this scope
LittleFS.info(fs_info);
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'size_t SpiffsPagesize()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:914:3: error: 'LittleFS' was not declared in this scope
LittleFS.info(fs_info);
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino: In function 'bool getCacheFileCounters(uint16_t&, uint16_t&, size_t&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/ESPEasyStorage.ino:971:13: error: 'LittleFS' was not declared in this scope
Dir dir = LittleFS.openDir("cache");
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/Misc.ino: In function 'void ResetFactory()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/Misc.ino:1060:3: error: 'LittleFS' was not declared in this scope
LittleFS.end();
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/Misc.ino: In function 'void reboot()':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/Misc.ino:1565:3: error: 'LittleFS' was not declared in this scope
LittleFS.end();
^
/home/gijs/GitHub/letscontrolit/ESPEasy/src/WebServer.ino: In function 'void getWebPageTemplateVar(const String&)':
/home/gijs/GitHub/letscontrolit/ESPEasy/src/WebServer.ino:769:9: error: 'LittleFS' was not declared in this scope
[...]

So apparently I still need something defined somewhere? The includes are also replaced:

-  #include "SPIFFS.h"
+  #include "LittleFS.h"

TD-er commented 5 years ago

Ah, adding #include <LittleFS.h> to the top of ESPEasy.ino seems to work :) And I did some manual replaces too for _SPIFFS_ to _FS_

I will now upload it and start testing.

earlephilhower commented 5 years ago

Thanks, looks like we cross posted.

TD-er commented 5 years ago

OK, first conclusion: It works and is quite a bit faster compared to the usual build with SPIFFS. The file list page is now completely loaded in 145 msec while it took 290 msec on SPIFFS with 1M partition. (same node, same number of files)

Other pages show just a bit faster load times compared to the 1M SPIFFS, but at least not slower.

Saving some update to the settings file does take quite a bit more time (1.8 sec for updating part of a file)

| Description | Function | #calls | call/sec | min (ms) | Avg (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Load File LittleFS | | 6 | 0.02 | 5.187 | 5.454 | 5.822 |
| Save File LittleFS | | 5 | 0.02 | 1780.130 | 1793.657 | 1813.669 |
| Load File SPIFFS | | 10 | -0.00 | 1.040 | 4.727 | 30.410 |
| Save File SPIFFS | | 5 | -0.00 | 27.258 | 42.623 | 62.872 |

I will now include my new "cache controller", which does lots of appends to a file, to see how that will perform.

TD-er commented 5 years ago

One thing that's not looking right. The free space I compute does not seem to be right.

Label Value
Flash Chip ID: Vendor: 0xEF Device: 0x4018
Flash Chip Real Size: 16384 kB
Flash IDE Size: 16384 kB
Flash IDE Speed: 40 MHz
Flash IDE Mode: DIO
Flash Writes: 20 daily / 20 boot
Sketch Size: 963 kB (1084 kB free)
LittleFS Size: 14316 kB (96 kB free)
Page size: 256
Block size: 8192
Number of blocks: 1789
Maximum open files: 5
Maximum path length: 32

Edit: see https://github.com/esp8266/Arduino/pull/5511#pullrequestreview-228507083

TD-er commented 5 years ago

This "Cache Controller" is some controller collecting samples with 24 bytes in total per sample. It appends 10 samples at a time to a file, so 240 bytes in total appended. After 24k (1000 samples), it will start on the next file.

| Description | Function | #calls | call/sec | min (ms) | Avg (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| C_16_Cache Controller [LittleFS] | CPLUGIN_PROTOCOL_SEND | 1628 | 2.00 | 0.479 | 18.495 | 409.114 |
| C_16_Cache Controller [SPIFFS, full filesystem] | CPLUGIN_PROTOCOL_SEND | 205 | 0.11 | 0.448 | 38.199 | 993.639 |
| C_16_Cache Controller [SPIFFS, clean filesystem] | CPLUGIN_PROTOCOL_SEND | 35 | 0.12 | 0.452 | 3.601 | 88.560 |
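To make the cache file layout described above concrete (24-byte samples, a new file after 1000 samples), here is a small host-side sketch of where a given sample would land. The names `SamplePosition` and `locateSample` are hypothetical and not taken from ESPEasy; only the sizes come from the thread.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSampleBytes = 24;  // bytes per sample, as stated in the thread

struct SamplePosition {
  std::size_t fileIndex;   // which cache file the sample lands in
  std::size_t byteOffset;  // offset of the sample inside that file
};

// Map a running sample number to (file, offset), rolling over to the
// next file every `samplesPerFile` samples (1000 in the thread).
inline SamplePosition locateSample(std::size_t sampleNumber,
                                   std::size_t samplesPerFile = 1000) {
  assert(samplesPerFile > 0);
  return { sampleNumber / samplesPerFile,
           (sampleNumber % samplesPerFile) * kSampleBytes };
}
```

With these numbers every file ends at exactly 24000 bytes, and since samples are appended ten at a time, each append adds 240 bytes at the end of the current file.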

Garbage Collection after deleting 24 files of 24k from full SPIFFS filesystem:

| Description | Function | #calls | call/sec | min (ms) | Avg (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| SPIFFS GC success | | 13 | 0.16 | 52.083 | 59.547 | 65.022 |
| SPIFFS GC fail | | 23 | 0.29 | 9.971 | 11.032 | 12.550 |

Apparently there is no garbage collection on LittleFS?

About the free space: it seems like I'm actually computing the amount of used space? The reported free space grows as I write more files to it :)

TD-er commented 5 years ago

For those who want to test. ESPeasy_LittleFS_testBuild.rar It has a 4M and a 16M build inside and also the (highly experimental) cache controller.

In the RAR file is also a basic settings file, which you can copy to the filesystem after you setup the WiFi credentials etc. This configuration will start writing 2 samples per second of basic sysinfo to the cache controller. Thus writing a single 24000 bytes file takes roughly 500 seconds. To fill 14M at this pace will take about 3.3 days, so you may want to upload some more larger files to speed it up and see what happens when the filesystem is full.
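The timing figures above follow from simple arithmetic, which the sketch below reproduces. `secondsToFill` is a hypothetical helper, and treating "14M" as 14,000,000 bytes is an assumption about how the figure was rounded.

```cpp
#include <cassert>
#include <cstdint>

// Seconds needed to write `totalBytes` when `samplesPerSecond` samples
// of `sampleBytes` each are stored, rounded up to whole seconds.
inline uint64_t secondsToFill(uint64_t totalBytes,
                              uint32_t samplesPerSecond,
                              uint32_t sampleBytes) {
  assert(samplesPerSecond > 0 && sampleBytes > 0);
  const uint64_t bytesPerSecond = uint64_t(samplesPerSecond) * sampleBytes;
  return (totalBytes + bytesPerSecond - 1) / bytesPerSecond;
}
```

At 2 samples per second and 24 bytes per sample, a 24000-byte file takes 500 seconds. Under the 14,000,000-byte assumption the whole filesystem takes roughly 291,667 seconds, a bit under 3.4 days, which is in line with the estimate of about 3.3 days.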

On 1M SPIFFS I could store about 23 or 24 of these files. The rest was overhead due to filesystem fragmentation. But I guess this build may fail if the filesystem is full, since it cannot detect the amount of free space. It will start deleting older cache files as soon as writing fails, so maybe it will work.

Edit: It detects write issues very well, so older cache files are removed like they should. It looks like we can fill the entire filesystem before a write fails. That's really nice :)

earlephilhower commented 5 years ago

Thanks, @TD-er ! I just pushed a fix to the PR for the free space reporting, thanks for catching it.

There's no GC for LittleFS, at least not externally accessible by a user of the FS. It's not using the NOR specific tricks that SPIFFS is, so I think a lot of design decisions were different.

Can you give an idea of what SaveFile is doing? If it's really as bad as it looks, we can post something to the LittleFS repo (assuming I can develop a test case to reproduce it, that is). >1 second seems like it should have WDT'd since I have neither yield() nor delay() in the library.

TD-er commented 5 years ago

The WDT's are at 2 sec for the SW watchdog and 6 sec for the HW watchdog.

And I will have a look at it tomorrow to see what's wrong there. I may have an idea about what may cause this. Perhaps writing per byte, which is perfectly fine for SPIFFS, since that's caching stuff, but maybe LittleFS isn't. And if it is indeed that routine, then there might be some call to delay in it, to feed the dog.

Also I must take a look at the file upload. That's horribly slow and uploading a file > 3 MB is not possible. The connection is reset before it may finish.

But first I must take some sleep. It's already past midnight here.

Do you have plans to make this somewhat selectable using a define, to make the switch from SPIFFS to LittleFS more dynamic? I like it and I think it should be the default for any filesystem > 1 MB.

And on a side note. If you plan on buying 16MB nodes, do not trust the power regulators on them. The last 3 batches I received all have 150 mA regulators, which has occupied me all day today. :( Well, better knowing it now than having to replace lots of deployed nodes later.

TD-er commented 5 years ago

Hmm, indeed. See https://github.com/letscontrolit/ESPEasy/blob/mega/src/ESPEasyStorage.ino There are 5 lines with SPIFFS_CHECK(f.write in them and all are indeed writing 1 byte at a time. There may have been a good reason for it in the past, but I guess for LittleFS it is a killer.

TD-er commented 5 years ago

I've had it running over night and got some statistics for the cache controller (appending to files)

| Description | Function | #calls | call/sec | min (ms) | Avg (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| C_16_Cache Controller [Experimental] | CPLUGIN_PROTOCOL_SEND | 124834 | 4.00 | 0.459 | 92.370 | 8898.551 |
| TryOpenFile() | | 124965 | 4.00 | 4.883 | 30.956 | 2021.852 |

Those extremes are quite large and also the average seems quite high, which may indicate the extremes do occur more than once.

Last night I did fill up the flash completely and deleted the dummy files again. So it may need to do some garbage collection which may stall the process.

This test node was only running the sysinfo plugin (4x each second) and writing it to the cache controller, so there's no network activity involved. It has written 3632 kB in bursts of 240 bytes.

It seems like the file handling becomes slower when there are more files present.

TD-er commented 5 years ago

I made some changes to write in blocks instead of per byte, but the time needed to save the settings is the same.

| Description | Function | #calls | call/sec | min (ms) | Avg (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Save File | | 10 | 0.06 | 1549.456 | 1722.554 | 1795.420 |

The updates are in a bigger file.

159908: FILE : Saved 1228 bytes to config.dat

So the save function does write it all in 1 burst.

earlephilhower commented 5 years ago

@TD-er Would it be fair to say that if I wanted to simulate this without ESPEasy I could do a sequence like:

char buff[1228];
auto start=millis();
auto f = LittleFS.open("/data.bin", "w");
f.write(buff, 1228);
f.close();
auto stop = millis();
Serial.printf("time to write = %d\n", stop - start);

If that's the case, then it's easy to make a little test case we can try w/the LittleFS object and plain LittleFS on the host to see WTH it's doing.

TD-er commented 5 years ago

No, it does also do a seek into the file itself. The Struct is part of a bigger settings file which is a concatenation of several structs. So in pseudo code it is something like this:

N.B. in ESPeasy the current code writes per byte and checks the result each time, which is rather silly. I see no reason not to write in larger chunks, like 256 bytes. As I browse through this issue, it looks like I also tried writing the entire struct at once. Seeing the stack-allocated array of 1228 bytes makes my fingers feel itchy ;)
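A chunked write along the lines suggested above could look like the sketch below. It is not the ESPEasy code: `writeChunked` is a hypothetical helper, and `CountingFile` is a stand-in used only so the call count can be checked on a host. The only assumption about the real file object is that it offers `size_t write(const uint8_t*, size_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write `len` bytes to `file` in pieces of at most `chunkSize` bytes.
// Returns the number of bytes actually written; stops on a short write
// so the caller can decide how to handle the failure.
template <typename File>
std::size_t writeChunked(File& file, const uint8_t* data,
                         std::size_t len, std::size_t chunkSize = 256) {
  assert(chunkSize > 0);
  std::size_t written = 0;
  while (written < len) {
    const std::size_t remaining = len - written;
    const std::size_t want = remaining < chunkSize ? remaining : chunkSize;
    const std::size_t got = file.write(data + written, want);
    written += got;
    if (got != want) break;  // short write: give up and report progress
  }
  return written;
}

// Stand-in for a real file object; it only counts calls and bytes.
struct CountingFile {
  std::size_t calls = 0;
  std::size_t bytes = 0;
  std::size_t write(const uint8_t*, std::size_t n) {
    ++calls;
    bytes += n;
    return n;
  }
};
```

For a 1228-byte struct and 256-byte chunks this results in 5 write calls instead of 1228, and the result is checked once per chunk rather than once per byte.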

earlephilhower commented 5 years ago

Writing 1,228 bytes, one at a time (i.e. 1228 f.write(char)) is gonna hurt no matter what and, depending on the data durability guarantees, result in up to 1,228 flash sector erases, writes, and verifies. The LittleFS program write buffer is 64 bytes (IIRC) so it might only be a more reasonable ceil(1228/64) = 20, but it's still about 19x more than would be preferable and there's still the overhead of 1K calls through a couple layers of indirection. I don't think you can optimize that much.

However, you said you'd tried larger blocks but didn't see a difference:

I made some changes to write in blocks instead of per byte, but the time needed to save the settings is the same.

The code as I now understand it would be

  #define BUFFSIZE 16 // Or whatever chunk size you tried
  char buff[BUFFSIZE];
  // Create a 64K file first, so there is something to update in the middle
  auto f = LittleFS.open("settings.bin", "w");
  for (auto i=0; i<65536; i+=sizeof(buff)) f.write(buff, sizeof(buff));
  f.close();
  auto start = millis();
  f = LittleFS.open("settings.bin", "r+"); // Don't truncate on open
  f.seek(8000, SeekSet); // Misaligned start
  for (auto i=0; i<1228; i+=sizeof(buff)) f.write(buff, sizeof(buff));
  f.close();
  auto stop = millis();
  Serial.printf("time to write = %ld\n", stop - start);

earlephilhower commented 5 years ago

I was able to reproduce this with the sketch below, and it is actually a LittleFS design choice.

https://github.com/ARMmbed/littlefs/issues/244

Basically in-the-middle updates require re-writing the blocks after the modified block. On a 64K file with 8K offset, that's 56K x # of blocks (6 in this case I think).

You can make it a little faster by handling the copying yourself (see code below, probably has bugs but it's just for demonstration). (From 1.4s to 1.0s)

Or, you can use separate files (or a subdir, since they work fine on LittleFS) for each structure and then just overwrite the file-per-structure and you'll have blazing speeds.

#include <LittleFS.h>
void setup() {
  Serial.begin(115200);
  LittleFS.begin();
}
void loop() { 
  #define BUFFSIZE 128 // Or whatever chunk size you tried
  char buff[1228];
  auto f=LittleFS.open("settings.bin", "w");
  for (auto i=0; i<65536; i+=sizeof(buff)) f.write(buff, sizeof(buff));
  f.close();
  auto start=millis();
  f = LittleFS.open("settings.bin", "r+"); // Don't truncate on create
  f.seek(8000, SeekSet); // Misaligned start
  for (auto i=0; i<1228; i+=sizeof(buff)) f.write(buff, sizeof(buff));
  f.close();
  auto stop = millis();
  Serial.printf("time to modify-write = %ld\n", stop - start);

  start=millis();
  f = LittleFS.open("settings.bin", "r");
  auto g = LittleFS.open("settings.new", "w");
  int j;
  for (j=0; (j+sizeof(buff))<8000; j+=sizeof(buff)) {
    f.read((uint8_t*)buff, sizeof(buff));
    g.write((uint8_t*)buff, sizeof(buff));
  }
  f.read((uint8_t*)buff, 8000-j);
  g.write((uint8_t*)buff, 8000-j);
  for (auto i=0; i<1228; i+=sizeof(buff)) g.write(buff, sizeof(buff));
  do {
    j = f.read((uint8_t*)buff, sizeof(buff));
    if (j) g.write((uint8_t*)buff, j);
  } while (j);
  f.close();
  g.close();
  stop = millis();
  Serial.printf("time to new-write = %ld\n", stop - start);

  delay(10000);
}

TD-er commented 5 years ago

I think I can make it a separate handler on LittleFS and write a separate import/export handler for it to make it look like a single file. This still makes it compatible to share settings among versions using SPIFFS or LittleFS.

What is the default block size used in LittleFS? A page is probably 256 bytes, since the flash uses that size, but I guess a block to erase at once is 8k?

Can LittleFS handle concatenation to a file well?

earlephilhower commented 5 years ago

Append is fine and very fast. It's just updates in the middle of a file that hurt. The LittleFS speed test, for example, is 100s of KB/second writing to the initial test file.

Block sizes are the same as SPIFFS, it uses the same options passed in from the IDE.

earlephilhower commented 4 years ago

Closing as a won't-fix due to SPIFFS being deprecated. LittleFS does have quirks, but is actively supported and many times faster (except for the update-in-middle scenario here).