PaulStoffregen / LittleFS

73 stars 20 forks source link

Please update - This repo uses an outdated version #33

Closed FrankBoesing closed 2 years ago

FrankBoesing commented 2 years ago

This uses https://github.com/PaulStoffregen/LittleFS/blob/da02d2c4b7ba111099773d568851d05d387c9ef8/src/littlefs/lfs.h#L24

And is from march 2020

The original one https://github.com/littlefs-project/littlefs/blob/ead50807f1ca3fdf2da00b77a0ce02651ded2d13/lfs.h#L25
Is already 9 months old, too.

Has bugfixes and additional features.

https://github.com/littlefs-project/littlefs/blob/master/lfs.h

mjs513 commented 2 years ago

@PaulStoffregen - been testing version 2.4.1 at the same time we are testing MTP/USBHost changes. Not seeing any issues so far with updating to latest version but want to do dedicated testing. Will post back.

PaulStoffregen commented 2 years ago

Does the new version open files faster, or bring any other benefits worth risking disruption to existing projects?

mjs513 commented 2 years ago

I was running it for a while but we ran into issues with MTP development but wasn't sure if it was a problem with MTP or LittleFS so put it back to the TD version. A quick test that I did was running the file open timing test but didn't improve NAND file open times but again that is not conclusive testing.

Other than making it thread-safe I would recommend not updating until we finish with the MTP/USBHost changes just to remove one last variable. But would recommend updating at some point either during the final beta or at some other point.

I can test it on and off during our MTP/USBHost development as we make updates as a double check and maybe make some timing tests in between other testing.

Hope this makes sense

mjs513 commented 2 years ago

@PaulStoffregen Planning on using a modified version of SDFat's benchmark sketch to test read write speeds. Output for a W25Q128 on QSPI looks something like this:

Bytes Used: 8192, Bytes Total:16777216
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
113.51,37663,982,4510
112.20,39414,983,4563

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
24411.83,109,14,20
24411.49,108,14,20

Done
Time to Open File for Read (uSecs): 198
Time to Close File(uSecs): 1

Figure its a good way for speed comparisons. Played with early on but don't remember if we ever really ever collected it in one spot before. Here is the sketch - if you have any suggestions let me know

/*
 * This program is a simple binary write/read benchmark.
 */
#include <SD.h>
#include "sdios.h"

#include <LittleFS.h>
LittleFS_QSPIFlash myfs;

File file;

// Set SKIP_FIRST_LATENCY true if the first read/write to the SD can
// be avoid by writing a file header or reading the first record.
const bool SKIP_FIRST_LATENCY = true;

// Size of read/write.
const size_t BUF_SIZE = 512;

// File size in MB where MB = 1,000,000 bytes.
const uint32_t FILE_SIZE_MB = 5;

// Write pass count.
const uint8_t WRITE_COUNT = 2;

// Read pass count.
const uint8_t READ_COUNT = 2;
//==============================================================================
// End of configuration constants.
//------------------------------------------------------------------------------
// File size in bytes.
const uint32_t FILE_SIZE = 1024*1024*FILE_SIZE_MB;

// Insure 4-byte alignment.
uint32_t buf32[(BUF_SIZE + 3) / 4];
uint8_t* bufA32 = (uint8_t*)buf32;

// Serial output stream
ArduinoOutStream cout(Serial);

//------------------------------------------------------------------------------
// Store error strings in flash to save RAM.
//----------------------------------------------------------------------------
void errorHalt(const __FlashStringHelper* msg) {
  Serial.print(F("error: "));
  Serial.println(msg);
}

#define error(s) errorHalt(F(s))
//------------------------------------------------------------------------------

//------------------------------------------------------------------------------
void clearSerialInput() {
  uint32_t m = micros();
  do {
    if (Serial.read() >= 0) {
      m = micros();
    }
  } while (micros() - m < 10000);
}
//------------------------------------------------------------------------------
void setup() {
  Serial.begin(9600);

  // Wait for USB Serial
  while (!Serial) {
    SysCall::yield();
  }
  delay(1000);
  if (CrashReport) {
    Serial.print(CrashReport);
  }
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);

///////////////////////////////////////////////

  myfs.begin();

///////////////////////////////////////////////
  cout << F("\nUsing a freshly formatted SD for best performance.\n");
  myfs.quickFormat();

  // use uppercase in hex and use 0X base prefix
  cout << uppercase << showbase << endl;

  // Discard any input.
  clearSerialInput();

  // F() stores strings in flash to save RAM
  cout << F("Type any character to start\n");
  while (!Serial.available()) {
    SysCall::yield();
  }

  speedBench();
}

void loop() {}

//------------------------------------------------------------------------------
void speedBench() {
  File file;
  float s;
  uint32_t t;
  uint32_t maxLatency;
  uint32_t minLatency;
  uint32_t totalLatency;
  bool skipLatency;

  // fill buf with known data
  if (BUF_SIZE > 1) {
    for (size_t i = 0; i < (BUF_SIZE - 2); i++) {
      bufA32[i] = 'A' + (i % 26);
    }
    bufA32[BUF_SIZE - 2] = '\r';
  }
  bufA32[BUF_SIZE - 1] = '\n';

  Serial.printf("Bytes Used: %llu, Bytes Total:%llu\n", myfs.usedSize(), myfs.totalSize());
  cout << F("FILE_SIZE = ") << FILE_SIZE << endl;
  cout << F("BUF_SIZE = ") << BUF_SIZE << F(" bytes\n");
  cout << F("Starting write test, please wait.") << endl << endl;

  // do write test
  uint32_t n = FILE_SIZE / BUF_SIZE;
  cout << F("write speed and latency") << endl;
  cout << F("speed,max,min,avg") << endl;
  cout << F("KB/Sec,usec,usec,usec") << endl;

  // open or create file - truncate existing file.
  file = myfs.open("bench.dat", FILE_WRITE);

  for (uint8_t nTest = 0; nTest < WRITE_COUNT; nTest++) {
    file.seek(0);

    maxLatency = 0;
    minLatency = 9999999;
    totalLatency = 0;
    skipLatency = SKIP_FIRST_LATENCY;
    t = millis();

    for (uint32_t i = 0; i < n; i++) {
      uint32_t m = micros();
      if (file.write(bufA32, BUF_SIZE) != BUF_SIZE) {
        error("write failed");
      }
      m = micros() - m;
      totalLatency += m;
      if (skipLatency) {
        // Wait until first write to SD, not just a copy to the cache.
        skipLatency = file.position() < 512;
      } else {
        if (maxLatency < m) {
          maxLatency = m;
        }
        if (minLatency > m) {
          minLatency = m;
        }
      }
    }

    t = millis() - t;
    s = file.size();
    cout << s / t << ',' << maxLatency << ',' << minLatency;
    cout << ',' << totalLatency / n << endl;
  }
  cout << endl << F("Starting sequential read test, please wait.") << endl;
  cout << endl << F("read speed and latency") << endl;
  cout << F("speed,max,min,avg") << endl;
  cout << F("KB/Sec,usec,usec,usec") << endl;

  // do read test
  for (uint8_t nTest = 0; nTest < READ_COUNT; nTest++) {
    file.seek(0);
    maxLatency = 0;
    minLatency = 9999999;
    totalLatency = 0;
    skipLatency = SKIP_FIRST_LATENCY;
    t = micros();
    for (uint32_t i = 0; i < n; i++) {
      bufA32[BUF_SIZE - 1] = 0;
      uint32_t m = micros();
      int32_t nr = file.read(bufA32, BUF_SIZE);
      if (nr != BUF_SIZE) {
        error("read failed");
      }
      m = micros() - m;
      totalLatency += m;
      if (bufA32[BUF_SIZE - 1] != '\n') {
        error("data check error");
      }
      if (skipLatency) {
        skipLatency = false;
      } else {
        if (maxLatency < m) {
          maxLatency = m;
        }
        if (minLatency > m) {
          minLatency = m;
        }
      }
    }

    s = file.size();

    t = micros() - t;
    cout << s * 1000 / t << ',' << maxLatency << ',' << minLatency;
    cout << ',' << totalLatency / n << endl;
  }

  cout << endl << F("Done") << endl;
  file.close();

  t = micros();
  file = myfs.open("bench.dat", FILE_READ);
  t = micros() - t;
  cout << F("Time to Open File for Read (uSecs): ") << t << endl;
  t = micros();
  file.close();
  t = micros() - t;
  cout << F("Time to Close File(uSecs): ") << t << endl;
}
FrankBoesing commented 2 years ago

Paul, you can have a look at the history and it's many merges:

https://github.com/littlefs-project/littlefs/commits/master/lfs.c

https://github.com/littlefs-project/littlefs/commits/master/lfs.h

I don't care that much about the speed for open() ( I even don't know if LittleFS is the problem, or the wrapper).
There are some merges re: Thread-safety.

"any other benefits worth risking disruption to existing projects?"

Why are you suspecting ARM ("Copyright (c) 2017, Arm Limited. All rights reserved.") to make thier code (compared to the random version you use) worse?

I know this question goes to my direction. I'm only slightly interested in all this - I don't really need it. But maybe your customers do.

PaulStoffregen commented 2 years ago

I'm trying to make some sense out of multiple github issues and forum threads which started near the end of last week and have a lot of conversation over the last few days.

So far, I'm seeing this:

1: LittleFS NAND has a performance problem, much slower than NOR

2: LittleFS (and maybe SdFat) crash when used from interrupts

3: FS.h needs more APIs for specific use cases, particularly creating large files in a way which optimizes performance (eg, pre-allocating contiguous space on exfat)

4: FS.h is an abstraction layer which doesn't allow higher level code to detect the underlying filesystem (and topics like RTTI)

5: LittleFS is using a code ARM published last year

Maybe I missed some other stuff?

mjs513 commented 2 years ago

@PaulStoffregen - @KurtE Haven't followed those threads that closely except for updating LittleFS to the newest version (item 5) and Item 2 NAND vs Nor speed. Just finished using the benchmark sketch using current version of LittleFS I posted above - these are the raw results for QSPI and SPI. Unfortunately the only NAND I have that I can test on QSPI is the M02 at the moment - i will put the spi results next - sorry not pretty but since you asked. I am going to switch to using Latest version of LittleFS both for benchmarking and testing with MTP and standalone if you would like:

QSPI Benchmark testing
--------------------------------------------------
512jveq
Bytes Used: 8192, Bytes Total:67108864
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
100.22,52267,1209,5108
385.31,9097,1233,1328

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
24124.61,110,14,21
24124.61,109,14,21

Done
Time to Open File for Read (uSecs): 200
Time to Close File(uSecs): 1

=======================================================
128jvsm
Type any character to start
Bytes Used: 8192, Bytes Total:16777216
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
521.78,1534,959,981
498.33,32041,960,1027

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
24411.83,108,14,20
24411.49,108,14,20

Done
Time to Open File for Read (uSecs): 199
Time to Close File(uSecs): 1

=======================================================
128jvsq
Bytes Used: 8192, Bytes Total:16777216
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
114.69,37438,983,4464
108.81,37805,984,4705

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
24411.60,108,14,20
24411.60,109,14,20

Done
Time to Open File for Read (uSecs): 198
Time to Close File(uSecs): 1

===========================================
64jvsio
5, 4 mb write fails
Bytes Used: 8192, Bytes Total:8388608
FILE_SIZE = 2097152
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
84.57,52940,971,6054
84.60,59058,971,6052

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
26752.12,86,14,19
26751.78,87,14,19

Done
Time to Open File for Read (uSecs): 198
Time to Close File(uSecs): 1

=====================================================
M02 NAND Flash
Bytes Used: 262144, Bytes Total:265289728
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1520.12,8960,131,336
1520.56,8560,131,336
1518.79,8561,131,337
1517.92,8561,131,337
1519.23,8561,131,336

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
19222.29,693,0,26
19222.36,693,0,26
19222.29,693,0,26
19222.43,693,0,26
19222.29,694,0,26

Done
Time to Open File for Read (uSecs): 2224
Time to Close File(uSecs): 1
mjs513 commented 2 years ago
SPI BENCH

64jvsio
Bytes Used: 8192, Bytes Total:8388608
FILE_SIZE = 2097152
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
80.98,51748,1255,6322
79.47,56054,1256,6442

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2580.73,925,154,198
2580.73,926,154,198

Done
Time to Open File for Read (uSecs): 1736
Time to Close File(uSecs): 1

=================================================
128jvsq
Bytes Used: 8192, Bytes Total:16777216
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
103.75,42357,1268,4935
108.11,108399,1270,4735

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2341.36,1157,154,218
2341.36,1156,154,218

Done
Time to Open File for Read (uSecs): 1736
Time to Close File(uSecs): 1

=================================================
128jvsm
Bytes Used: 8192, Bytes Total:16777216
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
103.87,43303,1246,4928
110.27,106599,1248,4642

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2341.36,1156,154,218
2341.36,1156,154,218

Done
Time to Open File for Read (uSecs): 1735
Time to Close File(uSecs): 0

==============================================
512jveq
Bytes Used: 8192, Bytes Total:67108864
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
76.57,54929,1542,6686
77.96,117182,1556,6567

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2332.40,1161,155,219
2332.40,1161,155,219

Done
Time to Open File for Read (uSecs): 1742
Time to Close File(uSecs): 0

================================================
N01 NAND FLash
Bytes Used: 262144, Bytes Total:131596288
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1058.95,7597,129,483
1060.24,5137,129,482
1059.17,5139,129,483
1057.46,5139,129,484
1058.53,5137,129,483

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4398.90,3088,0,116
4398.90,3088,0,116
4398.91,3088,0,116
4398.89,3088,0,116
4398.90,3088,0,116

Done
Time to Open File for Read (uSecs): 9749
Time to Close File(uSecs): 1

==============================================
N02 NAND FLASH
Bytes Used: 262144, Bytes Total:265289728
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1099.14,8620,129,465
1102.14,6202,129,464
1094.09,30831,129,467
904.57,31683,129,565
904.41,31683,129,566

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4558.95,2979,0,112
4558.93,2979,0,112
4558.94,2979,0,112
4558.93,2979,0,112
4558.94,2980,0,112

Done
Time to Open File for Read (uSecs): 9406
Time to Close File(uSecs): 1

===============================================
M02 NAND FLASH
Bytes Used: 262144, Bytes Total:265289728
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1013.31,35257,128,505
859.35,32820,128,595

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4404.76,3084,0,116
4404.73,3085,0,116

Done
Time to Open File for Read (uSecs): 9736
Time to Close File(uSecs): 1
KurtE commented 2 years ago

Sorry I am late to this game. Not sure if my comments here will help or not, but...

1: LittleFS NAND has a performance problem, much slower than NOR

With MTP I have seen several issues with sending an object from the PC to teensy when storing files, that some writes take a very long time and host get tired. I know at one point defragster put code in to run through free space and do the erase operation on ones that were dirty which sped things up.

3: FS.h needs more APIs for specific use cases, particularly creating large files in a way which optimizes performance (eg, pre-allocating contiguous space on exfat)

Maybe, but in some cases you might be able to do this without FS, by doing a seek operation to the preallocating size and then seek back to the beginning. Also unclear (at least to me), what this means for other file systems, like littlefs... Do you claim all of those clusters when asked to, does that then dirty them all up so then have to do erases on each write?
So do you just ignore it? But if the purpose was to preallocate the space such that you will know for sure that total size will be available for the duration of the time your sketch has that file open and you ignore it then ???

4: FS.h is an abstraction layer which doesn't allow higher level code to detect the underlying filesystem (and topics like RTTI)

Yes... Again not sure about RTTI stuff. Have not played with that for maybe 20 plus years. But ran into similar issues with MTP where for example we wanted to format drive and did not know what type of drive it was... Which is why we added (and now removed) the ability to register additional information when we added FS toMTP list. We sidestepped it with format, by adding that to FS. We have partially sidestepped knowing if SD card has been inserted (or removed).

We can probably add a few more things to FS that can help. Like is there preferred sizes for writes (and reads), like sector size, cluster size?

In discussion we talked about maybe adding some filesystem type member, where maybe it could tell you something like, it was SD or LittleFS or ... Not sure if would also break down to : fat type...

Maybe I missed some other stuff?

Probably me as well

FrankBoesing commented 2 years ago

1: yes.

2: LittleFS (and maybe SdFat) crash when used from interrupts
I'm not aware of problems with SdFat.

3: FS.h needs more APIs for specific use cases, particularly creating files with pre-allocated space
Together with a way to know which interface it uses. SPI? (->which SPI?)  SDIO? QSPI? Other?

4: FS.h is an abstraction layer which doesn't allow higher level code to detect the underlying filesystem (and topics like RTTI)
Together with 3), yes. 
In the meantime i figured out that RTTI adds ~8kB code. If there is way to move it to Flash, RTTI is may be a nice solution.

5: LittleFS is using a code ARM published last year
Many problems would be solved  (or less urgend) if it would just be thread-safe. I hope that is is now.

If not:  FS would have to handle the thread case. Something like SPIusingInterrupt for the other interfaces would be needed.
Then the user code can handle it, too- if it knows the interface (libraries that just get a "File")

PaulStoffregen commented 2 years ago

I'm looking at the NAND slowness now.

We do need to address the interrupt usage case. At this time I do not know which way is best.

Design of APIs like FS is a difficult matter with difficult to predict long-term effects on the software ecosystem. We do need more functionality, but how the API should look is a very difficult decision.

I do not believe adding ways to "cheat" the abstraction is a wise idea. It may give a short term solution, but poorly considered APIs are almost impossible to fix later without very painful breakage of backwards compatibility.

RTTI is not happening on Teensy.

I do not see using an older version of littlefs as a problem, unless it has a bug or limitation. If anyone wants to suggest updating, please know the answer will always be to stay with the old version unless there is clear and compelling reason to update. I do not care if we use a version that is 10 years old if the code is stable and effective.

mjs513 commented 2 years ago

Just as a FYI I switched to using LittleFS 2.4.1 (latest) and am now having issues with getting NOR Flash to start properly - seems to be hit or miss so going to revert back to current version.

As for the NAND slowness can be caused by supporting multiple 2G chips the M02 works slightly different than the N02. Also, code has remained in for BBLut and ECC even though not used that may cause a little bit of slow down.

FrankBoesing commented 2 years ago

Again, it has merges re: thread safety.

PaulStoffregen commented 2 years ago

Looks like I have only W25N01GV (25 here, Digikey has plenty) and W25N02KV chips (only 3 left, none at Digikey).

mjs513 commented 2 years ago

I have 3 W25N01GV and 3 W25M02GW but no N02's to put on breakout board. Looks like Mouser has W25N02KVSFIR in stock (https://www.mouser.com/ProductDetail/Winbond/W25N02KVSFIR?qs=W%2FMpXkg%252BdQ4KUHHgPqoidQ%3D%3D)

FrankBoesing commented 2 years ago

Before you put much energy in investigating and measuring: The new version has improvements for seek, write and perhaps more.  (And that improves other media, too) I'd just take a look at the history.

mjs513 commented 2 years ago

Ok - SPI flash not working was frustrating me with 2.4.1 so I reinstalled and rechecked wiring and check all my SPI Flash chips - they all seem to be working now. Think I have a loose connection someplace on my test board. Will redo the tests later to see if there is any difference in the timings.

UPDATE: Reran the benchmark sketch on the 2.41 for the M02 nand on QSPI and not seeing any improvement - timings are exactly the same. Ran the 128jvsq and again no change to the timings.

KurtE commented 2 years ago

Design of APIs like FS is a difficult matter with difficult to predict long-term effects on the software ecosystem. We do need more functionality, but how the API should look is a very difficult decision.

I do not believe adding ways to "cheat" the abstraction is a wise idea. It may give a short term solution, but poorly considered APIs are almost impossible to fix later without very painful breakage of backwards compatibility.

I totally agree that finding the right balance can be tricky:

If the abstraction is not sufficient to get the job done you are trying to do, you often end up hacking it. Like Serial Half duplex support. Up until (and probably still) libraries were passed in a Stream or HardwareSerial object and many of them had code in it like: if (str == (Stream*)Serial1) ...

But you don't want to add 100 new methods that cover every possibility... But some things that come to mind include; Note: not suggesting we need all of these, but simply ideas.

a) per-allocate size, probably should define the usage case. Make it faster in some cases? Or to make sure that file will have that space available to it regardless of if other files created or extended while that file is open

b) Hints on what size reads/writes are optimal for that FS Could be multiple settings like cluster/sector size

c) From a File object can I know which FS it belongs to? c1) From File or FS, can we deduce which type of FS it is? again mentioned like SD, vs LittleFS, plus should we be able to know if it is on QSPI vs SPI vs USBHost? within each of these do we have additional information (Fat16, Fat32, exFat), LittleFS (flash, NAND, ...)

d) From file can we get a get the full path name to the file?

e) Notifications - Can we register something (call back function, interface, event responder) with FS and/or File saying do this, when some events happen...

Example SD drive - who handles what to do when an SD card is inserted or released. Who lets MTP know if the sketch created or deleted a file within the storage area ...

Is this the level of things you want somewhere up here on github issues or in forum posts?

PaulStoffregen commented 2 years ago

Well, I've been looking through a lot of unfamiliar code for the NAND flash chips.

The code seems to be reading as fast as the chip can. About 50us is waiting for the row fetch time, which the datasheet says is 60us max when ECC is turned on. Then with QSPI, data transfer of 2048 bytes is taking about 100us, with is about 20 Mbyte/sec speed.

But file open fetches the same block+offset several times. Just doubling the cache size cuts the time from 2117us to 563us.

  config.cache_size = info->progsize * 2;

But I'm not sure if this is really correct. Right now I'm just running the benchmark, which doesn't look at whether we're actually reading correct data. Sometimes littlefs is requesting 4096 bytes. I'm not sure if the read() function really works for more than 2048 bytes.

mjs513 commented 2 years ago

Good morning all. Anyway - just as full disclosure yeah think I have to take responsibility for the NAND code in LittleFS as well as a couple of others.

As for cache size yes its possible since pretty much from what I was reading on nands and littlefs people have used it both ways. For reference you might want to check this: https://github.com/littlefs-project/littlefs/issues/214 and https://github.com/littlefs-project/littlefs/issues/11.

"Sometimes littlefs is requesting 4096 bytes" - strange on this one. The way the read is setup is that it requests a block of 2048 bytes to fill the databuffer first then it read from the data you requested from the databuffer. Maybe that's the 4096bytes?

I will work on adding a validate data function this morning to the benchmark sketch.

mjs513 commented 2 years ago

Updated sketch with read verify after write. You may want to check that I did it - check the match function.

/*
 * This program is a simple binary write/read benchmark.
 */
#include <SD.h>
#include "sdios.h"

#include <LittleFS.h>
LittleFS_QPINAND myfs;
const int chipSelect = 6;

File file;

// Set SKIP_FIRST_LATENCY true if the first read/write to the SD can
// be avoid by writing a file header or reading the first record.
const bool SKIP_FIRST_LATENCY = true;

// Size of read/write.
const size_t BUF_SIZE = 512;

// File size in MB where MB = 1,000,000 bytes.
const uint32_t FILE_SIZE_MB = 5;

// Write pass count.
const uint8_t WRITE_COUNT = 2;

// Read pass count.
const uint8_t READ_COUNT = 2;
//==============================================================================
// End of configuration constants.
//------------------------------------------------------------------------------
// File size in bytes.
const uint32_t FILE_SIZE = 1024*1024*FILE_SIZE_MB;

// Insure 4-byte alignment.
uint32_t buf32[(BUF_SIZE + 3) / 4];
uint8_t* bufA32 = (uint8_t*)buf32;
uint8_t* bufB32 = (uint8_t*)buf32;

//validate data counts
bool testMatch;
uint32_t badCount = 0;

// Serial output stream
ArduinoOutStream cout(Serial);

//------------------------------------------------------------------------------
// Store error strings in flash to save RAM.
//----------------------------------------------------------------------------
void errorHalt(const __FlashStringHelper* msg) {
  Serial.print(F("error: "));
  Serial.println(msg);
}

#define error(s) errorHalt(F(s))
//------------------------------------------------------------------------------

//------------------------------------------------------------------------------
void clearSerialInput() {
  uint32_t m = micros();
  do {
    if (Serial.read() >= 0) {
      m = micros();
    }
  } while (micros() - m < 10000);
}
//------------------------------------------------------------------------------
void setup() {
  Serial.begin(9600);

  // Wait for USB Serial
  while (!Serial) {
    SysCall::yield();
  }
  delay(1000);
  if (CrashReport) {
    Serial.print(CrashReport);
  }
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);

///////////////////////////////////////////////
  pinMode(chipSelect, OUTPUT);
  digitalWriteFast(chipSelect, HIGH);
  myfs.begin();

///////////////////////////////////////////////
  cout << F("\nUsing a freshly formatted SD for best performance.\n");
  myfs.quickFormat();

  // use uppercase in hex and use 0X base prefix
  cout << uppercase << showbase << endl;

  // Discard any input.
  clearSerialInput();

  // F() stores strings in flash to save RAM
  cout << F("Type any character to start\n");
  while (!Serial.available()) {
    SysCall::yield();
  }

  speedBench();
}

void loop() {}

//------------------------------------------------------------------------------
void speedBench() {
  File file;
  float s;
  uint32_t t;
  uint32_t maxLatency;
  uint32_t minLatency;
  uint32_t totalLatency;
  bool skipLatency;

  // fill buf with known data
  if (BUF_SIZE > 1) {
    for (size_t i = 0; i < (BUF_SIZE - 2); i++) {
      bufA32[i] = 'A' + (i % 26);
      bufB32[i] = 'A' + (i % 26);
    }
    bufA32[BUF_SIZE - 2] = '\r';
    bufB32[BUF_SIZE - 2] = '\r';
  }
  bufA32[BUF_SIZE - 1] = '\n';
  bufB32[BUF_SIZE - 1] = '\n';

  Serial.printf("Bytes Used: %llu, Bytes Total:%llu\n", myfs.usedSize(), myfs.totalSize());
  cout << F("FILE_SIZE = ") << FILE_SIZE << endl;
  cout << F("BUF_SIZE = ") << BUF_SIZE << F(" bytes\n");
  cout << F("Starting write test, please wait.") << endl << endl;

  // do write test
  uint32_t n = FILE_SIZE / BUF_SIZE;
  cout << F("write speed and latency") << endl;
  cout << F("speed,max,min,avg") << endl;
  cout << F("KB/Sec,usec,usec,usec") << endl;

  // open or create file - truncate existing file.
  file = myfs.open("bench.dat", FILE_WRITE);

  for (uint8_t nTest = 0; nTest < WRITE_COUNT; nTest++) {
    file.seek(0);
    maxLatency = 0;
    minLatency = 9999999;
    totalLatency = 0;
    skipLatency = SKIP_FIRST_LATENCY;
    t = millis();

    for (uint32_t i = 0; i < n; i++) {
      uint32_t m = micros();
      if (file.write(bufA32, BUF_SIZE) != BUF_SIZE) {
        error("write failed");
      }
      m = micros() - m;
      totalLatency += m;
      if (skipLatency) {
        // Wait until first write to SD, not just a copy to the cache.
        skipLatency = file.position() < 512;
      } else {
        if (maxLatency < m) {
          maxLatency = m;
        }
        if (minLatency > m) {
          minLatency = m;
        }
      }
    }

    t = millis() - t;
    s = file.size();
    cout << s / t << ',' << maxLatency << ',' << minLatency;
    cout << ',' << totalLatency / n << endl;

        //verify data
    testMatch = true;
    badCount = 0;
    for (uint32_t i = 0; i < n; i++) {
      file.seek(0);
      int32_t nr = file.read(bufB32, BUF_SIZE);
      if (nr != BUF_SIZE) {
        error("read failed");
      }
      testMatch = match(bufB32, bufA32);
      if(!testMatch) { badCount = badCount + 1; }
    }
    cout << F("Unmatch data in file: ") << badCount << endl << endl;
  }
  cout << endl << F("Starting sequential read test, please wait.") << endl;
  cout << endl << F("read speed and latency") << endl;
  cout << F("speed,max,min,avg") << endl;
  cout << F("KB/Sec,usec,usec,usec") << endl;

  // do read test
  for (uint8_t nTest = 0; nTest < READ_COUNT; nTest++) {
    file.seek(0);
    maxLatency = 0;
    minLatency = 9999999;
    totalLatency = 0;
    skipLatency = SKIP_FIRST_LATENCY;
    t = micros();
    for (uint32_t i = 0; i < n; i++) {
      bufA32[BUF_SIZE - 1] = 0;
      uint32_t m = micros();
      int32_t nr = file.read(bufA32, BUF_SIZE);
      if (nr != BUF_SIZE) {
        error("read failed");
      }
      m = micros() - m;
      totalLatency += m;
      if (bufA32[BUF_SIZE - 1] != '\n') {
        error("data check error");
      }
      if (skipLatency) {
        skipLatency = false;
      } else {
        if (maxLatency < m) {
          maxLatency = m;
        }
        if (minLatency > m) {
          minLatency = m;
        }
      }
    }

    s = file.size();

    t = micros() - t;
    cout << s * 1000 / t << ',' << maxLatency << ',' << minLatency;
    cout << ',' << totalLatency / n << endl;    
  }

  cout << endl << F("Done") << endl;
  file.close();

  t = micros();
  file = myfs.open("bench.dat", FILE_READ);
  t = micros() - t;
  cout << F("Time to Open File for Read (uSecs): ") << t << endl;
  t = micros();
  file.close();
  t = micros() - t;
  cout << F("Time to Close File(uSecs): ") << t << endl;
}

//https://stackoverflow.com/questions/66467762/compare-2-char-arrays-c// return true if two C strings are equal
bool match( const uint8_t *ptr1,  const uint8_t* ptr2)
{
    // zero bytes at ends
    while (*ptr1 != 0 && *ptr2 != 0)
    { 
        if (*ptr1 != *ptr2) {  
            return false;
        }
        ++ptr1;
        ++ptr2;
    }
    return (*ptr1 == *ptr2);
}
KurtE commented 2 years ago

Looks like fun: Will be interesting to see about differences in timing of writes. That is I know with LittleFS on NAND with MTP receiving an object from PC to write to the flash... For example this morning sending two larger files to the NAND on pin 4 of memory board:

First large file:

MTPD::SendObject: len:12106599
 # USB Packets: 23645 total: 1 avg ms: 0 max: 1
 # Write: 5912 total:13709 avg ms: 2 max: 143
>>>Total Time: 13834567

Note the avg write took 2ms, but at least one took 143ms to complete.

But next larger file:

MTPD::SendObject: len:30994805
 # USB Packets: 60536 total: 7 avg ms: 0 max: 1
 # Write: 15135 total:34741 avg ms: 2 max: 33
>>>Total Time: 34979661

In this one the longest took 33ms and yes it took about 35 seconds to send this file... We are currently configured for 2k writes on T4.x... With 8K writes, On some media we probably have faster total times, but when we run into a really slow write, then host gives up on us...

Not sure what all of the differences in timing is. I know before it was when you had to erase the block before you could write it. But now sure if there is other ripple effects, like when it has to change the generation stuff (rusty on LittleFS terminology here)

But hopefully the benchmarking is testing both "Clean" volumes (LowLevelFormat) versus maybe dirty with just running erase.

Now back to playing

mjs513 commented 2 years ago

@KurtE Testing with quickFormat prior to running benchmark - easy enough to change. EDIT: With lowLevelFormat for the M02

Using a freshly formatted SD for best performance.
...............................................................................................................................

Type any character to start
Bytes Used: 262144, Bytes Total:265289728
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
566.61,6288830,134,903
Unmatch data in file: 0

1773.64,8368,134,288
Unmatch data in file: 0

Starting sequential read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
19223.49,694,0,26
19227.30,694,0,26

Done
Time to Open File for Read (uSecs): 2225
Time to Close File(uSecs): 1
mjs513 commented 2 years ago

Ok just soldered up a N01 to a breakout board and tested it on QSPI. I tried adjusting the cache size: config.cache_size = info->progsize * 2; but that keeps failing on me:

Type any character to start
Bytes Used: 262144, Bytes Total:131596288
FILE_SIZE = 5242880
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
FO:: set attribute creation failed
FO:: set attribute modified failed
error: write failed
error: write failed

those fails are on setting the timestamps for the file. Happens in current version and latest version.

Also just to let you see what happens if you get bad blocks: Status of the links

OpenEntries: 20
First Open Entry: 0
PBA: 480, LBA: 1005
Status of the links
OpenEntries: 20
First Open Entry: 0
PBA: 480, LBA: 1005

Usually if you do a format the errors disappear

FrankBoesing commented 2 years ago

I still don't get why it's not wanted to merge all the things that the new version has. !Several! bugfixes (also re: performance) etc.
And doesn't it make more sense to fix any issues using the newer version?

FrankBoesing commented 2 years ago

Also, a web search showed, that LittleFS - in general - works good with NAND. Maybe the slow open() is just a usage / config / wrapper problem.
Anywhow, more interesting would be to make it thread-safe. And my logic says, that performance-problems should be fixed after that.

mjs513 commented 2 years ago

@FrankBoesing - been testing with the new version for a bit now and only time I go back to this version is if I a problem comes. So far any issues I have had are in both versions. With that said I have not seen any performance improvements with the new version. If you want the version that works with Teensy to play with to see if you are realizing any performance improvement I posted it in a branch of my LittleFS fork: https://github.com/mjs513/LittleFS/tree/newVersionLittleFS

So far in running the benchmarking sketch I have not seen any real performance improvements especially on file open.

FrankBoesing commented 2 years ago

Well, there are patches for performance. I.e. for write. Not sure why I should test that.

There is no reason to use an older version just because it is "not worse" (He?)
On the contrary, if the new version is not worse, it should be used. 

Ok, I give up. 
 

JUST  FIX IT

FrankBoesing commented 2 years ago

Makes no sense to spend more time on discussion.

KurtE commented 2 years ago

Hi Frank,

Maybe it is just me, but your closing of this issue is confusing.

After you opened up this issue, there has been active work on doing exactly what you requested. That is one or more people have been actively working on updating to the latest stuff. As has been shown by Mike. Also Paul has also spent time trying to address some the performance issues.

I do agree with you, that this early in a beta cycle, it might not be a bad idea to upgrade to the later stuff, as to give us time to resolve any new issues it might introduce. But I also understand Paul’s reluctance to fully embrace this. That is, is there any improvements in the later stuff that address any issues we have run into with the current stuff. Does it fix the performance issues or some other bugs we have hit? So far from what Mike has posted, I have not seen any major performance improvements.

So to me, a balanced approach to this would be for someone (in this case Mike) to bring in the new version and make the changes need to work with Teensyduino, and then put up on some fork/branch on Github. Which he has now done.

Then to have people interested in these changes to download it and test out all of their stuff, and hopefully document which issues have been resolved. Such things as performance, works better on interrupts, less corruptions… And if it does make it thread safe it would be interesting to see if works with teensythreads.

And also would be good to know if does break anything as well. such that we can fix it before others hit it in a Teenyduino release.

Other datapoints that might be helpful, is if for example you are now using it on other platforms such as ESP32, are they using the current release? If so did that resolve some issues. Did it improve their performance?

And if there is enough positive feedback to this, it makes a much stronger case, to pull it in for the next beta.

Again sorry if this is just me being confused!