earlephilhower / arduino-pico

Raspberry Pi Pico Arduino core, for all RP2040 and RP2350 boards
GNU Lesser General Public License v2.1
2.03k stars 422 forks source link

Flash operations break PSRAM interface #2537

Open technoblogy opened 1 week ago

technoblogy commented 1 week ago

I have a program using LittleFS that works OK when not using PSRAM, but hangs when using PSRAM.

Here's a minimal demonstration:

#include <LittleFS.h>

uint32_t Workspace[10000] PSRAM;
const int len = 1000;

bool checkdata () {
  for (int i=0; i<len; i++) if (Workspace[i] != i) return false;
  return true;
}

void savedata () {
  for (int i=0; i<len; i++) Workspace[i] = i;
  if (checkdata()) Serial.println("Data setup OK");
  LittleFS.begin();
  File file;
  file = LittleFS.open("/DATA.DAT", "w");
  file.close();
}

void setup() {
  Serial.begin(9600);
}

void loop() {
  savedata();
  Serial.println("Saved");
  if (!checkdata()) Serial.println("Data error");
  delay(5000);
}

The program simply opens and closes a file on LittleFS.

Running the program displays Data error indicating that LittleFS has corrupted the array Workspace. If you remove the PSRAM attribute from Workspace the program runs without error.

technoblogy commented 1 week ago

Sorry, I forgot to give configuration details:

I'm running on the ARM core, but I think the behaviour is the same with the RISC-V core.

maxgerhardt commented 1 week ago

Do you have a debugger available in the form of another Raspberry Pi Pico (RP2040 based)? You could use it to flash debugprobe on it, connect it to the SWD port of the RP2350 board and then set a "hardware data write breakpoint" at the first address of the Workspace PSRAM array after it was initialized, so the debugger would halt at the first instruction that tries to corrupt the array.

technoblogy commented 1 week ago

I do have an RP2040 Raspberry Pi Pico, but I'm not familiar with the rest of the procedure you describe. Can you suggest a link to more details?

maxgerhardt commented 1 week ago

I've done a writeup at https://github.com/maxgerhardt/pio-psram-debug.

Sadly the PimoroniPicoPlus2 has its debugging pins (SWD) at a different, smaller-pitch connector (JST-SH). I'm not sure if you have a 3-pin JST-SH adapter (like https://www.aliexpress.us/item/2251832780304648.html) or whether you can solder to it. But without those pins, debugging is not possible.

You can check though whether the same problem reproduces in the PlatformIO build system with only the regular USB connection and doing the first 2 chapters of the README.

technoblogy commented 1 week ago

Gosh - thank you very much. That's very detailed.

It's all a bit new to me as I haven't used those tools before, but I'll give it a go! David

maxgerhardt commented 1 week ago

My previously ordered Pimoroni Pico Plus 2W (with WiFi chip) should arrive next week anyways, and it should have the same PSRAM as the Plus 2. So I should be able to reproduce and debug that bug, too; no worries if you can't get it it work.

technoblogy commented 1 week ago

OK, that's good to know!

earlephilhower commented 1 week ago

I haven't run this, but I do have boards w/PSRAM in them. But I think I know what's happening w/o running anything...

The QSPI interface is shared between flash and PSRAM. In general operation, it's owned by the cache hardware and untouched by the CPU itself. Reads "just" happen as far as the CPU is concerned, be they PSRAM or flash.

To erase or write a flash block, however, the CPU needs to take over the QSPI interface by writing some registers and manually force it to send specific commands to flash. When done, it should reset ownership to the HW by restoring those registers.

I have the feeling that when the QSPI registers are restored, they're only restored for flash operation. The PSRAM interface is then lost.

That would explain the LittleFS issue. It's not LittleFS, it's flash_range_erase and its brothers that's the issue. There was also the same symptom seen in a discussion when trying to read the Pico ID (which reads the flash chip's serial), so even more evidence.

technoblogy commented 1 week ago

Is there a workaround, or will it need an update to the core?

earlephilhower commented 1 week ago

Is there a workaround...

Probably not

...or will it need an update to the core?

Probably so

:laughing:

earlephilhower commented 1 week ago

A little bit of testing shows the ROM flash command operations change the configuration of the 2nd QMI memory space (aka PSRAM)...can't see why they would do so, but it effectively breaks comms to the PSRAM chip since the QMI will now not have the right timing nor commands.

A quick test that saves and restores the QMI state around flash operations seems to work, but this will need to be wrapped around all flash_xxx calls in core_wrap.

Simple working LittleFS example (warning, formats the onchip FS). Set FS size to anything, and if you comment out the QMI restore it will fail. Leave it in and it will work...

#include <hardware/address_mapped.h>
#include <hardware/clocks.h>
#include <hardware/gpio.h>
#include <hardware/regs/addressmap.h>
#include <hardware/spi.h>
#include <hardware/structs/qmi.h>
#include <hardware/structs/xip_ctrl.h>

#include <LittleFS.h>

void dump() {
  Serial.printf("qmi_hw->m[1].timing = %08lx\n", qmi_hw->m[1].timing);
  Serial.printf("qmi_hw->m[1].rfmt   = %08lx\n", qmi_hw->m[1].rfmt);
  Serial.printf("qmi_hw->m[1].rcmd   = %08lx\n", qmi_hw->m[1].rcmd);
  Serial.printf("qmi_hw->m[1].wfmt   = %08lx\n", qmi_hw->m[1].wfmt);
  Serial.printf("qmi_hw->m[1].wcmd   = %08lx\n", qmi_hw->m[1].wcmd);
  Serial.printf("xip_ctrl_hw->ctrl   = %08lx\n", xip_ctrl_hw->ctrl);
}

int data[10000] PSRAM;

void check() {
  bool fail = false;
  for (int i = 0; i < 10000 && !fail; i++) {
    if (data[i] != i) {
      fail = true;
    }
  }
  Serial.printf("%s\n", fail ? "FAIL" : "PASS");
}

void prep() {
  for (int i = 0; i < 10000; i++) {
    data[i] = i;
  }
}

void setup() {
  delay(5000);
  prep();
  auto s = qmi_hw->m[1];
  Serial.printf("Before flash write\n");
  dump();
  check();
  Serial.printf("FLASHWRITE\n");
  LittleFS.format();
  dump();
  Serial.printf("Restoring QMI\n");
  qmi_hw->m[1] = s;
  __compiler_memory_barrier();
  dump();
  check();
}

void loop() {
}
Before flash write
qmi_hw->m[1].timing = 61a48102
qmi_hw->m[1].rfmt   = 000612aa
qmi_hw->m[1].rcmd   = 000000eb
qmi_hw->m[1].wfmt   = 000012aa
qmi_hw->m[1].wcmd   = 00000038
xip_ctrl_hw->ctrl   = 00000883
PASS
FLASHWRITE
qmi_hw->m[1].timing = 60007203
qmi_hw->m[1].rfmt   = 000492a8
qmi_hw->m[1].rcmd   = 000000eb
qmi_hw->m[1].wfmt   = 000012aa
qmi_hw->m[1].wcmd   = 00000038
xip_ctrl_hw->ctrl   = 00000883
Restoring QMI
qmi_hw->m[1].timing = 61a48102
qmi_hw->m[1].rfmt   = 000612aa
qmi_hw->m[1].rcmd   = 000000eb
qmi_hw->m[1].wfmt   = 000012aa
qmi_hw->m[1].wcmd   = 00000038
xip_ctrl_hw->ctrl   = 00000883
PASS
maxgerhardt commented 1 week ago

This should definitely be logged to https://github.com/raspberrypi/pico-sdk/issues if this behavior is not specified in the RP2350 datasheet.

earlephilhower commented 1 week ago

Yup, I'm going to open something now that the CI is working on the fix PR. Tested #2539 with the original example and my own test (with the QMI restore commented out) so once I run basis RP2040 sanity we'll be good. Also manually tested flash_get_id() and that's also covered.

earlephilhower commented 1 week ago

@technoblogy, BTW, thanks for the detailed configuration info and MCVE test sketch! Your example works with the latest git master, so please git pull and give your whole app another go...

technoblogy commented 1 week ago

Great work!

I cloned arduino-pico to GitHub desktop on my computer, and then copied the files in place of the existing files in the 4.1.1 folder in the Arduino15 packages folder - I hope that was correct.

However I'm getting this error:

/Users/david/Library/Arduino15/packages/rp2040/hardware/rp2040/4.1.1/system/python3/python3 -I /Users/david/Library/Arduino15/packages/rp2040/hardware/rp2040/4.1.1/tools/uf2conv.py --serial /dev/cu.usbmodem14101 --family RP2040 --deploy /var/folders/x7/_nwwqxs17m5fylrm6vpnkxkw0000gn/T/arduino_build_690175/LispARM214.ino.uf2 
java.io.IOException: Cannot run program "/Users/david/Library/Arduino15/packages/rp2040/hardware/rp2040/4.1.1/system/python3/python3": error=2, No such file or directory

Thanks, David

earlephilhower commented 1 week ago

No, they can't just be copied over an IDE installation. There are differences that need to be configured when moving from git which you downloaded and the IDE's package.

You'll want to remove the installation and reinstall to get something that will work, and from there you could copy over cores/rp2040/flash_wrapper.cpp (the new file) and overwrite the updated lib/core_wrap.txt from the git version.

It's also possible to use the git proper, instead, but you'll need to follow the instructions in the docs to get it running. Once that's running for you, updates like this will be a trivial git pull command to get any non-released version.

Third option is to wait a few days for the next release. Should be this week some time...

technoblogy commented 1 week ago

Thank you - that was easy (I'm not very experienced with GitHub), and it compiles and uploads fine now. Just testing my application ...

technoblogy commented 1 week ago

Unfortunately the current position is that my Lisp interpreter fails when attempting to save the Lisp workspace from PSRAM to LittleFS on the RP2350.

The save function first compacts the workspace in PSRAM before saving it, to ensure that the objects in workspace will fit in the available flash, and this compact function seems to corrupt the workspace when it's in PSRAM. If I disable the compact function it saves to PSRAM successfully. If I test it with the workspace in normal RAM the compact, save, and load work fine.

The compact, save, and load functions work fine on other platforms with PSRAM, such as the ESP32.

I'll let you know if I manage to debug it, and thanks for the help so far, David

earlephilhower commented 1 week ago

PSRAM accesses are handled by the HW and are cached so to the CPU they should look like any other RAM address (just really slow if you spill the cache).

I'm not quite sure what you're doing w/the data structures, but if you can checkpoint in the middle and compare PSRAM to RAM versions you can use a binary search to winnow down where things differ (assuming you can guarantee the same state/order of operations between runs).

technoblogy commented 1 week ago

I'm getting erratic results. My MCVE test sketch definitely ran without error after installing your patch.

However, after running my Lisp interpreter and getting a hang, on a hunch I unplugged the Pimoroni Pico 2 board and plugged it back in, then tried uploading and running the MCVE test sketch again, and it gave Data error.

This seems pretty repeatable.

earlephilhower commented 1 week ago

Double check you've got the full patch. There's the source file and the wrap file, both of which are needed. PSRAM is controlled by only those 5 registers and the patch always saves/restores them around the ROM changing them for flash commands.

technoblogy commented 1 week ago

Yes, I've checked that I've got both files. Does my test sketch work fine for you, even after power cycling the board?

earlephilhower commented 1 week ago

Yes, the original sketch runs fine after upload and after an unplug/replug operation as well as the one I wrote to dump registers. This is on a Pimoroni PGA2350 board w/8MB PSRAM.

technoblogy commented 1 week ago

OK, thanks, I'll persevere with the debugging.

technoblogy commented 1 week ago

OK, I have a second MCVE test sketch that fails reliably for me:

#include <LittleFS.h>

const int len = 46000;
uint32_t Workspace[len] PSRAM;

bool checkdata () {
  for (int i=0; i<len; i++) {
    if (Workspace[i] != i) {
      Serial.print("First error at:"); Serial.print(i); Serial.println();
      return false;
    }
  }
  return true;
}

void setupdata () {
  for (int i=0; i<len; i++) Workspace[i] = i;
  if (checkdata()) Serial.println("Data setup OK");
}

void movedata () {
  for (int i=0; i<len/2; i++) {
    int temp = Workspace[i];
    Workspace[i] = Workspace[len-i-1];
    Workspace[len-i-1] = temp;
  }
}

void uselittlefs () {
  LittleFS.begin();
  File file;
  file = LittleFS.open("/DATA.DAT", "w");
  file.close();
}

void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.println("-----");
  setupdata();
  movedata(); 
  uselittlefs();
  movedata();
  if (!checkdata()) Serial.println("Data error"); else Serial.println("Data checked OK");
  delay(5000);
}

It calls movedata() to simulate the sorts of things that my Lisp interpreter is doing to compact the workspace. I get Data error printed out every time around the loop. If I comment out the two calls to movedata() it works fine. As before, if I remove the PSRAM attribute from Workspace it works fine.

If this works for you I can only think that there may be a difference in my configuration. I've got:

Have I missed any other changes to the core (which I've still got in a 4.1.1 folder)? Thanks, David

maxgerhardt commented 1 week ago

Usually you only do LittleFS.begin(); once in setup(). When you move it there, does the failure still occur?

technoblogy commented 1 week ago

Thanks for the suggestion, but I've just tried it and unfortunately it makes no difference. David

earlephilhower commented 1 week ago

This has nothing to do with LittleFS. I think it's actually a hardware (XIP cache?) issue.

I modified @technoblogy 's test to remove LittleFS and use a simple erase-block flash command and write all of PSRAM. It also checks the flipped data to help pinpoint when things go bad,

#include <hardware/flash.h>
extern "C" uint8_t _EEPROM_start;

const int len = 8192*1024 / 4;
uint32_t Workspace[len] PSRAM;

bool checkdata () {
  bool r = true;
  for (int i = 0; i < len; i++) {
    if (Workspace[i] != i) {
      Serial.print("First error at:"); Serial.print(i); Serial.printf(" = %08x", Workspace[i]); Serial.println();
      //return false;
      r = false;
    }
  }
  return r;
}
bool checkdatar () {
  bool r = true;
  for (int i = 0; i < len; i++) {
    if (Workspace[i] != len - i - 1) {
      Serial.print("First error at:"); Serial.print(i); Serial.printf(" = %08x", Workspace[i]); Serial.println();
      //return false;
      r = false;
    }
  }
  return r;
}

void setupdata () {
  for (int i = 0; i < len; i++) Workspace[i] = i;
  if (checkdata()) Serial.println("Data setup OK");
}

void movedata () {
  for (int i = 0; i < len / 2; i++) {
    int temp = Workspace[i];
    Workspace[i] = Workspace[len - i - 1];
    Workspace[len - i - 1] = temp;
  }
}

void flashop () {
  noInterrupts();
  rp2040.idleOtherCore();
  flash_range_erase((intptr_t)&_EEPROM_start - (intptr_t)XIP_BASE, 4096);
  rp2040.resumeOtherCore();
  interrupts();
}

void setup() {
  Serial.begin(9600);
  delay(5000);
}

void loop() {
  Serial.println("-----");
  setupdata();
  movedata();
  //checkdatar();
  flashop();
  if (!checkdatar()) Serial.println("RData error"); else Serial.println("RData checked OK");
  movedata();
  if (!checkdata()) Serial.println("1Data error"); else Serial.println("1Data checked OK");
  delay(5000);
}

This fails with multiple errors (it looks like certain words are jut not rewritten by movedata at random). If you unplug wait a few secs and replug in the board to do a full power cycle, the failing address differs. If it was code related I would have expected it to always differ at the same spots.

However, if you uncomment the checkdatar call, it passes all the time.

If it was PSRAM timings, then I would expect everything to fail after the 1st flash op.

If it was the flash bus being taken over in the middle, then a delay before the flash op should guarantee a quiescent state. Adding delay(500) before and after the flashop has no effect, though.

It may be cache flush related. I'm getting errors around the 4MB in this test, (i.e. the last PSRAM write region because we only go to len/2) while @technoblogy's rest shows errors around items which could still be in the cache, too.

Calling check would cause a cache spill and flush any dirty values out.

So, I think now we have an issue where the XIP cache is trying to write back during ROM flash write operations and getting lost...

earlephilhower commented 1 week ago

And yup, it's a cache issue. I'm getting hardware faults when I try and force the cache to write back using what I read in the datasheet (and as shown in https://forums.raspberrypi.com/viewtopic.php?t=375845 where the author has run into the exact same things we have!). But, if I just read 32K uints from psram I can get things to pass repeatably.

Need to examine what's up with the cache invalidation because that crashing doesn't make sense...it's a "cache clean" request and not an invalidate so even if it was the line I'm running in (and I'm running in RAM for my cache invalidation test) it's a safe operation...

technoblogy commented 6 days ago

Thanks for pursuing this! Reopen the issue?

earlephilhower commented 1 day ago

Sorry, I was busy elsewhere this past week. I've reopened this and posted something on the RPI forums (https://forums.raspberrypi.com/viewtopic.php?t=378249 ) so hopefully one of their engineers will come back with a solution.

Note that the "read 32K from PSRAM" to flush doesn't work in all cases...what if you just wrote the 1st 32K of PSRAM? Then no flush would happen.

earlephilhower commented 1 day ago

...but, I think we can read 32K of flash space instead to guarantee the clear. They both share the XIP cache.

earlephilhower commented 1 day ago

@technoblogy can you try the 1-file change in #2551? It works on simple test patterns, but your real app would be a much better test, I think, It's a hack and hopefully the RPI forums will get us a better solution. But, if not, I'd like to have this as backup...

technoblogy commented 16 hours ago

Thanks! First, to check, I've just changed the one file cores/rp2040/flash_wrapper.cpp. Is that correct?

I've tested my application with four key actions:

After eval(a) and eval(b), compact has nothing to do.

After eval(a), eval(b), and delete(a), compact has to shuffle b down in the workspace.

First, with workspace in PSRAM:

I tested various combinations of eval, delete, and compact, and got no corruption.

However, after eval(a), save, the workspace is corrupt. This seems odd because save is only reading the workspace.

With workspace in ordinary RAM:

All sequences work fine, with no corruption. David

earlephilhower commented 11 hours ago

Save is reading from PSRAM and writing to flash. The flash write is the bit that causes a full cache invalidate (not flush) which throws out any remaining PSRAM changes. It's possible the last bits of the PSRAM writes never made it to PSRAM from the cache even with the flash read addition here.

One thing you can try, @technoblogy , is to change the 48 * 1024 * 4 in that 1 file to 64 * 1024 * 4. Because this isn't a real cache flush it's more a matter of percentages. For example, at 32 * 1024 * 4 I got only 4 errors in the sequential test (vs 100s without the flush). So adding more pressure will give a higher %age of coverage at the cost of time. 64 can be 96 or 128, even.

technoblogy commented 11 hours ago

OK, I'll certainly try that.

I assume we're hoping that RPI will come up with a working flush() command to make this unnecessary?

technoblogy commented 9 hours ago

OK, sorry, I must have picked up the wrong version of that file because it doesn't have 48 * 1024 * 4 in it. I went to https://github.com/earlephilhower/arduino-pico/pull/2551 but I couldn't see how to download that file.

earlephilhower commented 9 hours ago

Look at the changed files, then click the "..." on the right hand side of the file name, and use menu item "view file". Then hit the "raw" button on the new page and you're all set.

technoblogy commented 9 hours ago

OK, got it, thanks!

technoblogy commented 9 hours ago

OK, now with the correct file it all seems to be working fine (I've left the original 48 * 1024 * 4).

I'll run my test suite on it, but so far it's looking good.

Thanks! David

technoblogy commented 6 hours ago

I've been using the flash_wrapper.cpp file with 48 * 1024 * 4.

I've been testing entering successively larger programs, saving them to flash using LittleFS, loading them back in, and checking for corruption.

The good news is that under no conditions have I now found any corruption.

However, with PSRAM the saving time is unacceptably long with large programs:

Program Size (bytes) Save time Load time
600 4 secs 1.5 secs
22816 165 secs 1.5 secs
48536 440 secs 1.5 secs

With normal RAM, saving and loading back in are virtually instantaneous even with the largest of these programs (max. 2 seconds).

Is this the effect of the cache flush? Unless a better solution can be provided, this will make it impractical to use LittleFS for program storage when using the PSRAM for the workspace.

earlephilhower commented 5 hours ago

Can you get any more granular timing? Writing and reading PSRAM is very slow by itself (I think it's around 40 PSRAM cycles per 64 bits), so while there may be some component being the PSRAM side and some the flash-read-to-flush-cache side. In any case the read-to-flush is only if/until a real fix from RPI or others appears.

Even the CircuitPython flush routine causes the chip to crash with heavy dirty loads, and they worked w/the pre-release RPI stuff...

technoblogy commented 5 hours ago

My experience is that running my Lisp benchmarks with the Lisp workspace in PSRAM rather than RAM approximately doubles their execution time.

In any case the read-to-flush is only if/until a real fix from RPI or others appears.

Can you estimate what sort of penalty the read-to-flush is causing?

Can you get any more granular timing?

I'll have a go. Can you give an example of what would be useful?

Thanks, David

earlephilhower commented 4 hours ago

@technoblogy download the latest change to the PR. A user on the RPI forums came up with a hack that seems to work and avoid crashing the CPU. IMHO the hack should not be needed, but whatever works... It should be faster than the flash read hack and passes the sequential testing we've got.

If you could report back with your LISP real-world tests that'd be awesome. #2551