adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

CIRCUITPY doesn't mount properly when there are multiples of a device type on USB macOS 13.1 #7409

Closed anecdata closed 1 year ago

anecdata commented 1 year ago

CircuitPython version

CircuitPython version doesn't seem to matter.

Mostly I've tested with 8.0.0-beta.6 and 7.3.3, 
but I've added devices with versions all the way back to 5.3.1.

Code/REPL

No code needed.

Behavior

I've tested Pico W, QT Py ESP32-S2, Feather M4, and RP2040. In each case, the first one or two will mount, then subsequent similar devices won't mount at all or won't mount properly: They don't show up in Finder. They may not show up in ls -l /Volumes, and if they do, they may not show up properly.

Here's a case where (4) Pico W were connected to USB after flash-nuking them and installing 8.0.0-beta.6:

drwx------   1 a     staff  16384 Dec 31  1969 CP198PCOW
d--x--x--x   2 root  wheel     64 Jan  2 14:11 CP201PICOW

(the 1st is good, the 2nd is bad, the other two are missing - but that's the general pattern: when a device doesn't mount correctly, it's either not in the listing or has the "bad" file characteristics)
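The good/bad pattern described above can be checked mechanically. This is a minimal sketch, assuming the only signals are the ones visible in these listings (a user-owned `drwx------` entry versus a root-owned `d--x--x--x` placeholder). The helper name `looks_mounted` is hypothetical and not part of any tool mentioned in this thread.

```python
def looks_mounted(ls_line):
    """Heuristic from this thread: a healthy mount is a user-owned
    drwx------ entry; a failed one is a root-owned d--x--x--x directory."""
    # Split into at most 9 fields so volume names with spaces stay intact.
    fields = ls_line.split(None, 8)
    if len(fields) < 9:
        raise ValueError("not an `ls -l` entry: %r" % ls_line)
    mode, owner = fields[0], fields[2]
    return mode.startswith("drwx") and owner != "root"


# Sample lines copied from the listing above.
listing = """\
drwx------   1 a     staff  16384 Dec 31  1969 CP198PCOW
d--x--x--x   2 root  wheel     64 Jan  2 14:11 CP201PICOW
"""

for line in listing.splitlines():
    name = line.split(None, 8)[8]
    print(name, "ok" if looks_mounted(line) else "BAD")
```

Devices that are missing from the listing entirely would still need to be cross-checked against diskutil list.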

This seems to be largely a macOS 13.1 issue, but I've "fixed" it in two different ways that change the flash image:

  1. I had 4 new QT Py ESP32S2 with 8.0.0-beta.6 installed. Thinking it might be a beta issue, I reverted them to 7.3.3 and they mounted, and continued to mount after re-updating to 8.0.0-beta.6. And they continued to mount after a storage.erase_filesystem(). In one case so far, this method didn't work, but the next method did:

  2. In two cases (QT Py ESP32-S2), I extended the filesystem using storage.erase_filesystem(extended=True), and these two devices have mounted every time since, regardless of other similar devices already mounted or connected on USB. Block size changes in this case from 512 to 1024 according to os.statvfs("/"), but there doesn't seem to be a correlation to block size, or to FAT12 vs. FAT16.

I'm at a loss with Raspberry Pi Pico W, since they only run 8.0.0-beta.x and have no capacity to extend filesystem. I've successfully mounted the same (4) Pico W on a Raspberry Pi 4 simultaneously (though couldn't rename the drives).

Before macOS 13.1, this wasn't an issue. Sometimes devices would unmount and stay unmounted after a hardfault or software reset, but a hard reset or power cycle almost always brought them back.

I've been testing on two different Macs, both Ventura 13.1; one was a relatively recent clean install. I've tried USB-A ports, USB-C ports, and various hubs; that doesn't seem to make much difference. I can't really correlate this to much other than 13.1 + multiple similar devices (same CircuitPython port and usually same board).

Code on all of these boards runs, and serial console works. I tried a boot.py that turned off HID and MIDI, but it made no difference.

This may seem to be a "doctor it hurts when I do this" issue (Dr.: "don't do that"), but being able to have multiple devices active lets me run long-term tests to track down intermittent issues, and compare behavior on a number of fronts across ports and espressif board variants (e.g., S2 vs. S3). Almost from the start of CircuitPython, I've had multiple devices connected simultaneously (usually a dozen or more), often many of the same type.

Description

Data on degree of mounting success using the following tools:

Finder # bottom line of whether a device is useful
ls -al /Volumes # sometimes devices will show up here (sometimes incorrectly) but not in Finder
system_profiler -json SPUSBDataType # useful to show if there's a mount point
diskutil list # devices seem to always show up correctly here
discotool # device.volume doesn't always correlate to either Finder or /Volumes
dmesg # unfortunately cryptic and variable results

Additional information

Some previous discussion on Discord: https://discord.com/channels/327254708534116352/537365702651150357/1059572042552311949

dhalbert commented 1 year ago

I wonder if there was somehow some macOS "memory" of the previous use of these particular boards, and reinstalling somehow overrode that memory. On Windows it's not too hard to clear the list of previously mounted drives. I don't know if there's anything similar for macOS.

anecdata commented 1 year ago

CircuitPython devices leave a trace in System Settings --> Network, but they can be cleared out (Neradoc has a script to do that more easily), and I did that to no avail. There may be other traces.

Testing across two Macs gives similar behavior. And it doesn't matter which of several boards are plugged in first, it's the later ones that don't mount properly. But once they have been "fixed", I can even put them on their own hub and plug the hub in and they'll all mount fine simultaneously.

I'd say this is largely a 13.1 issue, but there does seem to be an interaction with some characteristic of the CIRCUITPY disk image, perhaps even a random byte in something related to USB that gets overridden when the filesystem is changed.

This is really odd behavior, and makes it hard to do development and testing. Do others regularly have multiple (3-4 or more) of the same board plugged into 13.1 without issue?

dhalbert commented 1 year ago

One thing to test might be if you see the same issue if you plug in several USB sticks formatted with the same volume name. If it's not FAT12 related, then we need to figure out what is special about our USB drives.

anecdata commented 1 year ago

I'll try some USB flash drives, working on getting a few identical ones.

It doesn't seem to be a FAT12 vs. FAT16 issue. Or a block size issue (same with 512 and 1024 boards). Using (4) Adafruit Feather RP2040, no other storage devices connected to the Mac (just the display), processed each as follows.

One at a time:

>>> os.statvfs("/")
(1024, 1024, 7137, 7133, 7133, 0, 0, 0, 0, 255)

(OK)

% ls -l /Volumes
drwx------   1 a     staff  16384 Dec 31  1969 CIRCUITPY

(OK)

Disk Utility & Finder: 7.3MB FAT16 (OK)
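As a cross-check, the os.statvfs("/") tuple above is consistent with the 7.3MB figure. This is a small sketch, assuming the usual field order (f_bsize, f_frsize, f_blocks, f_bfree, ...). The helper name `fs_size_bytes` is made up for illustration.

```python
def fs_size_bytes(statvfs_result):
    """Return (total_bytes, free_bytes) from an os.statvfs() tuple."""
    f_bsize, f_frsize, f_blocks, f_bfree = statvfs_result[:4]
    # Sizes are counted in units of the fragment size (f_frsize).
    return f_frsize * f_blocks, f_frsize * f_bfree


# Tuple copied from the REPL output above.
total, free = fs_size_bytes((1024, 1024, 7137, 7133, 7133, 0, 0, 0, 0, 255))
print(total, free)                    # 7308288 7304192
print(round(total / 1_000_000, 1))    # 7.3 (MB, decimal)
```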

Plug in each board (edit: directly into a USB-C socket w/o external hub), leaving 5-10 seconds between. Only the first one mounts properly:

drwx------   1 a     staff  16384 Dec 31  1969 CIRCUITPY
d--x--x--x   2 root  wheel     64 Jan  3 16:03 CIRCUITPY 1
d--x--x--x   2 root  wheel     64 Jan  3 16:03 CIRCUITPY 2
d--x--x--x   2 root  wheel     64 Jan  3 16:03 CIRCUITPY 3

This was on an M1 Mac, which is relatively new and was cleanly installed (no system detritus imported from prior Mac). The behavior is the same (only the first CircuitPython device mounts properly) on an Intel Mac.

jepler commented 1 year ago

This is plugging in multiple devices simultaneously, not sequentially?

anecdata commented 1 year ago

Sequentially, 5-10 seconds between.

edit: if / when boards are "fixed", they can be plugged in simultaneously and will mount fine.

anecdata commented 1 year ago

I tried installing 7.3.3 on each, but same behavior. Even flash-nuking then installing 7.3.3 on each didn't work. Still, only the first inserted mounts properly.

I've been able to "fix" QT Py ESP32-S2 boards (the two methods in the original comment), but not Pico W or RP2040.

jepler commented 1 year ago

I notice on Linux that all CIRCUITPY drives end up with the same "volume serial number", 5021-0000, indicating the date of 2020-1-1. This means that when operating on drives "by uuid", there is confusion when two circuitpython devices are plugged in. (Tested with two pico Ws I had handy)

It looks like oofatfs tries to use the current timestamp to generate a serial number, which is obviously not working out for us:

            st_dword(buf + BS_VolID, GET_FATTIME());    /* VSN */

Due to this I prepared #7410.
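For anyone who wants to check the serial number from a raw image rather than from the OS, here is a minimal sketch. It assumes a FAT12/FAT16 boot sector, where BS_VolID is a 4-byte little-endian field at offset 39 (FAT32 keeps it at a different offset). The function name `volume_serial` is hypothetical, and the sector below is synthetic, not read from a real board.

```python
import struct

BS_VOLID_FAT16 = 39  # offset of BS_VolID in a FAT12/FAT16 boot sector


def volume_serial(boot_sector):
    """Format the volume serial number the way it is usually displayed."""
    (vol_id,) = struct.unpack_from("<I", boot_sector, BS_VOLID_FAT16)
    return "%04X-%04X" % (vol_id >> 16, vol_id & 0xFFFF)


# Synthetic 512-byte sector carrying the serial seen on every fresh board.
sector = bytearray(512)
struct.pack_into("<I", sector, BS_VOLID_FAT16, 0x50210000)
print(volume_serial(sector))  # 5021-0000
```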

anecdata commented 1 year ago

One thing to test might be if you see the same issue if you plug in several USB sticks formatted with the same volume name.

The smallest matched set of USB sticks I could conjure up was (4) new / unused, ostensibly identical, 4GB FAT32 drives, no problem mounting them all on either Mac, even through a hub, even when other storage devices are already mounted.

anecdata commented 1 year ago

Build from #7410, while probably a Good Thing, doesn't help the non-mounting issue. I loaded it up on (4) Pico W, and as usual only the first board plugged in mounted properly:

d--x--x--x   2 root  wheel     64 Jan  3 20:30 CP198PICOW
drwx------   1 a     staff  16384 Jan  3 20:24 CP199PICOW
d--x--x--x   2 root  wheel     64 Jan  3 20:30 CP200PICOW
d--x--x--x   2 root  wheel     64 Jan  3 20:30 CP201PICOW

I wonder if it could be something like a bad pointer or buffer boundary in USB-facing code, that just happens to get overwritten into an innocuous form when certain changes are made to the bin/uf2 image.

The two "fix" methods above only work on some kinds of boards (QT Py ESP32-S2, so far). I was not able to "fix" a set of espressif DevKitC-1-N4R2 by either erasing the flash, extending the flash, or re-flashing with 7.3.3.

jepler commented 1 year ago

You erase_filesystem'd the devices while running with #7410?

anecdata commented 1 year ago

For the above, I simply installed the PR build on each then ejected and unplugged each, then tried to successively plug them in. I can try other sequences if you think it might help. They were all still in a "clean" state of flash-nuking + installing 8.0.0-beta.6 from the original test.

BTW, I do have two other Pico W running 8.0.0-beta.6 that can both be mounted simultaneously. I'm hesitant to mess with those since they are very useful to me that way, but I can if needed. If my previous hypothesis is correct, there would be some difference of flash contents over time that let these work but not others with different history. Of course, that all could be way off ;-) (flash block writes being what they are, and all)

anecdata commented 1 year ago

OK, I gotcha. The volume_uuid were the same previously. Performed storage.erase_filesystem on each, verified the new IDs are different from the old and unique from each other. After a few unplug-all then replug-all, they all four mounted properly. Took a few times, quite possibly some stored state on the Mac. w00t!

The two I mentioned that do mount are standard 8.0.0-beta.6: one has the default volume_uuid, the other is different (not sure how that happened).

So from the factory, devices are likely to have the same volume_uuid because... the flashing or first-boot process takes the same amount of time up to when the ID is generated from the time? Any thoughts on the inconsistent behavior of the "fix" - the first didn't require erasing the filesystem, just downgrading to 7.3.3?

jepler commented 1 year ago

Reading the source, it looks (previous to #7410) like the device's RTC time when you run storage.erase_filesystem() will influence the resulting VolID. For instance, at apparently 16 seconds after boot, I ran erase_filesystem and got 5021-0008 instead of -0000. But when the device is blank (for instance, because it was nuked), the amount of time until it reaches the filesystem creation step will, I guess, be consistently under 2 seconds, and if so it'll be 5021-0000 consistently.
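The two serial numbers quoted here can be decoded to show where the "16 seconds" and the 2-second granularity come from. This is a sketch assuming the standard FAT packed timestamp layout (date in the upper 16 bits, time in the lower 16, seconds stored in 2-second units). The function name `decode_fattime` is made up for illustration.

```python
from datetime import datetime


def decode_fattime(vol_id):
    """Decode a 32-bit FAT packed timestamp into a datetime."""
    date, time = vol_id >> 16, vol_id & 0xFFFF
    return datetime(
        1980 + (date >> 9),    # 7 bits: years since 1980
        (date >> 5) & 0xF,     # 4 bits: month
        date & 0x1F,           # 5 bits: day
        time >> 11,            # 5 bits: hour
        (time >> 5) & 0x3F,    # 6 bits: minute
        (time & 0x1F) * 2,     # 5 bits: seconds / 2
    )


print(decode_fattime(0x50210000))  # 2020-01-01 00:00:00
print(decode_fattime(0x50210008))  # 2020-01-01 00:00:16
```

Because seconds are stored in 2-second units, any filesystem created within the first 2 seconds of a clock starting at 2020-01-01 00:00:00 ends up with a time field of 0000.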

anecdata commented 1 year ago

Closing, fixed by storage.erase_filesystem() with unique volume_uuid. See #7410.

rsmets commented 11 months ago

I am running into this same issue on macOS 13.4.1, flashing nRF52840s directly from the factory. However, given that I am using a newer version of CircuitPython, 8.2.9, I feel this should be resolved, right?

@anecdata your solution doesn't fully make sense looking at the other issue thread #7410. Would you mind shedding some light on it?

Closing, fixed by storage.erase_filesystem() with unique volume_uuid. See https://github.com/adafruit/circuitpython/pull/7410.

Much appreciated for the help!

dhalbert commented 11 months ago

@rsmets did you do a storage.erase_filesystem() on the board? That will rewrite the volume id.

rsmets commented 11 months ago

Thanks @dhalbert for the suggestion. I have not tried that yet because the volumes are not showing up at all except in UF2 bootloader mode. I suppose the fix is to do this via the REPL on the boards that are appearing as a CircuitPython volume, in order to give the ones that aren't showing up a shot?

dhalbert commented 11 months ago

@rsmets even if the volumes are not appearing, I think the serial will appear, and you can do storage.erase_filesystem() from there. You can also try on a host computer that is not a Mac.

anecdata commented 11 months ago

you may also be able to access the CIRCUITPY drive by rebooting the Mac with no other CircuitPython devices plugged in

rsmets commented 11 months ago

Thanks for both of your assistance here.

Indeed, I have a serial interface into the boards that do not appear as volumes. And can successfully run storage.erase_filesystem(). However, upon trying to "save" via the Mu editor, I am met with a Disk Error.

Error saving file to disk. Ensure you have permission to write the file and sufficient disk space.

Also, no luck restarting my Mac machine.

Upon trying on a Windows machine, same story. Not showing up as a removable drive (despite hearing the sound of the removable drive being connected and disconnected) but can connect over serial.

Very bizarre!

FWIW, this is the warning I get over serial interface:

Running in safe mode! Not running saved code.

You are in safe mode because:
CIRCUITPY drive could not be found or created.

Also, it is worth mentioning that re-flashing with the hex bootloader followed by re-installing CircuitPython over UF2 does not change anything.

Here is the UF2 bootloader info for good measure:

UF2 Bootloader 0.8.0 lib/nrfx (v2.0.0) lib/tinyusb (0.12.0-145-g9775e7691) lib/uf2 (remotes/origin/configupdate-9-gadbb8c7)
Model: Adafruit ItsyBitsy nRF52840 Express
Board-ID: nRF52840-ItsyBitsy-revA
Date: Sep 29 2023
SoftDevice: S140 6.1.1

From digging through the Discord, it seems this error is chalked up to a HW problem, but it's hard to believe it's a "HW problem" since it started happening only after successful bootloader flashes. I had CircuitPython up and running on nearly all the boards before most of them stopped showing up.

dhalbert commented 11 months ago

You mentioned a few posts back that CIRCUITPY is not appearing on multiple disparate boards, not just the nRF52840. Is the "CIRCUITPY drive could not be found or created" error only on one board or multiples? There may be multiple issues here.

If you do storage.erase_filesystem() on the Windows machine for each different board, and then wait for the board to reset and come back, does CIRCUITPY reappear on Windows? Have only one board at a time plugged in. After that result, let's talk about macOS. Also, it appears you haven't updated Ventura beyond 13.1? I think the latest is 13.6.3.

rsmets commented 11 months ago

@dhalbert very grateful for your willingness to help!

You mentioned a few posts back that CIRCUITPY is not appearing on multiple disparate boards, not just the nRF52840

I actually have multiple custom PCBs that all use the same nRF52840 MCU. This project was prototyped with the ItsyBitsy, so I figured I might as well run with the nRF52840 :)

If you do storage.erase_filesystem() on the Windows machine for each different board, and then wait for the board to reset and come back, does CIRCUITPY reappear on Windows?

Okay, will try storage.erase_filesystem() on Windows later today and report back!

Also, it appears you haven't updated Ventura beyond 13.1? I think the latest is 13.6.3.

My Mac is currently running Ventura 13.4.1. I could bump minor versions very easily if you think that would help. Although, I agree, let's talk Windows first.

dhalbert commented 11 months ago

In the original post:

I've tested Pico W, QT Py ESP32-S2, Feather M4, and RP2040. In each case, the first one or two will mount, then subsequent similar devices won't mount at all or won't mount properly: They don't show up in Finder. They may not show up in ls -l /Volumes, and if they do, they may not show up properly.

Is that still the case with the non-nRF boards after you updated and did storage.erase_filesystem()? (Windows and/or macOS).