adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

Unable to revive my Metro M4 Express #894

Closed notro closed 6 years ago

notro commented 6 years ago

Running the basics/gen_stack_overflow.py test using #893 has crashed my board so hard that I'm unable to revive it.

I have the latest bootloader from https://github.com/adafruit/uf2-samdx1/releases/tag/v2.0.0-adafruit.5. This is the INFO_UF2.TXT:

UF2 Bootloader v2.0.0-adafruit.5 SFHWRO
Model: Metro M4 Express
Board-ID: SAMD51J19A-Metro-v0

I have used the Metro M4 Express (QSPI) flash erase from https://learn.adafruit.com/adafruit-metro-m4-express-featuring-atsamd51?view=all#old-way-for-the-circuit-playground-express-feather-m0-express-and-metro-m0-express

This is what the Raspberry Pi has to say about the usb device that shows up after erase:

pi@cp:~ $ dmesg
[764346.658642] usb 1-1.4: new full-speed USB device number 72 using dwc_otg
[764346.799653] usb 1-1.4: New USB device found, idVendor=239a, idProduct=8020
[764346.799668] usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[764346.799678] usb 1-1.4: Product: Adafruit Metro M4
[764346.799686] usb 1-1.4: Manufacturer: Adafruit LLC
[764346.802011] cdc_acm 1-1.4:1.0: ttyACM0: USB ACM device

Next I enter the bootloader again and copy in: adafruit-circuitpython-metro_m4_express-3.0.0-beta.0.uf2

This is what the Raspberry Pi has to say:

pi@cp:~ $ dmesg
[764507.689306] usb 1-1.4: new full-speed USB device number 73 using dwc_otg
[764507.789314] usb 1-1.4: device descriptor read/64, error -32
[764508.009319] usb 1-1.4: device descriptor read/64, error -32
[764508.229309] usb 1-1.4: new full-speed USB device number 74 using dwc_otg
[764508.329306] usb 1-1.4: device descriptor read/64, error -32
[764508.549312] usb 1-1.4: device descriptor read/64, error -32
[764508.669408] usb 1-1-port4: attempt power cycle
[764509.329312] usb 1-1.4: new full-speed USB device number 75 using dwc_otg
[764509.769321] usb 1-1.4: device not accepting address 75, error -32
[764509.869322] usb 1-1.4: new full-speed USB device number 76 using dwc_otg
[764510.309333] usb 1-1.4: device not accepting address 76, error -32
[764510.309446] usb 1-1-port4: unable to enumerate USB device
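
To compare the two dmesg captures at a glance, the enumeration failures can be counted on the host. This is only a minimal sketch: the function name `summarize_usb_errors` and the regular expression are illustrative assumptions, not part of any tool used in this thread. Error -32 corresponds to errno 32 (EPIPE) on Linux.

```python
import errno
import re
from collections import Counter

# Matches kernel USB error lines such as:
# "[764507.789314] usb 1-1.4: device descriptor read/64, error -32"
ERROR_RE = re.compile(r"usb (?P<port>[\d.\-]+): .*error (?P<code>-?\d+)")


def summarize_usb_errors(dmesg_text):
    """Count error lines per (port, error code) pair in dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            counts[(match["port"], int(match["code"]))] += 1
    return counts


def describe(code):
    """Translate a negative kernel error code into its errno name."""
    return errno.errorcode.get(abs(code), "unknown")
```

Feeding it the saved output of `dmesg` gives one counter entry per port and error code, which makes a board that never enumerates easy to tell apart from one that shows an occasional glitch.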

dhalbert commented 6 years ago

Have you tried rebooting the RPi? Sometimes Linux gets completely confused about USB ports that have encountered errors on their devices.

You have a Metro M4 that has the word "BETA" on the silkscreen, right? The older boards had different pin configs. Did you use an update-bootloader...uf2 to update the bootloader? That will set the BOOTPROT fuses correctly to protect the bootloader against overwriting. Just loading it via a J-Link will not do that. Did you have a Metro M4 from the first batch? The first batch did not have the BOOTPROT fuses set.

notro commented 6 years ago

I have one with BETA on it, which I received a week ago, so I guess it's not from the first batch. I used update-bootloader-feather_m4-v2.0.0-adafruit.5.uf2. I did a reboot of the Pi now, didn't help. I've tried the board on a Windows computer as well.

The Pi has been quite resilient for me, I haven't rebooted in 2 weeks and had all kinds of board crashes and some flash erases of boards I have connected. It's been enough to re-plug the board to get going again.

dhalbert commented 6 years ago

Can you load Blink on it via the Arduino IDE? You'll need the latest Adafruit SAMD board support package, version 1.2.0? (And also the Arduino SAMD BSP.)

notro commented 6 years ago

Adafruit gave me a jlink with the boards and now I'm trying @tannewt's gdb guide. I have never used gdb before, in fact I don't think I've used a debugger in 25 years since university.

No matter what I do I always end up in _usart_async_set_irq_state().

pi@agl:~/circuitpython/workdirs/test/circuitpython/ports/atmel-samd$ /home/pi/opt/gcc-arm-none-eabi-7-2017-q4-major/bin/arm-none-eabi-gdb-py build-metro_m4_express/firmware.elf

(gdb) target extended-remote 192.168.10.175:2331
Remote debugging using 192.168.10.175:2331
0x00024778 in _usart_async_set_irq_state (device=0x20001c20 <heap+6144>, type=32, state=false) at asf4/samd51/hpl/sercom/hpl_sercom.c:624
624                     hri_sercomusart_write_INTEN_DRE_bit(device->hw, state);

(gdb) load
Loading section .text, size 0x39c64 lma 0x4000
Loading section .ARM.exidx, size 0x8 lma 0x3dc64
Loading section .data, size 0x418 lma 0x3dc6c
Start address 0x4000, load size 237700
Transfer rate: 177 KB/sec, 13982 bytes/write.

(gdb) break main
Breakpoint 1 at 0x1bbd0: file ../../main.c, line 236.

(gdb) monitor reset
Resetting target

(gdb) continue
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00024778 in _usart_async_set_irq_state (device=0x20001c20 <heap+6144>, type=32, state=false) at asf4/samd51/hpl/sercom/hpl_sercom.c:624
624                     hri_sercomusart_write_INTEN_DRE_bit(device->hw, state);

(gdb) step
^C
Program received signal SIGINT, Interrupt.
0x00024778 in _usart_async_set_irq_state (device=0x20001c20 <heap+6144>, type=32, state=false) at asf4/samd51/hpl/sercom/hpl_sercom.c:624
624                     hri_sercomusart_write_INTEN_DRE_bit(device->hw, state);

(gdb) print *device
$1 = {usart_cb = {tx_byte_sent = 0x0, rx_done_cb = 0x0, tx_done_cb = 0x0, error_cb = 0x0}, irq = {handler = 0x0, parameter = 0x171}, hw = 0x0}

dhalbert commented 6 years ago

Use the bt (backtrace) command to get a stack trace to see what's calling _usart_async_set_irq_state.

Did you try Arduino blink?

notro commented 6 years ago

I haven't tried Arduino blink since I won't be using Arduino in the future, so I thought I'd try gdb since that was something I might have use for later. But I'll try it if this is a dead end.

(gdb) bt
#0  0x00024778 in _usart_async_set_irq_state (device=0x20001c20 <heap+6144>, type=32, state=false) at asf4/samd51/hpl/sercom/hpl_sercom.c:624
#1  0x20001c20 in heap ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
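
One hint in a backtrace like this is frame #1: its program counter is 0x20001c20, which lies in SRAM rather than flash. The sketch below flags such frames in saved gdb output. It assumes the standard Cortex-M memory map (SRAM region at 0x20000000 to 0x3FFFFFFF); the helper name `frames_in_ram` is hypothetical. A PC in RAM is not always wrong, since code can legitimately run from RAM, but together with a symbol like `heap` it suggests a clobbered return address.

```python
import re

# Assumed Cortex-M memory map: flash starts near 0x00000000,
# the SRAM region spans 0x20000000..0x3FFFFFFF.
SRAM_BASE = 0x20000000
SRAM_END = 0x40000000

# Matches frames that carry an address, e.g. "#1  0x20001c20 in heap ()".
FRAME_RE = re.compile(r"^#(\d+)\s+0x([0-9a-fA-F]+) in (\S+)")


def frames_in_ram(backtrace_text):
    """Return (frame number, pc, symbol) for frames whose PC points into SRAM."""
    suspicious = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            pc = int(match.group(2), 16)
            if SRAM_BASE <= pc < SRAM_END:
                suspicious.append((int(match.group(1)), pc, match.group(3)))
    return suspicious
```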

dhalbert commented 6 years ago

I'm just trying to figure out if you have a hardware problem or a software problem. First was to see if the bootloader is messed up in some way, but it seems like you used the standard update-bootloader...uf2 to make sure it's OK. Also we need to check whether the firmware image you're trying to upload is broken in some way. Did you build it yourself or did you download it from a GitHub release? You might try the alpha.6 .uf2 as well, and/or redownload the beta.0 .uf2.

The idea with Arduino is that it's depending on nothing but the bootloader to upload a simple program. If that doesn't work, then the problem is not circuitpython but either hardware or the bootloader. It doesn't use the SPI flash chip either. So I'm trying to divide and conquer here to narrow where the problem is.

dhalbert commented 6 years ago

Oops, oops:

I used update-bootloader-feather_m4-v2.0.0-adafruit.5.uf2

You need to use update-bootloader-metro_m4-v2.0.0-adafruit.5.uf2. There are many pin differences between the two boards.

sommersoft commented 6 years ago

Nice catch @dhalbert. I was about to ask/suggest using different USB cables. The messages and print *device output looked like it was a hardware problem to me. Incorrect pins would definitely manifest the same way...

notro commented 6 years ago

The bootloader was a copy-paste error on my part, I accidentally used the feather version first, but the leds didn't blink, so I discovered my error. This is the INFO_UF2.TXT:

UF2 Bootloader v2.0.0-adafruit.5 SFHWRO
Model: Metro M4 Express
Board-ID: SAMD51J19A-Metro-v0

Wrt cables, I've used different ones when switching between the Pi and a Windows computer.

I'm having problems getting the Arduino IDE up and running, I've followed these steps, but no Adafruit boards: https://learn.adafruit.com/adafruit-metro-m4-express-featuring-atsamd51/setup

I got this error: Warning: non trusted contribution, skipping script execution (C:\Users\noralf\Documents\ArduinoData\packages\adafruit\hardware\samd\1.2.0\post_install.bat) So I ran it with admin rights, but no Adafruit boards show up. The Arduino SAMD boards are there.

I did install the Windows Store version. I'm on Windows 10.

I have this file that shows the boards: C:\Users\noralf\Documents\ArduinoData\packages\adafruit\hardware\samd\1.2.0\boards.txt

sommersoft commented 6 years ago

Well, good to know that correct bootloader and cables can be ruled out. I think. 😄

I just loaded Arduino IDE. Had to update my BSP to 1.2.0. I did get the "Warning: non trusted" message after the update ran. Closed the IDE, re-opened (normal, non-admin), and the M4 boards showed up for me. Also on Win10, with the Windows Store version of the IDE.

Didn't attempt an upload, however.

notro commented 6 years ago

Finally the Arduino IDE showed me the M4 and the blink sketch is indeed working. Next I tried adafruit-circuitpython-metro_m4_express-3.0.0-beta.0.uf2 once more, but no luck.

sommersoft commented 6 years ago

@notro can you retrieve the fuse settings from the board? I don't know how in gdb; I use Atmel Studio on Windows.

This is what my Metro M4 Beta looks like:

AC_BIAS0 = 0x00
ADC0_BIASCOMP = 0x00
ADC0_BIASREFBUF = 0x00
ADC0_BIASR2R = 0x00
ADC1_BIASCOMP = 0x00
ADC1_BIASREFBUF = 0x00
ADC1_BIASR2R = 0x00
USB_TRANSN = 0x00
USB_TRANSP = 0x00
USB_TRIM = 0x00
ROOM_TEMP_VAL_INT = 0x00
ROOM_TEMP_VAL_DEC = 0x00
HOT_TEMP_VAL_INT = 0x00
HOT_TEMP_VAL_DEC = 0x00
ROOM_INT1V_VAL = 0x00
HOT_INT1V_VAL = 0x00
ROOM_ADC_VAL_PTAT = 0x00
HOT_ADC_VAL_PTAT = 0x00
ROOM_ADC_VAL_CTAT = 0x00
HOT_ADC_VAL_CTAT = 0x00
BOD33_DIS = [X]
BOD33USERLEVEL = 0x1C
BOD33_ACTION = 0x01
BOD33_HYST = 0x02
BOD12_DIS = [ ]
BOD12USERLEVEL = 0x0D
BOD12_ACTION = 0x01
BOD12_HYST = [X]
NVMCTRL_BOOTPROT = 0x0D
NVMCTRL_SEESBLK = 0x00
NVMCTRL_SEEPSZ = 0x00
RAMECC_ECCDIS = [X]
WDT_ENABLE = [ ]
WDT_ALWAYSON = [ ]
WDT_PER = 0x0B
WDT_WINDOW = 0x0B
WDT_EWOFFSET = 0x0B
WDT_WEN = [ ]
NVMCTRL_REGION_LOCKS = 0xFFFFFFFF

SW0_WORD_0 = 0x00 (unknown)
SW0_WORD_1 = 0x00 (unknown)
TEMP_LOG_WORD_0 = 0x00 (unknown)
TEMP_LOG_WORD_1 = 0x00 (unknown)
TEMP_LOG_WORD_2 = 0x00 (unknown)
USER_WORD_0 = 0xF69A9239 (valid)
USER_WORD_1 = 0xAEECFF80 (valid)
USER_WORD_2 = 0xFFFFFFFF (valid)

notro commented 6 years ago

Some progress, but I can't break on supervisor_get_serial_connected():

(gdb) break _usart_async_set_irq_state
Breakpoint 4 at 0x2476c: file asf4/samd51/hpl/sercom/hpl_sercom.c, line 623.
(gdb) monitor reset
Resetting target
(gdb) continue
Continuing.

Breakpoint 4, _usart_async_set_irq_state (device=0xb, type=USART_ASYNC_BYTE_SENT, state=8) at asf4/samd51/hpl/sercom/hpl_sercom.c:623
623             if (USART_ASYNC_BYTE_SENT == type || USART_ASYNC_TX_DONE == type) {
(gdb) bt
#0  _usart_async_set_irq_state (device=0xb, type=USART_ASYNC_BYTE_SENT, state=8) at asf4/samd51/hpl/sercom/hpl_sercom.c:623
#1  0x000283d4 in time_sleep (seconds_o=<optimized out>) at ../../shared-bindings/time/__init__.c:82
#2  0x2002fd28 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) b time_sleep
Breakpoint 7 at 0x28398: file ../../shared-bindings/time/__init__.c, line 71.
(gdb) monitor reset
Resetting target
(gdb) continue
Continuing.

Breakpoint 7, time_sleep (seconds_o=0xf0) at ../../shared-bindings/time/__init__.c:71
71      STATIC mp_obj_t time_sleep(mp_obj_t seconds_o) {
(gdb) bt
#0  time_sleep (seconds_o=0xf0) at ../../shared-bindings/time/__init__.c:71
#1  0x00028326 in supervisor_get_serial_connected (self=0xf0) at ../../shared-bindings/supervisor/Runtime.c:66
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) b supervisor_get_serial_connected
Breakpoint 8 at 0x28324: file ../../shared-bindings/supervisor/Runtime.c, line 66.
(gdb) monitor reset
Resetting target
(gdb) continue
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00024778 in _usart_async_set_irq_state (device=0x1, type=5, state=2) at asf4/samd51/hpl/sercom/hpl_sercom.c:624
624                     hri_sercomusart_write_INTEN_DRE_bit(device->hw, state);

I'll see if I can get to the fuse settings.

notro commented 6 years ago

@sommersoft AFAICT we got the same values. I tried your? Atmel Studio guide yesterday so I already had it installed. Really nice with these Adafruit guides, saves me from having to chase down all the details :-)

AC_BIAS0 = 0x00
ADC0_BIASCOMP = 0x00
ADC0_BIASREFBUF = 0x00
ADC0_BIASR2R = 0x00
ADC1_BIASCOMP = 0x00
ADC1_BIASREFBUF = 0x00
ADC1_BIASR2R = 0x00
USB_TRANSN = 0x00
USB_TRANSP = 0x00
USB_TRIM = 0x00
ROOM_TEMP_VAL_INT = 0x00
ROOM_TEMP_VAL_DEC = 0x00
HOT_TEMP_VAL_INT = 0x00
HOT_TEMP_VAL_DEC = 0x00
ROOM_INT1V_VAL = 0x00
HOT_INT1V_VAL = 0x00
ROOM_ADC_VAL_PTAT = 0x00
HOT_ADC_VAL_PTAT = 0x00
ROOM_ADC_VAL_CTAT = 0x00
HOT_ADC_VAL_CTAT = 0x00
BOD33_DIS = [X]
BOD33USERLEVEL = 0x1C
BOD33_ACTION = 0x01
BOD33_HYST = 0x02
BOD12_DIS = [ ]
BOD12USERLEVEL = 0x0D
BOD12_ACTION = 0x01
BOD12_HYST = [X]
NVMCTRL_BOOTPROT = 0x0D
NVMCTRL_SEESBLK = 0x00
NVMCTRL_SEEPSZ = 0x00
RAMECC_ECCDIS = [X]
WDT_ENABLE = [ ]
WDT_ALWAYSON = [ ]
WDT_PER = 0x0B
WDT_WINDOW = 0x0B
WDT_EWOFFSET = 0x0B
WDT_WEN = [ ]
NVMCTRL_REGION_LOCKS = 0xFFFFFFFF

SW0_WORD_0 = 0x00 (unknown)
SW0_WORD_1 = 0x00 (unknown)
TEMP_LOG_WORD_0 = 0x00 (unknown)
TEMP_LOG_WORD_1 = 0x00 (unknown)
TEMP_LOG_WORD_2 = 0x00 (unknown)
USER_WORD_0 = 0xF69A9239 (valid)
USER_WORD_1 = 0xAEECFF80 (valid)
USER_WORD_2 = 0xFFFFFFFF (valid)

sommersoft commented 6 years ago

Yeah, looks identical to me.
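
Comparing two fuse dumps of this length by eye is error-prone, so a small diff can confirm the result. This is a sketch that assumes each dump is saved as text in the `NAME = VALUE` layout shown above; `parse_fuses` and `diff_fuses` are hypothetical helper names.

```python
def parse_fuses(dump_text):
    """Parse 'NAME = VALUE' lines into a dict, skipping lines without '='."""
    fuses = {}
    for line in dump_text.splitlines():
        if "=" in line:
            name, value = line.split("=", 1)
            fuses[name.strip()] = value.strip()
    return fuses


def diff_fuses(dump_a, dump_b):
    """Return {name: (value_a, value_b)} for every fuse that differs."""
    a, b = parse_fuses(dump_a), parse_fuses(dump_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

An empty result means the two boards report the same fuse values. A fuse missing from one dump shows up with `None` on that side.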

Do you have a code.py/main.py running on the board? runtime.serial_connected/supervisor_get_serial_connected shouldn't be called, otherwise. To the best of my knowledge, at least.

I just ran it on the current tip of master, and I get no errors.

dhalbert commented 6 years ago

Here is a Metro M4 build that puts CIRCUITPY on the internal flash instead of the external SPI flash chip, to rule that in or out. mm4-internal-flash-beta.0.uf2.zip.

Also, did you try https://github.com/adafruit/circuitpython/releases/download/3.0.0-alpha.6/adafruit-circuitpython-metro_m4_express-3.0.0-alpha.6.uf2 yet?

I would also try another USB cable, even if that one seems OK.

notro commented 6 years ago

That gave me an idea: I commented out the maybe_run_list() code in start_mp() and now I have access to the REPL and CIRCUITPY drive. I deleted code.py and restored start_mp() and voila, back in business!

Thanks @sommersoft and @dhalbert for your help!

This means that the flash eraser I used didn't actually work.
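
Once the CIRCUITPY drive is reachable again, the cleanup step can also be done from the host with a short script. This is only a sketch: the helper name `disable_startup_files` is made up, the file names are the ones mentioned in this thread, and the mount path in the usage note is an assumption that depends on the host. It renames the files instead of deleting them so the offending code can still be inspected.

```python
from pathlib import Path

# File names taken from the discussion in this thread; a given build
# may run others as well.
STARTUP_FILES = ("code.py", "main.py")


def disable_startup_files(drive):
    """Rename startup files on a mounted CIRCUITPY drive so they do not run."""
    renamed = []
    for name in STARTUP_FILES:
        source = Path(drive) / name
        if source.is_file():
            # Keep the content around under a name the firmware will not run.
            target = source.with_name(name + ".disabled")
            source.rename(target)
            renamed.append(target.name)
    return renamed
```

For example, `disable_startup_files("/media/pi/CIRCUITPY")` on a Pi, where the path is hypothetical. This does not help while the drive fails to mount, which is why the firmware had to be patched first in this case.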

This is the offending code.py, it's tests/basics/fun_error2.py:

# test errors from bad function calls
try:
    enumerate
except:
    print("SKIP")
    raise SystemExit

def test_exc(code, exc):
    try:
        exec(code)
        print("no exception")
    except exc:
        print("right exception")
    except:
        print("wrong exception")

# function with keyword args not given a specific keyword arg
test_exc("enumerate()", TypeError)

dhalbert commented 6 years ago

Hmm, yes, the flash eraser was never tested on the M4, and probably won't work because it's QSPI and various other reasons. So perhaps we should have some way of forcing safe mode in case of really broken main.py/code.py or boot.py.

Glad you/we solved this! Thanks @sommersoft!

notro commented 6 years ago

Hmm, yes, the flash eraser was never tested on the M4, and probably won't work because it's QSP

I used a M4 QSPI eraser, see the first post.

notro commented 6 years ago

And now I know why I had to flash erase the Feather M0 Express sometimes when I ran the tests working on #893, it was probably the test I was running in code.py that crashed it.

dhalbert commented 6 years ago

I tested the tip of Adafruit_SPIFlash on the Metro M4, and it does indeed not work. I'll submit an issue about this and/or try to fix it.

tannewt commented 6 years ago

Has anyone figured out why the test is so bad? This is what I was hitting with Rosie as well! We should definitely fix for 3.0.0.

@notro mind filing an issue for the test failure itself? Thanks!

DPiero commented 5 years ago

@sommersoft, how did you read the fuses? If I try, I get this error: fuseserror

dhalbert commented 5 years ago

@DPiero That offset looks incorrect. What fuses are you trying to read? Here's a short Arduino program I wrote in the past to read the SAMD51 BOOTPROT fuse value:

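void setup() {
  // initialize serial communication at 9600 bits per second:
  Serial.begin(9600);
  // wait until the host has opened the serial port
  while (!Serial) ;
}

void loop() {
  // Serial.println(NVMCTRL->STATUS.bit.BOOTPROT, HEX);
  // Raw read of the same field: NVMCTRL base 0x41004000, STATUS register at offset 0x12.
  // The +1 selects the upper byte of STATUS, which holds the BOOTPROT bits.
  Serial.println(*(uint8_t*) (0x41004000 + 0x12 + 1), HEX);
  delay(1000);
}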

sommersoft commented 5 years ago

@DPiero, I've only ever used a SEGGER J-Link with Atmel Studio, so I am not much help with using the Atmel-ICE. Are you able to program the chip successfully?

I did some searching, and could only find this similar problem that might help: https://community.atmel.com/forum/getting-started-ice-and-samd21g18 The only other thing I can suggest is to make sure you have AS7 up to date; the chip definitions for the memory locations may have been wrong in the past.

@dhalbert, when reading the fuses in Atmel Studio, it reads all of the available fuses based on the selected chip. For whatever reason, it seems that it fails on trying to read the USER_WORD_X for them. But, that is a useful little routine!

DPiero commented 5 years ago

Thank you so much... the Arduino code returns 0x0F.
I do not know what to do. I installed a new uC and loaded the Adafruit bootloader. Both Atmel Studio and the Atmel-ICE software are the latest available versions. I tried on 3 PCs and 2 Atmel-ICE units. At this point I will try to get a SEGGER J-Link.

dhalbert commented 5 years ago

0x0f means the bootloader is not protected. I'm not sure what your goal is here. If you want to make the bootloader protected, now that you have a bootloader loaded, just get the appropriate update_bootloader...uf2 file from here: https://github.com/adafruit/uf2-samdx1/releases/latest, and copy it onto the ...BOOT drive. That will rewrite the bootloader (it may be identical) and then that program will also set the BOOTPROT fuse.

Or are you trying to read the other fuse values for another reason?
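
The two BOOTPROT values seen in this thread (0x0D in the fuse dumps, 0x0F from the Arduino sketch) can be turned into a protected size. This is a sketch under an assumption: it uses the SAMD51 datasheet relation, as I understand it, that the protected region is (15 - BOOTPROT) * 8192 bytes. The function name `bootprot_protected_bytes` is hypothetical.

```python
def bootprot_protected_bytes(status_high_byte):
    """Decode the byte printed by the Arduino sketch into a protected size.

    Assumption: BOOTPROT is the low nibble of that byte, and the protected
    region is (15 - BOOTPROT) * 8192 bytes (SAMD51 datasheet relation).
    """
    bootprot = status_high_byte & 0x0F
    return (15 - bootprot) * 8192


# 0x0F decodes to 0 bytes, i.e. the bootloader is not protected.
# 0x0D decodes to 16384 bytes (0x4000), which matches the 0x4000 load
# address of the firmware in the gdb session earlier in this thread.
```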