d-ronin / dRonin

The dRonin flight controller software.
http://dronin.org
Other
289 stars 167 forks source link

bootloader-updater not working for revolution target #1689

Closed henn1001 closed 7 years ago

henn1001 commented 7 years ago

Issue details: I was trying to update the bootloader on a airbot f4 because i reconfigured the alarm led to buzzer (to make it shut up while booting or flashing :p). I noticed that the bu-updater didn't work. After flashing & pressing boot, it just boots into apparently nothing - no heartbeat, nothing.

I retested it on another board cloner board ( "cc3d revolution"). Same outcome. So i tried a different board and for dtfc and sparky2 it´s updating the bootloader as expected.

While testing older releases i found that the last one where it´s working, is the Samsara release.

Maybe someone can cross check it :)

pug398 commented 7 years ago

@henn1001 I have a revolution I can check it on but was wondering if you have flashed ef_revolution.hex to the board (working?) and then flashed bu_revolution.tlfw to the board (still working?) both from the current build? (I take it you mean the bootloader is corrupt and you cannot rescue)

henn1001 commented 7 years ago

I flashed the ef_revolution.hex, yes. The bootloader on that combination is running fine. I can rescue and update firmware just fine and everything is working. It´s just the bu_revolution that is not working.

The bu does nothing. After i flash the bu and reboot it just boots into an empty loop or something alike. I can still rescue though and flash a normal firmware - the bootloader is not getting corrupt.

pug398 commented 7 years ago

Ok so yes I have had to flash the main code back after flashing bu updater on revolution. My understanding was bu updater loads into main memory and extracts and writes the bootloader from there. I assume it is overwriting key part of main code. Whether that is a bug or just the process I am not sure.

henn1001 commented 7 years ago

Yes that is expected. That bu code basically replaces the "main code" i.e. flight firmware. So after you flash the bu, you reboot the controller. From there the old bootloader jumps into the new application, in this case the bootloaderUpdater. The updater overwrites the old bootloader and replaces it with the new one. After this you need to reflash a normal flight fw_

So did it work for you? You need to check if the bootloader version changes in the recovery screen.

pug398 commented 7 years ago

Yes it worked for test in that I flash the bu_revolution.tlfw and then boot. I then disconnect, "rescue" and flash fw_revolution.tlfw. Reboot and all is well. Bootloader version is at 87 after flash. I have always had to write main flight firmware back after updating bootloader and have never had an issue. Except on seldom occasion when flashing goes wrong and I have to write entire bin back to it to recover bootloader.

Updating other fc bootloader (quanton, sparky) behave slightly different. Upon flashing bu_xxx.tlfw to those targets the board boots into the new bootloader and is ready to flash fw_xxx.tlfw without a "rescue" step. Perhaps that is what you are referring to?

mlyle commented 7 years ago

Direct-flashing bl_... from gcs shouldn't work to upgrade bootloader.

Direct-flashing bu_... from gcs updates the bootloader and replaces FW. There's been a lot of reports it doesn't work right on revolution lately. I need to bisect and track it down, but haven't had time.

pug398 commented 7 years ago

Meant bu_xxx.tlfw :)

I think it is the extra rescue step that is catching peoples attention since other platforms don't require it. It is same bl version 85 but changed between samsara.1 and quixote.

20160723 build ok 20160926 not ok

yds commented 7 years ago

To replicate install the LibrePilot bu_revolution.opfw then install the dRonin bu_revolution.tlfw and the booloader stays at the LibrePilot version 6. The only way to get the dRonin bootloader ver 87 installed is via a DFU method.

Also discussed on the forum.

tracernz commented 7 years ago

Chances are it's this: https://github.com/tracernz/dronin/blob/next/flight/targets/bu/common/main.c#L67

I'd rather not remove the only check that stops you flashing the wrong bootloader to the wrong board. Perhaps we could do the check in GCS instead before it even lets you flash the bu... and gives you an option to continue anyway if you know what you're doing.

mlyle commented 7 years ago

@tracernz So the actual values should be the same-- checked the board types, revisions, etc between LP and our trees. And the structure is the same and should be at the same offset. Every time I've looked at this I've been puzzled, but I've not dumped memory or debugged to compare.

tracernz commented 7 years ago

Can confirm the same. I'm struggling to see where it could error out without at least erasing the existing BL.. Needs debugger.

tracernz commented 7 years ago

Hmmm... something off here...:

(gdb) bt
#0  PIOS_USBHOOK_USR_Init ()
    at /home/mike/dev/LibrePilot/flight/pios/stm32f4xx/pios_usbhook.c:278
#1  <signal handler called>
#2  0x08028fd0 in pios_flash_chip_internal ()
#3  0x08020714 in _main ()
    at /home/mike/dev/dronin2/flight/PiOS/STM32F4xx/startup.c:85

We have a PC in .rodata (pios_flash_chip_internal)...

tracernz commented 7 years ago

Single-stepping it, things go bad at:

    /* copy initialised data from flash to RAM */
    memcpy(&_sdata, &_sidata, &_edata - &_sdata);
mlyle commented 7 years ago

Oh shit, librepilot leaves usb interrupts running when it branches to us.

mlyle commented 7 years ago

oh maybe that's not the cause-- just what i eyeballed. yes, rodata doesn't seem awesome.

yds commented 7 years ago

FWIW, the dRonin bu_revolution.tlfw fails with the bu_RevoMini_TauLabs_v85.opfw from the forum thread the same as it does with the LibrePilot bu_revolution.opfw.

This bug is mostly a "showstopper" for the RevoMini which has no SBL boot pads and no simple way to get it into DFU mode.

CopterFail commented 7 years ago

Does this mean it is actually not recommended to change to dRonin for a RevoMini, running an older LibrePilot (incl BL)?

mlyle commented 7 years ago

@CopterFail -- no, it just means one can't use the bootupdater to run the dRonin bootloader which has a few different features. We don't usually recommend switching bootloaders unless one has to, anyway.

yds commented 7 years ago

@CopterFail no, not at all. dRonin works fine with the LibrePilot bu_revolution.opfw and even better with the bu_RevoMini_TauLabs_v85.opfw bootloaders.

mlyle commented 7 years ago

Sparky2 bu fails too, so this is a fundamental regression in f4 bu.

tracernz commented 7 years ago

@mlyle and myself independently bisected the original break to (although other stuff subsequently caused separate issues):

f51ae85dd4aa0507ead513d932946b331e09f3c9 is the first bad commit
commit f51ae85dd4aa0507ead513d932946b331e09f3c9
Author: Michael Lyle <mlyle@lyle.org>
Date:   Sun Aug 28 17:41:52 2016 -0700

    revolution/pios_flash: handle variable capacity flash

    Be willing to extend last partition.

:040000 040000 70b84d972e325e25c95f8572353456009a7be3e0 2fa064166d63c77198d9843831cc0cb888217fd8 M  flight
tracernz commented 7 years ago

In the bootloader we have:

<load initial SP into R4>
<load initial PC into R5>
<check SP and PC sanity>
msr MSP, r4  << that's setting main stack pointer to initial SP, yay
mov r3, r5 << moving initial PC into r3
ldmia.w sp!, {r4, r5, r6, lr}  << uh oh!! adds 16 to sp
bx  r3 << jump to initial PC
pop {r4, r5, r6, pc} << lol, never happens because firmware doesn't return

At the start of BU firmware we have:

Breakpoint 8, _main () at /home/mike/dev/dronin/flight/PiOS/STM32F4xx/startup.c:68
68      asm volatile ("mov r10, %0" : : "r" (&irq_stack[0]) : );
(gdb) info reg
r0             0x0  0
r1             0x0  0
r2             0x36fec9ff   922667519
r3             0x8020b91    134351761
r4             0x0  0
r5             0x8004b28    134236968
r6             0x10000  65536
r7             0x20000d90   536874384
r8             0x0  0
r9             0x0  0
r10            0x20000000   536870912
r11            0x0  0
r12            0x8002bb1    134228913
sp             0x20000410   0x20000410 <pios_flash_partition_table+16>
lr             0x0  0
pc             0x8020b90    0x8020b90 <_main>
xPSR           0x41000000   1090519040

And there we have it.. everything that uses stack (so memcpy initially), clobbers the start of the partition table.

tracernz commented 7 years ago

That code just looks like garbage. Why pop those regs off the stack, including lr, so if firmware returned it'd never get to the second pop/return anyway?

tracernz commented 7 years ago

Librepilot bootloader v6 messes up the stack pointer before branching to firmware too:

8002d8a:    f382 8808   msr MSP, r2
8002d8e:    e8bd 4010   ldmia.w sp!, {r4, lr}
8002d92:    4718        bx  r3
tracernz commented 7 years ago

And our 20160720.1:

 8001402:   f384 8808   msr MSP, r4
 8001406:   462b        mov r3, r5
 8001408:   e8bd 4070   ldmia.w sp!, {r4, r5, r6, lr}
 800140c:   4718        bx  r3
mlyle commented 7 years ago

That's a tail call optimization--- fix up the stack as if we're returning, leave the linkage register alone, so that when the called party returns they return to our caller. Can be fixed with a while (1); afterwards.

tracernz commented 7 years ago

__builtin_unreachable(); might do it too (and hopefully not even generate the code to pop callee regs)?

tracernz commented 7 years ago

Yup, it does; -e- add __set_msp https://godbolt.org/g/QMezph And arm-none-eabi-g++ with just Os actually generates okay code in all cases. :stuck_out_tongue: