BusPirate / Bus_Pirate

Community driven firmware and hardware for Bus Pirate version 3 and 4
630 stars 132 forks source link

Self-test hangs when firmware is built with anything than -O0 #11

Closed agatti closed 8 years ago

agatti commented 8 years ago

Migrated from #10, original description follows:

What does this warning mean? I am trying to build a firmware from the latest revision of this repository. It builds successfully with the changes I have shared through my pull requests. However:

  1. it tells me this warning - regardless on what optimization level I use
  2. if I ignore a warning and upload this firmware to my BPv4:

HiZ>~ Disconnect any devices Connect (ADC to +3.3V) Space to continue <--- it freezes here!

My build environment: Ubuntu Linux 16.04.1 x64, MPLAB X v3.40 and MPLAB XC v1.26 compiler with PRO mode enabled (60 days free evaluation license, so that I could enable "3" level optimization for speed) When I build a BPv4 bootloader it does not give me this warning, boots flawlessly and feels to be more stable than previous (you could download it here)

I suspect this problem is because of incorrect linker settings or caused by configuration of a linker script, some functions code is overlapping, but still could not figure out what stuff exactly is overlapping!!

agatti commented 8 years ago

This either requires a hardware debugger to figure out where the code jumps to and what's going on, or attach a UART to USB adapter to AUX1/AUX2 and log messages there. Currently I don't have either at hand, but help in taking memory dumps and stack traces is gladly welcome.

mikebdp2 commented 8 years ago

I just updated a local repo and re-tried 3rd level optimization build, although it does not produce a warning now (thanks to your linker script commit) - but this freeze problem still exists

While debugging with Pickit2 there is already 3.3V voltage coming from ICSP pins. That means: I cannot connect USB wire at the same time (because it would cause overvoltage) and cannot send a command to initiate a self-test. So I made some temporary modifications to the source code - **main.c** file: 1) added #include "selftest.h" 2) function Initialize(), line 191: comment out //usb_start(); 3) soon after Initialize() inside main() completes, right before do {...} while (usb_device_state < CONFIGURED_STATE); cycle I insert perform_selftest(false, true); Then I ran this modified program and noticed that BPv4 completed a selftest - not without errors though because: disabled USB, not connected ADC to 3.3V, etc. - but at least it doesn't freeze! Moreover, later I put another perform_selftest(false, true); after that and then it got through two self tests in a row - again without problem.

Also I notice that this freezing is related not only to self test: for example, if I press upper arrow keyboard key at HiZ> command prompt (think it sends "^[[A" character sequence) - it freezes too! Maybe this problem could be caused by UART receiving part, and it froze not after I pressed "Space to continue" (to begin a self test) but right at this moment. Could make it so that: to begin a self test it wouldn't be needed to press "space to continue", then rebuild and see what would happen (will try it soon)

Meanwhile, a small question for you about UART to USB adapter (just in cause I would need it): I have exactly the same adapter as this one but never used it yet. Could it serve for this purpose, and if yes, what pins I need to connect to AUX1/AUX2, what should be the proper connectivity to get these messages?

mikebdp2 commented 8 years ago

Yes! after I commented 95 line at selftest.c - //UART1RX(); - BPv4 successfully passed a self test (all tests are OK and LEDs are correct) and after that it froze at "Any key to exit" XD That's message BPMSG1250; and right after that - again UART1RX(). So its obvious where to dig now, but I am unsure about how to debug it further

mikebdp2 commented 8 years ago

Oh, it is possible to #include "base.h" and then use these functions as a sign of where it reaches:

BP_USBLED_OFF();
BP_MODELED_OFF();
// <--- a long cycle
BP_USBLED_ON();
// <--- some function
BP_MODELED_ON();

Good thing is that feedback - unlike printing messages - does not involve transmitting over UART

mikebdp2 commented 8 years ago

Using this LED approach I found the exact place where it freezes: cdc.c --> WaitOutReady() , line 237 --> while ((CDC_Outbdp->BDSTAT & UOWN)) What happens: 1) Inside selftest.c, line 95: after printing the "Space to continue" messages it instantly goes inside UART1RX(); <--- baseIO.c, line 477 2) This UART1RX() consists of just one function call - getc_cdc(); <--- cdc.c, line 370 3) Inside this function there is a call to another function - getda_cdc(); <--- cdc.c, line 260 4) It calls WaitOutReady(); at line 264 ---> goes to line 235 in the same file 5) Right there is a while cycle: while ((CDC_Outbdp->BDSTAT & UOWN)); From picusb.h, line 86: #define UOWN 0x80. So it is stuck there because UOWN flag at BDSTAT is always true... If we compare the disassembly of clean WaitOutReady(); function (without my temporary led inserts) in both 0 and 1 optimization levels, they seem to be quite similar:

Level 0:

0000d572 <_WaitOutReady>:
    d572:   00 00 fa        lnk       #0x0
    d574:   00 00 00        nop       

0000d576 <.L24>:
    d576:   b0 7f 81        mov.w     0x2ff6, w0
    d578:   10 40 90        mov.b     [w0+1], w0
    d57a:   00 04 e0        cp0.b     w0
    d57c:   fc ff 35        bra       LT, 0xd576 <.L24>
    d57e:   00 80 fa        ulnk      
    d580:   00 00 06        return  `

Level 1:

0000ab46 <_WaitOutReady>:
    ab46:   a0 7f 81        mov.w     0x2ff4, w0
    ab48:   10 40 90        mov.b     [w0+1], w0
    ab4a:   00 04 e0        cp0.b     w0
    ab4c:   01 00 3d        bra       GE, 0xab50 <.L26>

0000ab4e <.L27>:
    ab4e:   ff ff 37        bra       .

0000ab50 <.L26>:
    ab50:   00 00 06        return   

So I don't know why it doesn't work for Level 1. Please could you help, @agatti , do you have any ideas about this?

agatti commented 8 years ago

Thanks a lot for the investigation! I need to check with the opcodes description for manually decoding some instructions, but it seems like the compiler generated invalid code for level 1.

For level zero you see at 0xd57c that if the jump is taken it goes back to 0xd576 (meaning w0 is still 0), and if not it continues by returning the function.

For level 1, instead, it looks like that at 0xab4c if the branch is not taken it goes to 0xab4e - which looks like an infinite loop. For that I need to decode the opcode manually to confirm that.

agatti commented 8 years ago

Yep, looks like an infinite loop at 0xab4e. So, the opcode at 0xab4e is 0x37FFFF - which decodes as BRA (PC + 2) + (-1 * 2) -> BRA PC. I'll see if rewriting that function in proper assembly language can work, or if there are function attributes that do not force invalid code to be generated.

Still, @mikebdp2, thanks a ton for your investigative work!

agatti commented 8 years ago

So, both _WaitOutReady and _WaitInReady have the same invalid code. the 0xFF 0xFF 0x37 code sequence appears a few more times but looks like it is being used as a filler. I guess we should file a bug with Microchip, but we need to come up with a simplified test case for them to look at - if they actually do look at those things...

agatti commented 8 years ago

A fix has been added, 94c96d7ca5c691d008e5b37e4ecf1c5d5534e0bc - can you please try it out on your end? I works on my board but I'd like to hear back from you before closing this bug as fixed :)

mikebdp2 commented 8 years ago

Thank you very much for providing these new commits, @agatti ! However, I just tried the most recent revision and it seems that our problems are not solved yet:

/opt/microchip/xc16/v1.26/bin/bin/../bin/elf-ld: Link Error:
section .config_JTAGEN%34 [02abfe   -> 02abff  ] overlaps .config_JTAGEN%32 [02abfe   -> 02abff  ]
section .config_IESO%35 [02abfc   -> 02abfd  ] overlaps .config_IESO%33 [02abfc   -> 02abfd  ]
section .config_JTAGEN%36 [02abfe   -> 02abff  ] overlaps .config_JTAGEN%32 [02abfe   -> 02abff  ]
section .config_IESO%37 [02abfc   -> 02abfd  ] overlaps .config_IESO%33 [02abfc   -> 02abfd  ]
...
cdc.s:459: Error: symbol _WaitOutReadyCompilerFix is already defined
cdc.s:1111: Error: symbol _WaitOutReadyCompilerFix is already defined
cdc.s:1227: Error: symbol _WaitOutReadyCompilerFix is already defined
cdc.s:1349: Error: symbol _WaitOutReadyCompilerFix is already defined

Interesting is why these errors are only about WaitOutReadyCompilerFix and no errors about WaitInReadyCompilerFix... Probably if we would fix them then all the levels would have the same error about section overlaps

mikebdp2 commented 8 years ago

I just tested a firmware build at 0 level and it passes a self test! :) But this self test doesn't test JTAG and I'm sure there would be the problems because of section overlaps. Not sure if we should continue in this issue or create a new one

agatti commented 8 years ago

The JTAG errors you see are caused by the configuration bits being brought in twice, as per http://www.microchip.com/forums/m937316.aspx - which is something I'll fix right away. For the multiply-defined _WaitOutReadyCompilerFix error, I can add a simple workaround as well.

However I wonder how come I do not get those errors at any levels though, could it be because I am not using an evaluation key?

agatti commented 8 years ago

40ff85d836ec339c589e022a261f81eaa49b17ab and 3a0778bbf300d680061de1ee658231f389df4b83 should cater to both problems. I moved the #pragma bits back to main.c - my mistake - so there won't be any duplicated pragma configuration definitions. Then, I removed the extra labels in the assembly implementations of WaitInReady and WaitOutReady by using locally-defined labels ".something" rather than globally-defined "something:" ones. I just tested it here at levels 0 to 3 and flashed it on level 3 just to be sure.

Can you please give it a shot now? Thank you!

mikebdp2 commented 8 years ago

@agatti, it seems we almost fixed it: on 0, 1 and s levels - no warnings, no errors, everything seems completely fine there! :smile: But on level 2 and 3 there is still the same error. For example, level 2: cdc.s:455: Error: symbol `.loopOut' is already defined


_getda_cdc:
.LFB11:
    .loc 1 296 0
    .set ___PA___,0
    .loc 1 298 0
    mov #_CDCFunctionError,w0
    clr.b   [w0]
.LBB12:
.LBB13:
    .loc 1 245 0
; 245 "../dp_usb/cdc.c" 1
    .loopOut: <-- line 455
    mov.w _CDC_Outbdp, w0 
    cp0.b [++w0]          
    bra N, .loopOut       
_WaitOutReady:
.LFB7:
    .loc 1 234 0
    .set ___PA___,0
    .loc 1 245 0
; 245 "../dp_usb/cdc.c" 1
    .loopOut: <-- already defined here
    mov.w _CDC_Outbdp, w0 
    cp0.b [++w0]          
    bra N, .loopOut

I don't know why it is working for you :) Just tried installing MPLAB X together with XC16 compiler on another computer - with Fedora - and indeed I can reproduce this error on default configuration. So maybe you set up your tools in some special way that they don't produce this error, and if you revert your settings to default you'll be getting it as well. Or maybe you are using Windows and this is a platform-specific bug of compiler...

agatti commented 8 years ago

I think I figured it out. With all optimisation enabled (paid version) and levels 2 and 3, xc16 inlines the function call but keeping the label in, whilst with 0, 1, s, or without the PRO mode, it will simply leave it alone. I am running the non-evaluation, non-paid compiler, so that's why I can make it work.

I'm actually using the same settings that are checked in the MPLAB X project, and I'm on a Mac, btw. Anyway, let me see if there is a way to not inline that function...

agatti commented 8 years ago

Can you please try to change the function prototypes in cdc.h and cdc.c from: void WaitOutReady() to void __attribute__((noinline)) WaitOutReady(), and do the same for WaitInReady?

According to xc16's manual it should prevent the compiler from inline the function (and bring the label in all the time). If that is still an issue then I'll have to move those two functions in a separate .s file and at that point the compiler just cannot do that sort of operations any longer :)

mikebdp2 commented 8 years ago

@agatti, thank you for a great idea! Now it works OK on any optimization level :laughing: Funny thing is that, despite WaitOutReady and WaitInReady are almost the same, there are no problems with WaitInReady so I left it the same as before

agatti commented 8 years ago

Fixed with 2a3b853d2cb34744d2769bb8e4c621e99e9f058e - thanks a lot for your help!