babbleberry / rpi4-osdev

Tutorial: Writing a "bare metal" operating system for Raspberry Pi 4
https://www.rpi4os.com
Creative Commons Zero v1.0 Universal

Code stuck in initBricks (part6-breakout) #17

Closed by novel233 2 years ago

novel233 commented 2 years ago

While working through the tutorial, I found that the code gets stuck at a specific point in the initBricks function. I found the location of the error, but I cannot understand its cause.

objects[numobjs].width = brickwidth;
//The error is here
objects[numobjs].height = brickheight;

The code will get stuck here. But as long as you add other meaningful code between these two lines, it will run normally. For example, "wait_msec(1)" or other commands. Can you help answer this question? :D

babbleberry commented 2 years ago

Interesting! I think I saw similar behaviour during the development process.

Can you clarify for me - what is your development environment e.g. OS, (cross-)compiler version etc.?

I'll try and repeat the problem tonight.

These freaky issues popped up more than I would have hoped for, and can be the result of many things - bad compiler optimisation, stack overflows etc.

I'll try and be as helpful as I can :)

Thanks, Adam

babbleberry commented 2 years ago

If I could make a quick suggestion, try changing the -O2 to -O1 in the compiler flags in the Makefile. If it then works, the compiler optimisation is doing something funky!

novel233 commented 2 years ago

Good news! When I change -O2 to -O1, everything works fine. In case you still need my development environment: my OS is WSL2, and the (cross-)compiler is the latest, "gcc-arm-10.3-2021.07-x86_64-aarch64-none-elf.tar.xz". I don’t know how to thank you enough! \^o^/

babbleberry commented 2 years ago

Really glad to hear it! I still would love to investigate the actual cause, but I'm glad you're up and running :)

novel233 commented 2 years ago

I also want to understand the real reason. It's just that I'm still a beginner, and I may not be able to help you. Can you reproduce this bug now? What can I do to help? Do you need my assembly file?

babbleberry commented 2 years ago

As a "beginner", you've done well to identify and articulate the problem so accurately. It takes skill to pinpoint an issue like this, so don't underestimate your ability!

I've got everything I need from you, and can quickly spin up an environment that matches yours. Just one question: are you using Ubuntu in WSL?

All I'm short on is time! ;-) Let's see if I can spend an hour or two tonight!

novel233 commented 2 years ago

Yes, my OS is Ubuntu 20.04.

babbleberry commented 2 years ago

What's clear to me is that this is a compiler trying to over-optimise. See the Arm gcc output here using -O2 (it starts at the line where initBricks calls drawRect in the loop):

   81170: 1d fc ff 97   bl      0x801e4 <drawRect>
   81174: 40 03 40 b9   ldr     w0, [x26]
   81178: 84 03 40 f9   ldr     x4, [x28]
   8117c: 01 04 00 11   add     w1, w0, #1
   81180: 00 7c b7 9b   umull   x0, w0, w23
   81184: 82 00 00 8b   add     x2, x4, x0
   81188: 9b 68 20 b8   str     w27, [x4, x0]
   8118c: 41 03 00 b9   str     w1, [x26]
   81190: 53 d0 00 29   stp     w19, w20, [x2, #4]
   81194: 73 ea 02 11   add     w19, w19, #186
   81198: 56 c0 00 f8   stur    x22, [x2, #12]
   8119c: 5b 50 00 39   strb    w27, [x2, #20]
   811a0: 7f be 1e 71   cmp     w19, #1967
   811a4: a1 fd ff 54   b.ne    0x81158 <initBricks+0x58>
   811a8: 94 52 00 11   add     w20, w20, #20
   811ac: 18 13 00 91   add     x24, x24, #4
   811b0: 9f 2a 02 71   cmp     w20, #138
   811b4: 60 00 00 54   b.eq    0x811c0 <initBricks+0xc0>
   811b8: 15 03 40 b9   ldr     w21, [x24]
   811bc: e4 ff ff 17   b       0x8114c <initBricks+0x4c>
   811c0: f3 53 41 a9   ldp     x19, x20, [sp, #16]
   811c4: f5 5b 42 a9   ldp     x21, x22, [sp, #32]
   811c8: f7 63 43 a9   ldp     x23, x24, [sp, #48]
   811cc: f9 6b 44 a9   ldp     x25, x26, [sp, #64]
   811d0: fb 73 45 a9   ldp     x27, x28, [sp, #80]
   811d4: fd 7b c6 a8   ldp     x29, x30, [sp], #96
   811d8: c0 03 5f d6   ret

Here's the same snippet in the output from clang (which does not reproduce the error you are seeing - it runs just fine):

   81698: d3 fa ff 97   bl      0x801e4 <drawRect>
   8169c: 28 57 44 f9   ldr     x8, [x25, #2216]
   816a0: 49 d3 48 b9   ldr     w9, [x26, #2256]
   816a4: 29 7d 1b 9b   mul     x9, x9, x27
   816a8: 18 69 29 b8   str     w24, [x8, x9]
   816ac: 49 d3 48 b9   ldr     w9, [x26, #2256]
   816b0: 29 21 1b 9b   madd    x9, x9, x27, x8
   816b4: 36 05 00 b9   str     w22, [x9, #4]
   816b8: 49 d3 48 b9   ldr     w9, [x26, #2256]
   816bc: 29 21 1b 9b   madd    x9, x9, x27, x8
   816c0: 33 09 00 b9   str     w19, [x9, #8]
   816c4: 49 d3 48 b9   ldr     w9, [x26, #2256]
   816c8: 29 21 1b 9b   madd    x9, x9, x27, x8
   816cc: 3c 0d 00 b9   str     w28, [x9, #12]
   816d0: 49 d3 48 b9   ldr     w9, [x26, #2256]
   816d4: 29 21 1b 9b   madd    x9, x9, x27, x8
   816d8: 37 11 00 b9   str     w23, [x9, #16]
   816dc: 49 d3 48 b9   ldr     w9, [x26, #2256]
   816e0: 28 21 1b 9b   madd    x8, x9, x27, x8
   816e4: 29 05 00 11   add     w9, w9, #1
   816e8: df d6 1b 71   cmp     w22, #1781
   816ec: 18 51 00 39   strb    w24, [x8, #20]
   816f0: d6 ea 02 11   add     w22, w22, #186
   816f4: 49 d3 08 b9   str     w9, [x26, #2256]
   816f8: 41 fc ff 54   b.ne    0x81680 <initBricks+0x58>
   816fc: 73 52 00 11   add     w19, w19, #20
   81700: e8 07 40 f9   ldr     x8, [sp, #8]
   81704: 08 05 00 91   add     x8, x8, #1
   81708: 1f 15 00 f1   cmp     x8, #5
   8170c: e1 fa ff 54   b.ne    0x81668 <initBricks+0x40>
   81710: f4 4f 46 a9   ldp     x20, x19, [sp, #96]
   81714: f6 57 45 a9   ldp     x22, x21, [sp, #80]
   81718: f8 5f 44 a9   ldp     x24, x23, [sp, #64]
   8171c: fa 67 43 a9   ldp     x26, x25, [sp, #48]
   81720: fc 6f 42 a9   ldp     x28, x27, [sp, #32]
   81724: fd 7b 41 a9   ldp     x29, x30, [sp, #16]
   81728: ff c3 01 91   add     sp, sp, #112
   8172c: c0 03 5f d6   ret

Frankly, working out what's going wrong in the gcc example is a little above my pay grade!

If you figure it out, I'd love to know though... I know my code isn't great, but the compiler shouldn't trip over these simple lines.

Note how much shorter the Arm gcc example is though. Interesting, eh?
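
One detail in the GCC -O2 listing above may be relevant: `stur x22, [x2, #12]` is a single 8-byte store at offset 12 from the start of the array element, where the -O1 listing instead issues two 4-byte stores at offsets 12 and 16. That would be consistent with the compiler merging the adjacent `width` and `height` assignments into one wider store. The sketch below illustrates that equivalence on a little-endian host. The `struct Object` layout is an assumption inferred from the store offsets in the listings (0, 4, 8, 12, 16, 20), not copied from the tutorial source, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout, inferred from the store offsets in the disassembly. */
struct Object {
    unsigned int type;    /* offset 0  */
    unsigned int x;       /* offset 4  */
    unsigned int y;       /* offset 8  */
    unsigned int width;   /* offset 12 */
    unsigned int height;  /* offset 16 */
    unsigned char alive;  /* offset 20 */
};

/* What the C source asks for: two independent 32-bit stores. */
static void set_size_separate(struct Object *o, unsigned int w, unsigned int h)
{
    o->width = w;
    o->height = h;
}

/* What a merged store amounts to: one 64-bit write starting at `width`.
 * Packing w into the low half assumes a little-endian target. */
static void set_size_merged(struct Object *o, unsigned int w, unsigned int h)
{
    /* The two fields must be adjacent for the merge to be valid. */
    assert(offsetof(struct Object, height) ==
           offsetof(struct Object, width) + sizeof(unsigned int));

    uint64_t both = (uint64_t)w | ((uint64_t)h << 32);
    memcpy((unsigned char *)o + offsetof(struct Object, width),
           &both, sizeof both);
}

/* Returns 1 when both approaches leave identical bytes in the object. */
int merged_store_matches(unsigned int w, unsigned int h)
{
    struct Object a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    set_size_separate(&a, w, h);
    set_size_merged(&b, w, h);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Calling `merged_store_matches(32, 8)` (arbitrary values) compares the two paths byte for byte. If they match, the merge is legal as far as the C code is concerned, so any problem would have to come from how the hardware handles the wider store rather than from the values written.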

babbleberry commented 2 years ago

For comparison, I disassembled the Arm gcc output when -O1 is specified in Makefile (less compiler optimisation - and the thing that worked for you):

   80f10:       97fffcb7        bl      801ec <drawRect>
   80f14:       b9400280        ldr     w0, [x20]
   80f18:       f945bea1        ldr     x1, [x21, #2936]
   80f1c:       8b000400        add     x0, x0, x0, lsl #1
   80f20:       d37df000        lsl     x0, x0, #3
   80f24:       b8206836        str     w22, [x1, x0]
   80f28:       b9400282        ldr     w2, [x20]
   80f2c:       d37f7c40        ubfiz   x0, x2, #1, #32
   80f30:       8b224000        add     x0, x0, w2, uxtw
   80f34:       d37df001        lsl     x1, x0, #3
   80f38:       f945bea0        ldr     x0, [x21, #2936]
   80f3c:       8b010000        add     x0, x0, x1
   80f40:       b9000413        str     w19, [x0, #4]
   80f44:       f945bea0        ldr     x0, [x21, #2936]
   80f48:       8b010000        add     x0, x0, x1
   80f4c:       b9000817        str     w23, [x0, #8]
   80f50:       b9000c1b        str     w27, [x0, #12]
   80f54:       b900101a        str     w26, [x0, #16]
   80f58:       39005016        strb    w22, [x0, #20]
   80f5c:       11000442        add     w2, w2, #0x1
   80f60:       b9000282        str     w2, [x20]
   80f64:       1102ea73        add     w19, w19, #0xba
   80f68:       711ebe7f        cmp     w19, #0x7af
   80f6c:       54fffc61        b.ne    80ef8 <initBricks+0x48>  // b.any
   80f70:       110052f7        add     w23, w23, #0x14
   80f74:       91001318        add     x24, x24, #0x4
   80f78:       11005339        add     w25, w25, #0x14
   80f7c:       71022aff        cmp     w23, #0x8a
   80f80:       54fffba1        b.ne    80ef4 <initBricks+0x44>  // b.any
   80f84:       a94153f3        ldp     x19, x20, [sp, #16]
   80f88:       a9425bf5        ldp     x21, x22, [sp, #32]
   80f8c:       a94363f7        ldp     x23, x24, [sp, #48]
   80f90:       a9446bf9        ldp     x25, x26, [sp, #64]
   80f94:       f9402bfb        ldr     x27, [sp, #80]
   80f98:       a8c67bfd        ldp     x29, x30, [sp], #96
   80f9c:       d65f03c0        ret

Interesting how it looks a lot more like the Clang output (if you squint hard enough!).

babbleberry commented 2 years ago

One thing I wonder is whether the compiler optimisation has somehow required the FPU, which we don't enable in our boot code? You might try adding the GCC flag -mcpu=cortex-a72+nofp to your Makefile and trying again at -O2. I've done a build and the assembly code differs. I haven't run it up yet to see if it works though...

UPDATE: tried the build and that hasn't solved the issue. Will keep thinking on this one! In the meantime, I've added something to the docs - thanks for pointing this out.

piersfinlayson commented 3 months ago

Nice tutorial.

The solution is to add the compiler flag -mstrict-align.

I've now lost count of the number of times I've seen GCC produce code with unaligned accesses when the processor doesn't support them (or isn't in a mode where it supports them). The BCM2711 (ARMv8) can support unaligned access, but it requires a particular register bit to be correctly set (the alignment-check bit, SCTLR_ELx.A, I believe; UNALIGN_TRP is the Cortex-M name for the equivalent control).

Adding this compiler flag results in a working part 6 demo (at least for me), as well as allowing my own code to work :-).

For other targets (e.g. when generating 32-bit ARM object files), the required setting is -mno-unaligned-access.
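
To see why an unaligned access could arise here, it helps to do the address arithmetic. The sketch below assumes each array element is 24 bytes (the -O1 listing above scales the index by 3 and then by 8) and that `width` sits at offset 12 (the `#12` store offset in the listings). The base address in the usage note is a made-up value, and the function names are hypothetical. Under those assumptions, an 8-byte store to `width` can never be 8-byte aligned when the array itself is, which is the kind of access `-mstrict-align` tells GCC not to generate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OBJECT_SIZE  24u  /* ASSUMED element size: index * 3 * 8 in the -O1 listing */
#define WIDTH_OFFSET 12u  /* ASSUMED offset of `width`, from the #12 store offset   */

/* Returns 1 if addr is a multiple of size; size must be a power of two. */
int is_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Address of the `width` field of element `index` in an array at `base`. */
uintptr_t width_field_addr(uintptr_t base, unsigned int index)
{
    return base + (uintptr_t)index * OBJECT_SIZE + WIDTH_OFFSET;
}

/* Counts how many of the first n elements would see a misaligned
 * 8-byte store if one were issued at the address of `width`. */
unsigned int count_misaligned_merged_stores(uintptr_t base, unsigned int n)
{
    unsigned int misaligned = 0;
    for (unsigned int i = 0; i < n; i++) {
        if (!is_aligned(width_field_addr(base, i), 8)) {
            misaligned++;
        }
    }
    return misaligned;
}
```

For example, with an 8-byte-aligned base such as `0x100000`, `count_misaligned_merged_stores(0x100000, 10)` reports every element as misaligned for an 8-byte store, because 24 * i + 12 always leaves a remainder of 4 when divided by 8. The same addresses are fine for 4-byte stores, which matches the -O1 and Clang listings working.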

babbleberry commented 3 months ago

Awesome - I'm going to test it myself and then update the Makefiles across the board! What an odd quirk...

babbleberry commented 2 months ago

Tested and updated! Thanks again - that's a great fix :)