novel233 closed 2 years ago
Interesting! I think I saw similar behaviour during the development process.
Can you clarify for me - what is your development environment e.g. OS, (cross-)compiler version etc.?
I'll try and repeat the problem tonight.
These freaky issues popped up more than I would have hoped for, and can be the result of many things - bad compiler optimisation, stack overflows etc.
I'll try and be as helpful as I can :)
Thanks, Adam
If I could make a quick suggestion, try changing -O2 to -O1 in the compiler flags in the Makefile. If it then works, the compiler optimisation is doing something funky!
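If dropping the whole build to -O1 helps, a possible next step is to lower the level for one suspect function at a time. The sketch below uses GCC's optimize function attribute, which the GCC documentation describes as intended for debugging only. The BUILD_AT_O1 macro, struct Cell and fill_cells are hypothetical names for illustration and are not taken from the tutorial code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC-specific: compile only the marked function at -O1, leaving the rest of
   the file at the level given on the command line. The macro expands to
   nothing for other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define BUILD_AT_O1 __attribute__((optimize("O1")))
#else
#define BUILD_AT_O1
#endif

/* Hypothetical stand-in for a suspect routine such as initBricks. */
struct Cell {
    uint32_t x;
    uint32_t y;
};

BUILD_AT_O1
unsigned fill_cells(struct Cell *cells, unsigned count)
{
    assert(cells != NULL);
    for (unsigned i = 0; i < count; i++) {
        cells[i].x = i * 186u;  /* same column step as seen in the listings */
        cells[i].y = 20u;
    }
    return count;
}
```

If the problem disappears when only one function is marked this way, that function's -O2 disassembly is the place to look.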
Good news! When I change -O2 to -O1, everything works fine. In case you still need my development environment: my OS is WSL2, and the (cross-)compiler is the latest "gcc-arm-10.3-2021.07-x86_64-aarch64-none-elf.tar.xz". I don’t know how to thank you enough! \^o^/
Really glad to hear it! I still would love to investigate the actual cause, but I'm glad you're up and running :)
I also want to understand the real reason. It's just that I'm still a beginner, so I may not be able to help you. Can you reproduce this bug now? What can I do to help? Do you need my assembly file?
As a "beginner", you've done well to identify and articulate the problem so accurately. It takes skill to pinpoint an issue like this, so don't underestimate your ability!
I've got everything I need from you, and can quickly spin up an environment that matches yours. Just one question: are you using Ubuntu in WSL?
All I'm short on is time! ;-) Let's see if I can spend an hour or two tonight!
Yes, my OS is ubuntu-20.04
What's clear to me is that this is a compiler trying to over-optimise. See the Arm gcc output here using -O2 (it starts at the line where initBricks calls drawRect in the loop):
81170: 1d fc ff 97 bl 0x801e4 <drawRect>
81174: 40 03 40 b9 ldr w0, [x26]
81178: 84 03 40 f9 ldr x4, [x28]
8117c: 01 04 00 11 add w1, w0, #1
81180: 00 7c b7 9b umull x0, w0, w23
81184: 82 00 00 8b add x2, x4, x0
81188: 9b 68 20 b8 str w27, [x4, x0]
8118c: 41 03 00 b9 str w1, [x26]
81190: 53 d0 00 29 stp w19, w20, [x2, #4]
81194: 73 ea 02 11 add w19, w19, #186
81198: 56 c0 00 f8 stur x22, [x2, #12]
8119c: 5b 50 00 39 strb w27, [x2, #20]
811a0: 7f be 1e 71 cmp w19, #1967
811a4: a1 fd ff 54 b.ne 0x81158 <initBricks+0x58>
811a8: 94 52 00 11 add w20, w20, #20
811ac: 18 13 00 91 add x24, x24, #4
811b0: 9f 2a 02 71 cmp w20, #138
811b4: 60 00 00 54 b.eq 0x811c0 <initBricks+0xc0>
811b8: 15 03 40 b9 ldr w21, [x24]
811bc: e4 ff ff 17 b 0x8114c <initBricks+0x4c>
811c0: f3 53 41 a9 ldp x19, x20, [sp, #16]
811c4: f5 5b 42 a9 ldp x21, x22, [sp, #32]
811c8: f7 63 43 a9 ldp x23, x24, [sp, #48]
811cc: f9 6b 44 a9 ldp x25, x26, [sp, #64]
811d0: fb 73 45 a9 ldp x27, x28, [sp, #80]
811d4: fd 7b c6 a8 ldp x29, x30, [sp], #96
811d8: c0 03 5f d6 ret
Here's the same snippet in the output from clang (which does not reproduce the error you are seeing - it runs just fine):
81698: d3 fa ff 97 bl 0x801e4 <drawRect>
8169c: 28 57 44 f9 ldr x8, [x25, #2216]
816a0: 49 d3 48 b9 ldr w9, [x26, #2256]
816a4: 29 7d 1b 9b mul x9, x9, x27
816a8: 18 69 29 b8 str w24, [x8, x9]
816ac: 49 d3 48 b9 ldr w9, [x26, #2256]
816b0: 29 21 1b 9b madd x9, x9, x27, x8
816b4: 36 05 00 b9 str w22, [x9, #4]
816b8: 49 d3 48 b9 ldr w9, [x26, #2256]
816bc: 29 21 1b 9b madd x9, x9, x27, x8
816c0: 33 09 00 b9 str w19, [x9, #8]
816c4: 49 d3 48 b9 ldr w9, [x26, #2256]
816c8: 29 21 1b 9b madd x9, x9, x27, x8
816cc: 3c 0d 00 b9 str w28, [x9, #12]
816d0: 49 d3 48 b9 ldr w9, [x26, #2256]
816d4: 29 21 1b 9b madd x9, x9, x27, x8
816d8: 37 11 00 b9 str w23, [x9, #16]
816dc: 49 d3 48 b9 ldr w9, [x26, #2256]
816e0: 28 21 1b 9b madd x8, x9, x27, x8
816e4: 29 05 00 11 add w9, w9, #1
816e8: df d6 1b 71 cmp w22, #1781
816ec: 18 51 00 39 strb w24, [x8, #20]
816f0: d6 ea 02 11 add w22, w22, #186
816f4: 49 d3 08 b9 str w9, [x26, #2256]
816f8: 41 fc ff 54 b.ne 0x81680 <initBricks+0x58>
816fc: 73 52 00 11 add w19, w19, #20
81700: e8 07 40 f9 ldr x8, [sp, #8]
81704: 08 05 00 91 add x8, x8, #1
81708: 1f 15 00 f1 cmp x8, #5
8170c: e1 fa ff 54 b.ne 0x81668 <initBricks+0x40>
81710: f4 4f 46 a9 ldp x20, x19, [sp, #96]
81714: f6 57 45 a9 ldp x22, x21, [sp, #80]
81718: f8 5f 44 a9 ldp x24, x23, [sp, #64]
8171c: fa 67 43 a9 ldp x26, x25, [sp, #48]
81720: fc 6f 42 a9 ldp x28, x27, [sp, #32]
81724: fd 7b 41 a9 ldp x29, x30, [sp, #16]
81728: ff c3 01 91 add sp, sp, #112
8172c: c0 03 5f d6 ret
Frankly, working out what's going wrong in the gcc example is a little above my pay grade!
If you figure it out, I'd love to know though... I know my code isn't great, but the compiler shouldn't trip over these simple lines.
Note how much shorter the Arm gcc example is though. Interesting, eh?
For comparison, I disassembled the Arm gcc output when -O1 is specified in the Makefile (less compiler optimisation, and the thing that worked for you):
80f10: 97fffcb7 bl 801ec <drawRect>
80f14: b9400280 ldr w0, [x20]
80f18: f945bea1 ldr x1, [x21, #2936]
80f1c: 8b000400 add x0, x0, x0, lsl #1
80f20: d37df000 lsl x0, x0, #3
80f24: b8206836 str w22, [x1, x0]
80f28: b9400282 ldr w2, [x20]
80f2c: d37f7c40 ubfiz x0, x2, #1, #32
80f30: 8b224000 add x0, x0, w2, uxtw
80f34: d37df001 lsl x1, x0, #3
80f38: f945bea0 ldr x0, [x21, #2936]
80f3c: 8b010000 add x0, x0, x1
80f40: b9000413 str w19, [x0, #4]
80f44: f945bea0 ldr x0, [x21, #2936]
80f48: 8b010000 add x0, x0, x1
80f4c: b9000817 str w23, [x0, #8]
80f50: b9000c1b str w27, [x0, #12]
80f54: b900101a str w26, [x0, #16]
80f58: 39005016 strb w22, [x0, #20]
80f5c: 11000442 add w2, w2, #0x1
80f60: b9000282 str w2, [x20]
80f64: 1102ea73 add w19, w19, #0xba
80f68: 711ebe7f cmp w19, #0x7af
80f6c: 54fffc61 b.ne 80ef8 <initBricks+0x48> // b.any
80f70: 110052f7 add w23, w23, #0x14
80f74: 91001318 add x24, x24, #0x4
80f78: 11005339 add w25, w25, #0x14
80f7c: 71022aff cmp w23, #0x8a
80f80: 54fffba1 b.ne 80ef4 <initBricks+0x44> // b.any
80f84: a94153f3 ldp x19, x20, [sp, #16]
80f88: a9425bf5 ldp x21, x22, [sp, #32]
80f8c: a94363f7 ldp x23, x24, [sp, #48]
80f90: a9446bf9 ldp x25, x26, [sp, #64]
80f94: f9402bfb ldr x27, [sp, #80]
80f98: a8c67bfd ldp x29, x30, [sp], #96
80f9c: d65f03c0 ret
Interesting how it looks a lot more like the Clang output (if you squint hard enough!).
One thing I wonder is whether the compiler optimisation has somehow required the FPU, which we don't enable in our boot code? You might try adding the GCC flag -mcpu=cortex-a72+nofp to your Makefile and trying again at -O2. I've done a build and the assembly code differs. I haven't run it up yet to see if it works though...
UPDATE: tried the build and that hasn't solved the issue. Will keep thinking on this one! In the meantime, I've added something to the docs - thanks for pointing this out.
Nice tutorial.
The solution is to add the compiler flag -mstrict-align.
I've now lost count of the number of times I've seen GCC produce code with unaligned access when the processor doesn't support it (or isn't in a mode where it supports it). The BCM2711 (ARMv8) can support unaligned access, but it requires a particular register bit to be correctly set (UNALIGNED_TRP, I believe).
Adding this compiler flag results in a working part 6 demo (at least for me), as well as allowing my own code to work :-).
For other targets (e.g. when generating 32-bit ARM object files), the required setting is -mno-unaligned-access.
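The -O2 listing above looks consistent with this explanation. The five 4-byte stores that -O1 emits at offsets 0, 4, 8, 12 and 16 are merged, and one of the merged stores is stur x22, [x2, #12], an 8-byte write at an offset that is only 4-byte aligned. The sketch below checks that arithmetic on a host machine. The struct layout and field names are hypothetical, inferred only from the store offsets and the 24-byte stride visible in the -O1 listing, and the helper functions are illustrative rather than part of the tutorial code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout inferred from the store offsets in the listings. */
struct Brick {
    uint32_t alive;   /* offset 0  */
    uint32_t x;       /* offset 4  */
    uint32_t y;       /* offset 8  */
    uint32_t width;   /* offset 12 */
    uint32_t height;  /* offset 16 */
    uint8_t  type;    /* offset 20, written with strb */
};                    /* padded to a 24-byte stride on common ABIs */

/* Return 1 when an access of access_size bytes at addr is naturally aligned.
   access_size must be a power of two. */
int is_naturally_aligned(uintptr_t addr, size_t access_size)
{
    assert(access_size != 0 && (access_size & (access_size - 1)) == 0);
    return (addr & (access_size - 1)) == 0;
}

/* Count the array elements for which a merged 8-byte store covering width
   and height would be misaligned, for an array starting at address base. */
unsigned count_misaligned_merged_stores(uintptr_t base, unsigned count)
{
    unsigned misaligned = 0;
    for (unsigned i = 0; i < count; i++) {
        uintptr_t addr = base + i * sizeof(struct Brick)
                       + offsetof(struct Brick, width);
        if (!is_naturally_aligned(addr, 8))
            misaligned++;
    }
    return misaligned;
}
```

With an 8-byte-aligned base, every element's merged store lands 4 bytes past an 8-byte boundary, while the individual 4-byte stores stay aligned. That matches the observation that -O1 and -mstrict-align builds run, assuming the core faults on unaligned accesses as described above.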
Awesome - I'm going to test it myself and then update the Makefiles across the board! What an odd quirk...
Tested and updated! Thanks again - that's a great fix :)
During the learning process, I found that the code gets stuck at a specific position in the initBricks function. I found the location of the error, but I cannot understand its cause.
The code will get stuck here. But as long as you add other meaningful code between these two lines, it will run normally. For example, "wait_msec(1)" or other commands. Can you help answer this question? :D