Closed LizardLad closed 6 years ago
Thank you for your report.
I'm not sure I understand your issue, though. The line referenced does nothing more than read a 32-bit integer value into a variable. With the MMU turned off, I can't see any reason for it to hang. If the problem was reading the MBR, the magic bytes checks in line 81 and line 86 would have failed before that. Both output an error message. Now if the problem is reading the volume boot record, then your MBR partitioning table must be corrupt, but in that case there's an error message.
Please provide more information so that I can reproduce your error. Most notably please provide a hexdump of your MBR. Check it with fdisk too, what does it say? Is the first partition an LBA FAT one? What is its starting sector number? If you print that out with uart_hex(partitionlba), do you see the same value? (The directory tutorial prints that out. Is the directory listing tutorial working on your SD card btw?)
I don't know how to provide a hex dump of the MBR. I used gparted to create an MBR, then created a FAT16 LBA partition with partition number 1. Because I never get past that line, it is impossible for me to run uart_hex(partitionlba). I also don't have a UART, so I have been printing things out to the screen instead. I will try the directory tutorial as well. Neither of the checks before the assignment failed.
@bztsrc I just modded the directory tutorial to output everything to the screen and the same line is still causing the program to hang. The MBR disk identifier is -308972349.
Please read again what I wrote. How did you get that disk identifier? Not from the directory tutorial output, that's for sure. And it is irrelevant, I was asking for the starting sector number. You can use uart_dump(&_end) before that line to get the live MBR, and copy'n'paste minicom's output (or type every single byte of the dump from the screen). About dumping on your PC: dd if=(yourdevice) of=/dev/stdout bs=512 count=1 | hexdump -C. It's important to provide both dumps (the one with uart_dump() and the one with hexdump) so that I can compare them. For the partition table, use fdisk (yourdevice) and in the menu type 'p' (as in print table), then copy'n'paste the output.
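For comparing the on-device dump with the PC one, it helps if both use the same layout. The following is a minimal sketch (not the tutorial's uart_dump()) that formats 16 bytes the way hexdump -C prints them. The name format_dump_line and its parameters are hypothetical, and the caller is assumed to pass the resulting string to whatever output routine is available (UART or screen).

```c
#include <stdint.h>

/* Hypothetical helper, not part of the tutorials: formats 16 bytes in the
 * layout used by `hexdump -C`. `out` must have room for at least 79 chars
 * (78 visible characters plus the terminating NUL). */
void format_dump_line(uint32_t offset, const uint8_t *data, char *out)
{
    static const char hex[] = "0123456789abcdef";
    int i;

    /* 8-digit hexadecimal offset followed by two spaces */
    for (i = 28; i >= 0; i -= 4)
        *out++ = hex[(offset >> i) & 0xF];
    *out++ = ' ';
    *out++ = ' ';

    /* 16 bytes as two-digit hex, with an extra gap after the 8th byte */
    for (i = 0; i < 16; i++) {
        *out++ = hex[data[i] >> 4];
        *out++ = hex[data[i] & 0xF];
        *out++ = ' ';
        if (i == 7)
            *out++ = ' ';
    }
    *out++ = ' ';

    /* printable ASCII column; everything else is shown as '.' */
    *out++ = '|';
    for (i = 0; i < 16; i++)
        *out++ = (data[i] >= 0x20 && data[i] < 0x7F) ? (char)data[i] : '.';
    *out++ = '|';
    *out = '\0';
}
```

Calling it once per 16 bytes for offsets 0x000 to 0x1f0 gives 32 lines that can be compared against the PC output line by line. Unlike hexdump, this sketch does not collapse repeated lines.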
Here is the dump from my laptop:
00000000 fa b8 00 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0 |................|
00000010 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00 |...|.........!..|
00000020 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75 |....8.u........u|
00000030 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 01 8b |.........|...t..|
00000040 4c 02 cd 13 ea 00 7c 00 00 eb fe 00 00 00 00 00 |L.....|.........|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001b0 00 00 00 00 00 00 00 00 c3 74 95 ed 00 00 00 00 |.........t......|
000001c0 01 01 0e 3f e0 ff 00 08 00 00 00 00 7d 00 00 00 |...?........}...|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
1+0 records in
1+0 records out
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
512 bytes copied, 0.00209244 s, 245 kB/s
00000200
However, I have been unable to get the dump from the Pi. I will get back to you when I have it. I will also provide the partition table.
Thanks! At a first glance it seems perfectly valid.
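To make the check concrete, here is a small sketch that decodes the relevant fields from the laptop dump posted above. It is an illustration, not code from the tutorials: sample_mbr holds only the bytes of interest copied from that dump, and the function names are hypothetical.

```c
#include <stdint.h>

/* Only the bytes relevant here, copied from the dump posted above;
 * the boot code at the start of the sector is left as zeros. */
static const uint8_t sample_mbr[512] = {
    [0x1B8] = 0xc3, 0x74, 0x95, 0xed,   /* disk identifier */
    [0x1BE] = 0x00, 0x00, 0x01, 0x01,   /* status, CHS of first sector */
              0x0e, 0x3f, 0xe0, 0xff,   /* partition type, CHS of last sector */
              0x00, 0x08, 0x00, 0x00,   /* starting LBA */
              0x00, 0x00, 0x7d, 0x00,   /* number of sectors */
    [0x1FE] = 0x55, 0xaa                /* boot signature */
};

/* Assembles a little-endian 32-bit value from four single bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the sector ends with the 55 AA boot signature. */
int mbr_signature_ok(const uint8_t *sector)
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Type byte of the first partition entry (0x0E is FAT16 with LBA). */
uint8_t mbr_part1_type(const uint8_t *sector)
{
    return sector[0x1C2];
}

/* Starting sector of the first partition entry. */
uint32_t mbr_part1_lba(const uint8_t *sector)
{
    return read_le32(sector + 0x1C6);
}

/* Sector count of the first partition entry. */
uint32_t mbr_part1_sectors(const uint8_t *sector)
{
    return read_le32(sector + 0x1CA);
}
```

For the posted bytes this gives a valid signature, type 0x0E, a starting sector of 0x800 and 0x7d0000 sectors. The four bytes at 0x1B8 read as 0xed9574c3, which is -308972349 when interpreted as a signed 32-bit value, so that is where the reported disk identifier comes from.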
Can you confirm that the directory tutorial prints the disk identifier but not the starting sector? You don't have to provide the live dump if you confirm that it is exactly, 100% the same as the one you've just posted (I'm assuming some kind of memory corruption, that's why it is important to compare). Are you sure the code you're using to print on screen does not clobber memory region &_end - &_end+512? Second, please provide the sector at 0x800 too (add skip=$[0x800*512] to dd command) just to make sure the partitioning table is correct about that.
Don't you worry, we'll sort this out! bzt
I am using your code to print to the screen. I took your uart code and replaced uart_send() with lfb_print(). However, when printing to the screen the dump is all on a single line and characters overlap each other. Once I get that fixed I can provide you with the dump.
Oh boy, you have messed up something really badly. The print function handles newlines in lfb.c:127 for sure. As your problem is definitely with your own code and not with the tutorials, I'm going to close this case. Regardless, I'll try to help you solve your issue once you have the info. Also, try to use statically allocated memory instead of &_end, that should solve your memory corruption.
I have fixed my print code for dumping memory, however I am still unable to get past that one line. Here is the dump from my Pi: 20180331_182937.pdf
If you can't read that I can type it up, and if you like I can provide you with my code for dumping memory on screen.
Here is the sector at 0x800 from my laptop:
00000000 6f 00 76 00 69 00 73 00 69 00 6f 00 6e 00 61 00 |o.v.i.s.i.o.n.a.|
00000010 6c 00 31 00 39 00 39 00 38 00 4b 00 54 00 33 00 |l.1.9.9.8.K.T.3.|
00000020 30 00 02 00 14 00 37 00 50 00 72 00 6f 00 76 00 |0.....7.P.r.o.v.|
00000030 69 00 73 00 69 00 6f 00 6e 00 61 00 6c 00 31 00 |i.s.i.o.n.a.l.1.|
00000040 39 00 39 00 38 00 4b 00 58 00 33 00 31 00 02 00 |9.9.8.K.X.3.1...|
00000050 14 00 38 00 50 00 72 00 6f 00 76 00 69 00 73 00 |..8.P.r.o.v.i.s.|
00000060 69 00 6f 00 6e 00 61 00 6c 00 31 00 39 00 39 00 |i.o.n.a.l.1.9.9.|
00000070 38 00 4b 00 59 00 33 00 32 00 02 00 14 00 39 00 |8.K.Y.3.2.....9.|
00000080 50 00 72 00 6f 00 76 00 69 00 73 00 69 00 6f 00 |P.r.o.v.i.s.i.o.|
00000090 6e 00 61 00 6c 00 31 00 39 00 39 00 38 00 4b 00 |n.a.l.1.9.9.8.K.|
000000a0 4a 00 33 00 33 00 00 00 e6 a2 00 00 e7 a2 00 00 |J.3.3...........|
000000b0 eb 00 00 00 e8 a2 00 00 e9 a2 00 00 ea a2 00 00 |................|
000000c0 eb a2 00 00 ec a2 00 00 ed a2 00 00 cc 01 00 00 |................|
000000d0 6f 00 00 00 02 00 14 00 30 00 50 00 72 00 6f 00 |o.......0.P.r.o.|
000000e0 76 00 69 00 73 00 69 00 6f 00 6e 00 61 00 6c 00 |v.i.s.i.o.n.a.l.|
000000f0 31 00 39 00 39 00 38 00 4b 00 45 00 33 00 37 00 |1.9.9.8.K.E.3.7.|
00000100 02 00 14 00 31 00 50 00 72 00 6f 00 76 00 69 00 |....1.P.r.o.v.i.|
00000110 73 00 69 00 6f 00 6e 00 61 00 6c 00 31 00 39 00 |s.i.o.n.a.l.1.9.|
00000120 39 00 38 00 4b 00 41 00 34 00 33 00 02 00 14 00 |9.8.K.A.4.3.....|
00000130 32 00 50 00 72 00 6f 00 76 00 69 00 73 00 69 00 |2.P.r.o.v.i.s.i.|
00000140 6f 00 6e 00 61 00 6c 00 31 00 39 00 39 00 38 00 |o.n.a.l.1.9.9.8.|
00000150 4b 00 4c 00 34 00 34 00 02 00 14 00 33 00 50 00 |K.L.4.4.....3.P.|
00000160 72 00 6f 00 76 00 69 00 73 00 69 00 6f 00 6e 00 |r.o.v.i.s.i.o.n.|
00000170 61 00 6c 00 31 00 39 00 39 00 38 00 4b 00 51 00 |a.l.1.9.9.8.K.Q.|
00000180 34 00 35 00 02 00 14 00 34 00 50 00 72 00 6f 00 |4.5.....4.P.r.o.|
00000190 76 00 69 00 73 00 69 00 6f 00 6e 00 61 00 6c 00 |v.i.s.i.o.n.a.l.|
000001a0 31 00 39 00 39 00 38 00 4b 00 5a 00 34 00 35 00 |1.9.9.8.K.Z.4.5.|
000001b0 02 00 14 00 35 00 50 00 72 00 6f 00 76 00 69 00 |....5.P.r.o.v.i.|
000001c0 73 00 69 00 6f 00 6e 00 61 00 6c 00 31 00 39 00 |s.i.o.n.a.l.1.9.|
000001d0 39 00 38 00 4b 00 48 00 35 00 34 00 02 00 14 00 |9.8.K.H.5.4.....|
000001e0 36 00 50 00 72 00 6f 00 76 00 69 00 73 00 69 00 |6.P.r.o.v.i.s.i.|
000001f0 6f 00 6e 00 61 00 6c 00 31 00 39 00 39 00 38 00 |o.n.a.l.1.9.9.8.|
@bztsrc I have narrowed down the hang to the Raspberry Pi being unable to dereference an unsigned int.
I have written it over three lines. They are as follows:
unsigned long temp_long = (unsigned long)(&_end+0x1C6); //This line succeeds
unsigned int temp_partitionlba = *(unsigned int *)temp_long; //So does this one
partitionlba = temp_partitionlba; //but not this
I cannot think why this may happen.
I have also tried not using global variables, however this has no effect.
Well, that's definitely not a FAT partition's VBR record. I suggest downloading the raspbian lite image, mounting it, deleting the unnecessary files and working from there.
About the memory reference:
(unsigned long)&_end = gives the 64-bit memory address of the master boot record in memory.
((unsigned long)&_end+0x1C6) = adds an offset, which gives the address of a field inside the first partition entry (the starting sector, to be precise)
(unsigned int*)((unsigned long)&_end+0x1C6) = adding a pointer cast means that there is a 32-bit integer value at that address
*((unsigned int*)((unsigned long)&_end+0x1C6)) = finally, * dereferences that pointer, reading the integer value at that address.
You can try commenting out that line and using "partitionlba=0x800" instead, but I have a feeling it won't fix your code at all. Using a bad partitioning table is a much more pressing issue.
Cheers, bzt
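The cast chain described above can be traced on a PC. The sketch below is an illustration only: demo_sector, is_u32_aligned and load_u32_any_alignment are hypothetical names, and the buffer's 4-byte alignment is an assumption (where _end lands depends on the linker script). It shows that 0x1C6 is not a multiple of 4, so the address is not naturally aligned for a 32-bit read when the buffer itself is, and it uses memcpy for the load because ISO C leaves dereferencing a misaligned unsigned int pointer undefined. Whether alignment has anything to do with the hang is not established in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the sector buffer at &_end; the 4-byte alignment is an
 * assumption made for this demonstration. */
static _Alignas(4) uint8_t demo_sector[512] = {
    [0x1C6] = 0x00, 0x08, 0x00, 0x00   /* starting LBA 0x800, little-endian */
};

/* Returns 1 if a 32-bit load from p would be naturally aligned. */
int is_u32_aligned(const void *p)
{
    return ((uintptr_t)p % sizeof(uint32_t)) == 0;
}

/* Loads a 32-bit value in host byte order without requiring alignment. */
uint32_t load_u32_any_alignment(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* byte copy, valid for any address */
    return v;
}
```

On a little-endian host, load_u32_any_alignment(demo_sector + 0x1C6) yields 0x800, while is_u32_aligned(demo_sector + 0x1C6) is 0. A compiler may still emit a single 32-bit load for the memcpy, so this is not offered as a fix for the Pi.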
@bztsrc I did change the compiler that I use, so I am using aarch64-linux-gnu-gcc instead of aarch64-elf-gcc because aarch64-linux-gnu-gcc was provided by the Fedora repositories. If you don't believe that to be the issue, would you like me to send the code I use to print stuff to the screen?
Well, since you have provided the dump, I suppose you have fixed the print pretty well. My suggestions:
@bztsrc Are you testing on QEMU or on a Pi? I ask because this hasn't helped either. Can I send you all my code so you can see if it runs on your machine?
Have you read this yet?
I've tested the tutorials on qemu as well as on a real machine.
I'm afraid I can't help you with that, you have to debug your own code. I suggest buying a USB/serial cable (it's cheap) and trying the vanilla tutorial sources first, before you modify them.
The tutorial runs on qemu, however it doesn't run on the real hardware. I suspect I am undervolting the Pi, because raspbian downclocks the Pi and still runs. However, I don't yet know how to downclock it myself, so I will buy a new power supply and get back to you.
I'm not sure what you mean, running the bare metal tutorials means you don't run raspbian at all. You should have replaced the linux kernel with the kernel from the tutorial. As such, no downclocking takes place.
I tried raspbian because I was wondering if raspbian would work or not because I thought my pi was faulty. It turned out I either need to underclock the pi on boot or get a better power supply because I was not providing the pi with enough power. I just did it to see if my pi was faulty. I was replacing the linux kernel with the bare metal program before that.
Are you planning to show us how to underclock the CPU in the future? If not, would you please teach us, as running the CPU at 1GHz is unnecessary?
Yeah maybe, good idea. In the meantime take a look at how the clock rate is set for the uart0 in my tutorials, and read https://github.com/raspberrypi/firmware/wiki/Mailbox-property-interface#clocks (you'll have to set the clock id #3, called ARM, to underclock the cpu). You can also play with setting the voltage level with this mailbox interface, just be careful.
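As a rough sketch of what such a request could look like, the function below fills in a property message for the "set clock rate" tag with clock id 3. It only builds the buffer; handing it to the mailbox (for example with the tutorials' mailbox call) is not shown. The macro and function names are assumptions modelled on the tutorials' style, and the 600 MHz rate in the usage note is an arbitrary illustrative value, not a recommendation.

```c
#include <stdint.h>

#define MBOX_REQUEST         0x00000000u  /* request code for the message */
#define MBOX_TAG_SETCLKRATE  0x00038002u  /* "set clock rate" tag */
#define MBOX_TAG_LAST        0x00000000u  /* end tag */
#define MBOX_CLOCK_ID_ARM    3u           /* clock id #3, called ARM */

/* Message buffer; the mailbox interface expects 16-byte alignment. */
static _Alignas(16) volatile uint32_t clock_msg[9];

/* Fills msg with a request to set the ARM clock to rate_hz.
 * Sending the message to the firmware is outside this sketch. */
void build_set_arm_clock(volatile uint32_t msg[9], uint32_t rate_hz)
{
    msg[0] = 9 * 4;               /* total message size in bytes */
    msg[1] = MBOX_REQUEST;
    msg[2] = MBOX_TAG_SETCLKRATE;
    msg[3] = 12;                  /* value buffer size in bytes */
    msg[4] = 0;                   /* tag request indicator */
    msg[5] = MBOX_CLOCK_ID_ARM;
    msg[6] = rate_hz;             /* requested rate in Hz */
    msg[7] = 0;                   /* "skip setting turbo" flag left at 0 */
    msg[8] = MBOX_TAG_LAST;
}
```

For example, build_set_arm_clock(clock_msg, 600000000u) prepares a request for 600 MHz. After the firmware has processed a real request, words 5 and 6 would hold the clock id and the rate it actually applied.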
I have still been unable to get it to work on real hardware; it still hangs on the same line. Does it make any difference that my SD card is 32GB?
Well all that I can do for you is give a step by step guide to locate your problem, but after that you'll have to fix that problem yourself.
step 1: Use dd to put the raspbian-lite image on your card. Can you boot it? If not, choose another card. If yes, then it does not matter that your card is 32G.
step 2: If you can boot with raspbian, then change kernel8.img to a vanilla bare metal app. The binaries in my repo were tested and are known to be working (use the font tutorial for example). Does it boot after that? If not, then you're doing something wrong and messing up your card. Repeat this step until you can boot it. Mount the first partition of your card, and remove every kernel*.img file, then copy kernel8.img into the root directory. Make sure you don't touch bootcode.bin, start.elf and fixup.dat, and that config.txt is empty (all lines commented out or simply delete the file). Unmount (wait until it's synced properly).
step 3: Finally put your kernel8.img on the card. Can you boot it? If the first 2 steps worked (should), but not this one, then you'll know the problem is in your code for sure, and not in your environment. You can also rule out faulty RPi.
step 4: Debug your code in qemu. Generate an assembly reference for your code and compare it to qemu's "-d in_asm" output. Once your code is running perfectly in qemu, repeat step 3. https://www.systutorials.com/240/generate-a-mixed-source-and-assembly-listing-using-gcc/
Good luck!
"the whole kernel hangs when the line partitionlba=*((unsigned int*)((unsigned long)&_end+0x1C6)); is reached"
Now I know, it's caused by a gcc 7.3.0 optimisation bug. I've provided a workaround. Just for the record, accessing bpb->nr and bpb->spc is also affected, and shifting and masking does not work any more.
I am running gcc 7.2* and shifting and masking works. Your solution on the Pi forums works as you said; I was just shifting the wrong way.
Yeah, but it won't work with 7.3.0. That's why I'm now suggesting another workaround. Take a look at fat.c, this one works with 7.2.0 and 7.3.0 as well. I like to avoid compiler version specific solutions (I did not know until yesterday that shifting and masking was one). If you ever pick up the project again (let's say a year from now) those solutions could give you some sleepless nights.
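As an illustration of reading on-disk fields without relying on how a particular compiler version handles a cast or a packed struct, here is a minimal sketch that assembles the values from single bytes at the standard FAT BPB offsets. It is not the code in fat.c. The names struct bpb_fields, read_le16, parse_bpb and sample_vbr are hypothetical, the sample values are made up, and it assumes bpb->spc and bpb->nr correspond to sectors per cluster and root directory entries. The volatile qualifier is there on the assumption that it keeps the compiler from merging the byte reads into a wider load.

```c
#include <stdint.h>

/* Hypothetical summary of a few BIOS Parameter Block fields. */
struct bpb_fields {
    uint16_t bytes_per_sector;    /* offset 0x0B, 2 bytes */
    uint8_t  sectors_per_cluster; /* offset 0x0D, 1 byte  */
    uint16_t reserved_sectors;    /* offset 0x0E, 2 bytes */
    uint8_t  num_fats;            /* offset 0x10, 1 byte  */
    uint16_t root_entries;        /* offset 0x11, 2 bytes */
};

/* Made-up example values, not taken from any card in this thread. */
static const uint8_t sample_vbr[512] = {
    [0x0B] = 0x00, 0x02,   /* 512 bytes per sector */
             0x40,         /* 64 sectors per cluster */
             0x01, 0x00,   /* 1 reserved sector */
             0x02,         /* 2 FATs */
             0x00, 0x02    /* 512 root directory entries */
};

/* Reads a little-endian 16-bit value as two separate byte accesses. */
static uint16_t read_le16(const volatile uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Extracts the fields one byte at a time from a volume boot record. */
struct bpb_fields parse_bpb(const volatile uint8_t *vbr)
{
    struct bpb_fields f;
    f.bytes_per_sector    = read_le16(vbr + 0x0B);
    f.sectors_per_cluster = vbr[0x0D];
    f.reserved_sectors    = read_le16(vbr + 0x0E);
    f.num_fats            = vbr[0x10];
    f.root_entries        = read_le16(vbr + 0x11);
    return f;
}
```

Two of these fields sit at odd offsets (0x0B and 0x11), which is why they are read as individual bytes here instead of through a 16-bit pointer.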
I have localised the bug to the fat_getpartition() function. The whole kernel hangs when the line partitionlba=*((unsigned int*)((unsigned long)&_end+0x1C6)); is reached. I have tried a few things to fix it, however all of them have been unsuccessful.