frank-w / BPI-Router-Linux

Linux kernel 4.14+ for BPI-R2, 5.4+ for R64, 6.1+ for R2Pro and R3

Local FS corruption in 5.4 #86

Closed johnx666 closed 3 years ago

johnx666 commented 3 years ago

Hi guys, is anyone having FS issues when running 5.4.142 for more than a few days? It was fine in 4.14.72. Both SD card and SATA seem affected.

Errors like this:

    Sep 17 04:40:04 slackware kernel: [492735.703868] EXT4-fs error (device sda1): htree_dirblock_to_tree:1025: inode #505099: block 2600519: comm updatedb: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=2620482273, rec_len=51393, name_len=230, size=4096
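The message already carries the numbers that made ext4 reject the entry. As a rough sketch only (this is not the kernel's code; the helper name `check_dirent` and the choice of these two checks are assumptions made for illustration), the values from the log can be tested like this:

```python
import re

# Log line copied from the report above.
LOG = ("EXT4-fs error (device sda1): htree_dirblock_to_tree:1025: "
       "inode #505099: block 2600519: comm updatedb: bad entry in directory: "
       "rec_len % 4 != 0 - offset=0, inode=2620482273, rec_len=51393, "
       "name_len=230, size=4096")

def parse_fields(line):
    """Extract the key=value integers from the end of the message."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}

def check_dirent(offset, rec_len, size):
    """Hypothetical helper: list the problems with one directory entry."""
    problems = []
    if rec_len % 4 != 0:
        problems.append("rec_len is not a multiple of 4")
    if offset + rec_len > size:
        problems.append("entry runs past the end of the block")
    return problems

fields = parse_fields(LOG)
problems = check_dirent(fields["offset"], fields["rec_len"], fields["size"])
print(fields)
print(problems)
```

A record length of 51393 inside a 4096-byte block fails both checks, which points to the directory block holding garbage rather than a slightly damaged entry.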

frank-w commented 3 years ago

I ran 5.4 for a long time without issues, but currently I have 5.10 running (since 5.10.32). In both I do not see such errors. Maybe a recent commit in 5.4 causes problems? Can you try 5.10 to see if the issue persists? I have it running from eMMC and have a SATA SSD connected. 5.10.60 has now been running for 25 days and I see no such error in dmesg. I use ext4 too, on eMMC and SATA.

johnx666 commented 3 years ago

Thanks for your feedback. My thinking exactly: a new commit possibly making something worse. I'll get 5.10.60 and retry after I see the issue again. The FS is ext4. In the beginning the system was on SD, but I noticed FS errors and thought the SD card was dying, so I moved it to SATA, with no difference (new FS, running fsck -f on every boot, etc.). It could also be a HW issue, since the connected SATA drive uses an external power supply. I will try the SATA power connector on the BPI-R2 to rule that out. Not sure how much current the BPI-R2 can deliver there; the 12V trace looks very narrow :D

johnx666 commented 3 years ago

UPDATE: There is no memory corruption in 4.14. Hmm, I will look into that. Maybe a corrupted kernel image.

OLD POST: I think we can close this one: memory corruption. I'll post a memory reservation for the BPI using DTS to work around it.

Just for kicks:

    memtester version 4.5.1 (32-bit)
    Copyright (C) 2001-2020 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).

    pagesize is 4096
    pagesizemask is 0xfffff000
    want 1500MB (1572864000 bytes)
    got 1500MB (1572864000 bytes), trying mlock ...locked.
    Loop 1/1:
    Stuck Address : testing
    FAILURE: 0xc67f64a9 != 0x922cfc52 at offset 0x2d07c620.
    FAILURE: 0xdfff3fbf != 0x329a8f59 at offset 0x2d07c624.
    FAILURE: 0xff379b33 != 0x07bd7e96 at offset 0x2d07c628.
    FAILURE: 0xf7ffcf8e != 0x0a555604 at offset 0x2d07c62c.
    FAILURE: 0x7ffbb7bd != 0x00000080 at offset 0x2d07c630.
    FAILURE: 0x5eb63c56 != 0x00000000 at offset 0x2d07c634.
    FAILURE: 0xcd680c63 != 0x00000000 at offset 0x2d07c638.
    FAILURE: 0xfb57cbab != 0x00000000 at offset 0x2d07c63c.
    FAILURE: 0xe31e2b64 != 0x00000000 at offset 0x2d07c640.
    FAILURE: 0x4dee6d6d != 0x00000000 at offset 0x2d07c644.
    FAILURE: 0x3f73a217 != 0x00000000 at offset 0x2d07c648.
    FAILURE: 0xff7f6bcd != 0x00000000 at offset 0x2d07c64c.
    FAILURE: 0xef9f2425 != 0x00000000 at offset 0x2d07c650.
    FAILURE: 0x7f6711ba != 0x00000000 at offset 0x2d07c654.
    FAILURE: 0xffaf9fb5 != 0x00000000 at offset 0x2d07c658.
    FAILURE: 0x2dff4049 != 0x00000000 at offset 0x2d07c65c.
    FAILURE: 0xf06f9084 != 0x00000000 at offset 0x2d07c660.
    FAILURE: 0x6bf7e44c != 0x00000000 at offset 0x2d07c664.
    FAILURE: 0xffff7f3c != 0x00000000 at offset 0x2d07c668.
    FAILURE: 0x37fde354 != 0x00000000 at offset 0x2d07c66c.
    FAILURE: 0x5fefb754 != 0x00000000 at offset 0x2d07c670.
    FAILURE: 0xffff4daf != 0xc0310000 at offset 0x2d07c674.
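To see how contiguous the damage is, the FAILURE lines can be summarised with a short script. This is only a sketch: the regular expression is written against the format shown above, the sample embeds just three of the lines, and the helper name `failing_offsets` is made up here. Note that memtester's offsets appear to be relative to its test buffer, not physical addresses, so the result says nothing about where the bad region sits in RAM.

```python
import re

# Three lines copied from the memtester output above (first two and last).
SAMPLE = """\
FAILURE: 0xc67f64a9 != 0x922cfc52 at offset 0x2d07c620.
FAILURE: 0xdfff3fbf != 0x329a8f59 at offset 0x2d07c624.
FAILURE: 0xffff4daf != 0xc0310000 at offset 0x2d07c674.
"""

FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-f]+) != (0x[0-9a-f]+) at offset (0x[0-9a-f]+)\."
)

def failing_offsets(text):
    """Return the buffer offsets of all FAILURE lines found in text."""
    return [int(match.group(3), 16) for match in FAILURE_RE.finditer(text)]

offsets = failing_offsets(SAMPLE)
first, last = min(offsets), max(offsets)
span_bytes = last - first + 4  # each failure covers one 32-bit word
print(hex(first), hex(last), span_bytes)
```

For the output above this gives a single run of 88 bytes (22 consecutive 32-bit words), so the failures form one small contiguous burst rather than scattered bit flips.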

frank-w commented 3 years ago

Do you have defective RAM? Why does it not affect 4.14?

How do you work around it?

johnx666 commented 3 years ago

After much trial and error, it seems to be corruption in an early RAM region; depending on the build, a different part of the kernel is affected. I'm working on a workaround :) I'll update if I find one.

johnx666 commented 3 years ago

I didn't manage to get it resolved; it seems the SoC is broken beyond economic repair. Anyway, if only the memory is broken, this is how to disable a region of memory using DTS (other methods don't work, as they are not supported on ARM):

    reserved-memory {
            #address-cells = <2>;
            #size-cells = <2>;
            ranges;
            consys-reserve-memory {
                    compatible = "mediatek,consys-reserve-memory";
                    no-map;
                    size = <0 0x100000>;
                    alignment = <0 0x100000>;
            };

            /* only this is needed */
            badram {
                    reg = <0 0x90000000 0 0x05000000>;
                    no-map;
            };
            /* end */
    };

where the second number is the start address of the memory and the last is how much memory to block (here 0x05000000 bytes, i.e. 80 MiB, starting at 0x90000000).
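With two address cells and two size cells, as in the snippet above, each value in `reg` is a pair of 32-bit cells: high word first, then low word. A minimal sketch of that decoding (the function name `decode_reg` is made up for this example) shows what the `badram` entry blocks:

```python
def decode_reg(cells, address_cells=2, size_cells=2):
    """Combine 32-bit devicetree cells into a (start, size) pair."""
    def combine(words):
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    start = combine(cells[:address_cells])
    size = combine(cells[address_cells:address_cells + size_cells])
    return start, size

# reg = <0 0x90000000 0 0x05000000>;
start, size = decode_reg([0x0, 0x90000000, 0x0, 0x05000000])
print(f"start=0x{start:08x} size=0x{size:08x} "
      f"({size // (1024 * 1024)} MiB) end=0x{start + size:08x}")
```

To block a different region, only the second and fourth numbers need to change, as long as the region sits below 4 GiB so that both high words stay 0.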