MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

DietPi first install fails on ODROID-C4 (armbian-firmware fails unpack) #7261

Closed doqfgc closed 18 hours ago

doqfgc commented 3 weeks ago

Steps to reproduce

  1. From a fresh image, start the first install
  2. There is no step 2, the install fails on the very first apt upgrade.

MichaIng commented 3 weeks ago

When I manually download them in a browser, it works. But it is possible that one of the Cloudflare cache entries is broken, hence I cleared them for those two files. Please try again.

doqfgc commented 3 weeks ago

I'm still getting a fail, even from manually dropping packages in off-cache.

# dpkg -i ./armbian-firmware_24.11.0-trunk-dietpi1.deb 
(Reading database ... 18269 files and directories currently installed.)
Preparing to unpack .../armbian-firmware_24.11.0-trunk-dietpi1.deb ...
Unpacking armbian-firmware (24.11.0-trunk-dietpi1) over (24.8.0-trunk-dietpi2) ...
dpkg-deb (subprocess): decompressing archive './armbian-firmware_24.11.0-trunk-dietpi1.deb' (size=91604592) member 'data.tar': lzma error: compressed data is corrupt
dpkg-deb: error: <decompress> subprocess returned error exit status 2
dpkg: error processing archive ./armbian-firmware_24.11.0-trunk-dietpi1.deb (--install):
 cannot copy extracted data for './lib/firmware/brcm/brcmfmac4356-sdio-nanopi-m4v2.bin' to '/lib/firmware/brcm/brcmfmac4356-sdio-nanopi-m4v2.bin.dpkg-new': unexpected end of file or stream
Errors were encountered while processing:
 ./armbian-firmware_24.11.0-trunk-dietpi1.deb

Is it possible the package itself is broken?
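
One way to answer that independently of dpkg is to open the `.deb` by hand: it is an `ar` archive whose data member is an xz stream, so the member can be decompressed end to end and any corruption or truncation shows up as an error. The following is only a minimal sketch. It assumes the member is named `data.tar.xz` (consistent with the lzma error above, but not confirmed in this thread), and the function names `iter_ar_members` and `check_data_member` are made up for illustration.

```python
import lzma

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed size of every ar member header


def iter_ar_members(blob):
    """Yield (name, data) for each member of an in-memory ar archive."""
    if blob[:8] != AR_MAGIC:
        raise ValueError("not an ar archive")
    pos = 8
    while pos + HEADER_LEN <= len(blob):
        header = blob[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        # Names are space padded; some ar variants also append a slash.
        name = header[0:16].decode("ascii").rstrip(" /")
        size = int(header[48:58].decode("ascii").strip())
        start = pos + HEADER_LEN
        yield name, blob[start:start + size]
        pos = start + size + (size % 2)  # member data is padded to an even length


def check_data_member(blob, member="data.tar.xz", chunk=1 << 20):
    """Stream-decompress the data member and return its uncompressed size.

    The member name is an assumption; raises lzma.LZMAError if the stream
    is corrupt or ends early.
    """
    for name, data in iter_ar_members(blob):
        if name != member:
            continue
        decoder = lzma.LZMADecompressor()
        total = 0
        for offset in range(0, len(data), chunk):
            total += len(decoder.decompress(data[offset:offset + chunk]))
            if decoder.eof:
                break
        if not decoder.eof:
            raise lzma.LZMAError("stream ended before the end-of-stream marker")
        return total
    raise ValueError(f"no {member} member found")
```

Reading the downloaded file into memory and passing it to `check_data_member` should either return a byte count or raise. If it raises on one machine but not on another for a byte-identical file, the package itself is unlikely to be the problem.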

MichaIng commented 3 weeks ago

It works fine here 🤔:

root@NanoPiR5S:~# cd /tmp
root@NanoPiR5S:/tmp# curl -o armbian-firmware.deb https://dietpi.com/apt/dists/all/odroidc4/binary-all/armbian-firmware_24.11.0-trunk-dietpi1.deb
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 87.3M  100 87.3M    0     0  5163k      0  0:00:17  0:00:17 --:--:-- 5302k
2024-10-29 16:52:05 root@NanoPiR5S:/tmp# dpkg -i armbian-firmware.deb
(Reading database ... 19277 files and directories currently installed.)
Preparing to unpack armbian-firmware.deb ...
Unpacking armbian-firmware (24.11.0-trunk-dietpi1) over (24.11.0-trunk-dietpi1) ...
Setting up armbian-firmware (24.11.0-trunk-dietpi1) ...
root@NanoPiR5S:/tmp# apt download armbian-firmware
Get:1 https://dietpi.com/apt all/nanopir5s all armbian-firmware all 24.11.0-trunk-dietpi1 [91.6 MB]
Fetched 91.6 MB in 13s (6786 kB/s)
root@NanoPiR5S:/tmp# dpkg -i armbian-firmware_24.11.0-trunk-dietpi1_all.deb
(Reading database ... 19277 files and directories currently installed.)
Preparing to unpack armbian-firmware_24.11.0-trunk-dietpi1_all.deb ...
Unpacking armbian-firmware (24.11.0-trunk-dietpi1) over (24.11.0-trunk-dietpi1) ...
Setting up armbian-firmware (24.11.0-trunk-dietpi1) ...
root@NanoPiR5S:/tmp#

Different SBC but very same package (just symlinked server-side), and also using the same URL for Odroid C4 works.

doqfgc commented 3 weeks ago

Okay.

I swapped SD cards with a brand new unused SanDisk Ultra (same as my other working ODROIDs) to make sure that wasn't the culprit and the same thing happened: fail on armbian-firmware, fail on linux-image-current-meson64.

Doing a manual dpkg-deb extract to a temp folder results in extraction failure in different places each time, so now I believe the culprit may be a bad board (I doubt it's "bad card" twice in a row).
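
The reasoning here is that the same input bytes should always decompress to the same output, so a failure that moves around between runs points at the machine rather than the archive. A rough way to probe that without involving dpkg at all is sketched below. It is an illustration only: the helper names are hypothetical and the payload is synthetic data, not the actual package.

```python
import hashlib
import lzma
import os


def roundtrip_digests(rounds=5, block=4096, repeats=256):
    """Decompress the same xz payload `rounds` times and hash each result.

    Returns (expected_digest, list_of_results). A result is either a
    SHA-256 hex digest or an "error: ..." string if decompression failed.
    """
    original = os.urandom(block) * repeats  # about 1 MiB with the defaults
    packed = lzma.compress(original)
    results = []
    for _ in range(rounds):
        try:
            results.append(hashlib.sha256(lzma.decompress(packed)).hexdigest())
        except lzma.LZMAError as exc:
            results.append(f"error: {exc}")
    return hashlib.sha256(original).hexdigest(), results


def is_consistent(expected, results):
    """True only if every round reproduced the original data exactly."""
    return all(result == expected for result in results)
```

On healthy hardware every round should match. A clean run does not prove the board is fine, though; it only fails to reproduce the fault.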

Investigation ongoing.

Joulinar commented 3 weeks ago

or is there something on the network that could do package inspection? Like a firewall?

MichaIng commented 3 weeks ago

Yes, this seems more like a network/download issue than a storage issue, based on these unexpected EOF messages. How exactly did you download the package? If like I did to /tmp, then it is a tmpfs/RAM disk anyway, not related to the SD card ... though the extraction of course is.

doqfgc commented 3 weeks ago

The manual download was to a folder in /tmp, yes. The manual extraction also happened in a folder in /tmp.

I doubt it's firewall. I roll my own via OPNsense.

Just to check for sanity, I risked prod and did a manual extract from a fresh package download on one of my working ODROIDs and it completed without error. I also did dietpi-update on the same working ODROID and that also completed.

That rules out the package and the network, which leaves either hardware or the base image itself, but I doubt it's the base image.

MichaIng commented 3 weeks ago

Hmm, and wget/curl did not throw any error? Can you show the metadata of the file, its size in particular?

Maybe it is some problem with the dpkg extractor then. But you said other packages upgraded fine?

doqfgc commented 3 weeks ago

Correct and correct. I'd even installed a package in the subshell (memtester) and that installed okay.

To be sure, I did an md5sum on both the independently obtained package and the package pulled from apt upgrade and they both matched with fcdebcaad0c70e0f0e2507ee4c337890.

To be surelysure, I had an old base image on hand (v8.23.3 on bookworm) and that also failed on initial setup on armbian-firmware. And also hard locked the system at linux-image-current-meson64.

doqfgc commented 3 weeks ago

Update: I connected it to a monitor and got a bugcheck installing armbian-firmware. It's definitely hardware.

MichaIng commented 3 weeks ago

md5sum returns fcdebcaad0c70e0f0e2507ee4c337890 here as well. Or more precisely sha256sum matches 5ef8c82df7222b2a792f5f366f2d2dfda7b047c868b101a60334da4f8fd00531, which again matches the checksum in https://dietpi.com/apt/dists/all/odroidc4/binary-all/Packages. However, APT also checks this checksum, otherwise denies the download ... ah or denies extracting, as it should not have a way to get the checksum without downloading the file. However, it should abort before attempting to extract the archive (passing things to dpkg).
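
For reference, the comparison described above can be scripted. The sketch below hashes a downloaded file and compares it against the `Size`, `MD5sum` and `SHA256` fields of a single stanza copied from the `Packages` index. The function names are hypothetical, and the parser only handles simple one-line fields, which is all that is needed for those three.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return size, MD5 and SHA-256 of a file, keyed by Packages field name."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            size += len(chunk)
            md5.update(chunk)
            sha256.update(chunk)
    return {"Size": str(size), "MD5sum": md5.hexdigest(), "SHA256": sha256.hexdigest()}


def parse_stanza(text):
    """Parse 'Key: value' lines of one stanza; continuation lines are skipped."""
    fields = {}
    for line in text.splitlines():
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def mismatched_fields(path, stanza_text):
    """List the fields whose value in the stanza differs from the local file."""
    expected = parse_stanza(stanza_text)
    actual = file_digests(path)
    return [key for key, value in actual.items()
            if key in expected and expected[key] != value]
```

An empty list from `mismatched_fields` means the local file agrees with the index on every field that was present in the stanza.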

What do you mean with "bugcheck"?

doqfgc commented 3 weeks ago

A kernel error. I tried to reproduce it but it didn't happen again; now just back to segfaulting on unpacking these two packages.

Furthermore, attempting to reboot after the failed package installation results in an unbootable system.

MichaIng commented 3 weeks ago

So maybe then it is indeed an issue with the SD card? Maybe another package upgrade related to the dpkg unpacker or one of its libraries damaged it, so that it in turn failed to unpack any other archive. Kernel errors in the middle of linux-image-current-meson64 installation can surely break system boot as well :( .

doqfgc commented 3 weeks ago

I doubt it, especially as it's happened to two different SD cards; one of which was previously brand new unused. I know that SD cards aren't the most reliable storage medium but to have the same issue pop up with the same two packages on different cards is very suspicious.

I did get the bugcheck to show up again though. Apologies for camera-pointed-at-screen syndrome here, I really don't want to transcribe the entire terminal.

Bugcheck image ![IMG_20241029_133256842](https://github.com/user-attachments/assets/aad0c5ef-d7ff-4fc0-94e9-dd98f2a39fee)

MichaIng commented 3 weeks ago

A kernel paging request failure. This can be either a kernel bug or RAM damage. I'll just test this kernel on Odroid N2+. Btw, as we had this on another device, does e.g. htop show the correct RAM size?

doqfgc commented 3 weeks ago

htop reports 3.69GB, same as on working ODROIDs.

MichaIng commented 3 weeks ago

I just tested the same kernel and firmware upgrade from same original versions on my Odroid N2+ 🤔.

There is a very similar report on a different SBC: #7257. So far I cannot imagine how it can be related, given the pretty different kernel, other than that it might be some bug in dpkg (or a particular version of it or such). But I am not sure whether buggy software can cause this kernel error, or whether the kernel or hardware must be buggy for it.

doqfgc commented 3 weeks ago

I'm going to let memtester run on all the free memory (everything but the first 100MB or so) just to see if it's bad RAM.

I don't understand the relation either, especially as I had just updated another ODROID-C4 to 9.8.0 in https://github.com/MichaIng/DietPi/issues/7261#issuecomment-2444836820 from 9.7.1 and had no issue.

I also doubt it's a dpkg bug, as I also tried with the old base image, and unless there were no changes to dpkg in a year and a half it's either not that or an undiscovered year-and-a-half-old regression.

MichaIng commented 3 weeks ago

Jep so far I also don't see the relation, but didn't want to leave it unmentioned, since it is the very same two packages failing to unpack with the exact same errors.

doqfgc commented 3 weeks ago

memtester passed the memory, so I'm probably good to dismiss that.

That leaves SD or packager bug.

In the interim I updated a third ODROID-C4 to 9.8.0 and as a second witness, that board also updated fine and without issue.

MichaIng commented 3 weeks ago

Btw, do new images boot and update fine on those other C4s that updated fine, in case you have an option to test that?

And since smaller packages seem to upgrade fine, does reinstalling dpkg help and in case even solve the issue?

apt install --reinstall dpkg

I checked all related library packages, and none was recently upgraded, apart from libc6 in August. But I cannot imagine that libc6 itself can somehow cause an error like this. Very weird, with the other report, and given the identical errors I do not really believe it is a coincidence, but maybe a faulty package version combination or so.

doqfgc commented 3 weeks ago

I haven't tested fresh images on my working C4s (they're in prod and I can't bring them down at this time).

Reinstalling dpkg does not help.

I have even tried a different SD card brand (Kingston, same brand as my prod C4s) to no avail.

MichaIng commented 3 weeks ago

So weird. I'll keep the current C4 (and NanoPi NEO Plus2) images around for testing, as one of us has them as well and can test next week, but will otherwise move new images in place now. Of course they will work, as they already have the latest kernel and firmware, but it would be interesting to see whether they run into the same error when the next kernel upgrade is ready, or when reinstalling either of the two packages.

doqfgc commented 3 weeks ago

Okay, while I had a break in peak I pulled down the noncritical working C4 to test. Same SD, same environment, same base image.

It worked without issue. No errors. It just worked.

It's gotta be faulty board at this point. I'll move forward with an RMA with my supplier.

MichaIng commented 2 weeks ago

Does not hurt to try. I am still checking with the other user with same error on NanoPi NEO Plus2. Like maybe there is a rare issue with the way our packages are packaged on the GitHub Actions Ubuntu runner or so: https://github.com/MichaIng/DietPi/issues/7257#issuecomment-2453200838

But unlike you, he did not face kernel errors so far. So there is still some chance that it is an extreme coincidence and you face the same errors for different reasons.

doqfgc commented 1 week ago

After a bit of back and forth with my supplier, it turns out there may be a shadow second revision of the ODROID-C4 and that would be the unit suffering from the unpacking bug.

I received what they described as an "older revision" of the board and it works fine without issue.

Thus, my personal issue is solved, but this leaves questions unanswered, such as why a possible revision to a board would cause this sort of catastrophic failure.

More investigation may be required.

MichaIng commented 18 hours ago

I am still wondering about the concurrent identical case with that NanoPi NEO Plus2. I mean, it may really all be coincidence, but I am not 100% convinced. However, I am glad that your supplier sent you another revision which does not have the issue anymore.

I'll close this issue then, focusing on the other case.