MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

Increasingly frequently firstrun fails with Armbian mirrors down #6648

Open dirkhh opened 1 year ago

dirkhh commented 1 year ago

A couple of months ago this would happen "once in a while". At this point it happens "several times a day" for me. Most of the time, simply retrying fixes things - but if you have things scripted with AUTO_SETUP_HEADLESS=1 and AUTO_SETUP_AUTOMATED=1, this usually leads to a failed build / failed install.

I'm not quite sure what a good solution would look like (besides asking the Armbian folks to get it together)... but is there a way that the automated setup process could be more resilient and not fall back to user interaction when one of these updates fails?

MichaIng commented 1 year ago

Yes, I see the same on our image builds. 20% or so fail on the first attempt because of an Armbian mirror 404 or 5xx error. With the last release, I added up to 3 attempts (2 retries) to our APT install/upgrade error handler, applied if the error message contains "E: Failed to fetch: .*armbian".
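As an illustration only, the retry behaviour described above could look roughly like this. This is a Python sketch, not DietPi's actual error handler: the function names are hypothetical, and only the pattern and the attempt count come from the comment above.

```python
import re
import subprocess
import time

# Pattern as quoted in the comment above; everything else is illustrative.
ARMBIAN_FETCH_ERROR = re.compile(r"E: Failed to fetch: .*armbian")


def run_with_retries(run, max_attempts=3, delay=0.0):
    """Call run() until it succeeds, retrying only on Armbian fetch errors.

    run is any callable returning (exit_code, output_text).
    Returns (exit_code, output_text, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        code, output = run()
        if code == 0:
            return code, output, attempt
        # Give up on unrelated errors, or once the attempts are used up.
        if not ARMBIAN_FETCH_ERROR.search(output) or attempt == max_attempts:
            return code, output, attempt
        time.sleep(delay)


def apt_upgrade():
    """Hypothetical runner: one non-interactive APT upgrade attempt."""
    proc = subprocess.run(
        ["apt-get", "-y", "upgrade"], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout + proc.stderr
```

Calling `run_with_retries(apt_upgrade, delay=5)` would make up to three attempts. Note that retrying alone does not help if the redirector keeps returning the same faulty mirror.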

Can you see in your logs whether this kicked in, or whether the error message did not match in your case, or whether the 3 attempts were not enough? On our last image builds, indeed 3 attempts were not always enough, but it becomes ridiculous at some point, when the mirror director just redirects you to the same faulty mirror over and over again. I have no other good idea how to make it more resilient.

At some point we'll use our own APT repo. The Armbian kernel package builds can be automated quite easily with their build system. The only problem is that using the latest master version of the build system, and recent upstream Linux patch versions, means that we'd regularly be ahead of their repo. And this means that we'd run into issues that other Armbian users have not faced yet. I mean, even the broken HDMI and USB 3.0 issue with some RK3399 boards has not reached Armbian's bug tracker or repo yet, after several weeks in their repo, as far as I can see.

dirkhh commented 1 year ago

Yes, I see three attempts in the logs - all three times going to the same failing mirror. Which then forces the AUTO_SETUP_AUTOMATED to 0... which on a headless system that is designed to be managed via web UI is equivalent to a hard failure (the user doesn't even HAVE the password to log in with) - but a hard failure that they have no indication of except "well, it doesn't respond".

In another project of mine I tried to hack around this issue by manually picking different mirrors, instead of the mirror system always sending me back to the same broken mirror. Not sure if this is something we could do here in DietPi (I haven't found that part of the code, yet - so I'm not quite sure how we do this today).
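To make the idea concrete, here is a minimal Python sketch of picking a mirror manually. It is not taken from DietPi or from the other project mentioned above. The mirror URLs and the probed path are placeholders, and `first_working_mirror` and `http_ok` are hypothetical names.

```python
import urllib.error
import urllib.request


def http_ok(url, timeout):
    """Return True if a HEAD request to url answers with a 2xx status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # 404/5xx, DNS failures and timeouts all count as "not working".
        return False


def first_working_mirror(mirrors, path, probe=http_ok, timeout=10):
    """Return the first mirror base URL where path can be fetched, else None."""
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        if probe(url, timeout):
            return base
    return None


# Placeholder candidates: a real list would have to come from Armbian.
CANDIDATES = [
    "https://mirror-a.example.invalid/armbian/apt",
    "https://mirror-b.example.invalid/armbian/apt",
]
```

The chosen base URL could then be written into the APT sources list in place of the redirector address. A fixed list goes stale, so this trades one failure mode for another.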

On using our own APT repo. Yeah... being ahead of your upstream is rarely the answer... helping your upstream do better might be. I've tried to reach out to the Armbian folks on this (always tricky), and I've reached out to a friend of mine who is hosting a fairly powerful and fault tolerant mirror network to see if maybe he has ideas how to improve that...

dirkhh commented 1 year ago

I got a response on the Armbian forum - the mirror that has caused all the problems for me the last two days has been temporarily disabled. So fingers crossed we should see an improvement.

MichaIng commented 1 year ago

> helping your upstream do better might be. I've tried to reach out to the Armbian folks on this (always tricky)

Yes, let's say our communication with Armbian is, mostly because of a single person there, difficult. If we had a solution to commit, that should be fine, but if we needed to start telling them that there is a problem, and it is me or another known DietPi team member, this won't end productively.

I just saw that you succeeded. Yes, it is best to leave "DietPi" out of the game. A long-term solution would indeed be great, as this very same thing happens regularly: mirrors are disabled and re-enabled, at times there were only 2 mirrors overall, one extremely slow and one out of sync, then others are added back, and just as regularly a faulty mirror is among them again. Some automated checks would indeed make sense, and not just checking one flag, but all lists of all suites and the existence of the package files they point to. Selecting partners which provide servers with high availability, proper sync jobs and sufficient space with buffer is another option, but I have no idea how difficult this is.
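A rough sketch of the "existence of the package files" part of such a check is below. It relies only on the standard Debian repository layout, where each stanza of a `Packages` index carries a `Filename:` field. The function names and the sample index are made up for illustration, and this is not how Armbian's tooling works as far as I know.

```python
def referenced_files(packages_index_text):
    """Collect the Filename: paths listed in a Debian Packages index."""
    files = []
    for line in packages_index_text.splitlines():
        if line.startswith("Filename:"):
            files.append(line.split(":", 1)[1].strip())
    return files


def missing_files(packages_index_text, exists):
    """Return referenced paths for which exists(path) is False.

    exists is a caller-supplied check, e.g. a HEAD request against one
    mirror, so the same function can be run for every mirror and suite.
    """
    return [f for f in referenced_files(packages_index_text) if not exists(f)]


# Made-up sample index, for illustration only.
SAMPLE_INDEX = """\
Package: example-kernel
Version: 1.0
Filename: pool/main/e/example-kernel_1.0_arm64.deb

Package: example-dtb
Version: 1.0
Filename: pool/main/e/example-dtb_1.0_arm64.deb
"""
```

A mirror would only count as healthy if `missing_files` came back empty for every index of every suite it serves.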

I was involved in debugging an issue with the old Armbian mirror director, based on Python Flask, which after a while started to redirect to plain HTTP even though the original request was HTTPS. This often caused issues, as some mirrors had an HSTS header set without a redirect, which made APT fail accordingly. This must have been some sort of caching somewhere between the Nginx proxy, the container network, Nginx within the container and the Flask backend, but before we could find it, Armbian switched to a completely different mirror director, which solved the issue.
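For what it's worth, this kind of downgrade is easy to spot from the outside by comparing the request URL with the redirect's `Location` header. A tiny Python sketch with a hypothetical helper name, not something from the Armbian code base:

```python
from urllib.parse import urljoin, urlsplit


def is_https_downgrade(request_url, location_header):
    """True if an HTTPS request is being redirected to a plain HTTP URL."""
    # Location may be relative, so resolve it against the request URL first.
    target = urljoin(request_url, location_header)
    return (
        urlsplit(request_url).scheme == "https"
        and urlsplit(target).scheme == "http"
    )
```

A monitoring job could request a few known paths from the redirector over HTTPS with redirects disabled and flag any response for which this returns `True`.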

The thing with automatically serving another mirror in case of a failure is probably difficult:

Here is the source code, btw: https://github.com/armbian/armbian-router

dirkhh commented 1 year ago

As I mentioned, a good friend is running a massive mirror network and I'm talking to him to see if we can get Armbian mirrored that way. They are (among other things) mirroring the Linux kernel and several major distros - and are run by people who truly know what they are doing. Let's see if this goes anywhere. The problem of over-extended, under-staffed upstream projects isn't new. The problem of toxic people making things frustrating for everyone isn't, either. Ironically, there are several people over there that I think might fit the profile of the person you are referring to... 🤔

MichaIng commented 1 year ago

It would be very interesting to see how your friend is doing it. It might be interesting for us as well, though at least for the start we'll just use Cloudflare (as we do for our website) to have packages cached on what is probably the largest and fastest CDN possible 😉. The only thing we need to take care of is probably purging the cache whenever we update the APT server.
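As a sketch of that cache cleaning step: Cloudflare's API has a per-zone `purge_cache` endpoint that accepts a list of URLs to purge. The Python helper below only builds the request. The zone ID, token and URLs are placeholders, `build_purge_request` is a hypothetical name, and this is an assumption about how it could be wired up, not an existing DietPi script.

```python
import json
import urllib.request


def build_purge_request(zone_id, api_token, urls):
    """Build a POST request asking Cloudflare to purge the given URLs."""
    body = json.dumps({"files": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: after an APT repo update, the changed index files
# would be the natural candidates to purge.
request = build_purge_request(
    "ZONE_ID_PLACEHOLDER",
    "API_TOKEN_PLACEHOLDER",
    ["https://apt.example.invalid/dists/stable/InRelease"],
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` from the job that publishes the repo update.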

Yeah, being "under-staffed" is probably also a reason for frustration, which in turn causes the communication issues we're facing, especially when people start to rely on the financial returns (just a guess). Armbian has taken quite a few steps towards higher financial returns recently, with vendor premium memberships, restricted support forums etc. Good if that works out, and it probably relieves some of the stress that is now sometimes offloaded onto common users reporting bugs.