ericpaulbishop / gargoyle

Gargoyle Router Management Utility
http://www.gargoyle-router.com

Attempt to Preserve Settings #446

Open nworbnhoj opened 8 years ago

nworbnhoj commented 8 years ago

The "Attempt to Preserve settings" function is causing confusion for users and beta testers, and it has the potential to brick routers when applied by mistake. Ultimately, the confusion and mistakes will lose Gargoyle users, testers and confidence.

The "Attempt to Preserve settings" function is definitely valued by users, but it is too blunt an instrument. Is there an easy way to improve it so that developers can flag small chunks of the settings that should not be preserved?

nworbnhoj commented 8 years ago

I have researched this issue a little and have a rough plan to go forward. I would value any feedback or insights at this point.

BACKGROUND The "Attempt to Preserve Settings" on upgrade and the Gargoyle backup are stand-alone functions (I had assumed they would be integrated, or at least related). The Gargoyle script do_upgrade.sh calls the OpenWrt script sysupgrade.sh, which has its own mechanism to preserve various files. The Gargoyle script create_backup.sh has an independent, hard-coded list of files to back up.

There are two related issues. Firstly, preserve-settings-on-upgrade can brick a router when an old config file is incompatible with the new version. Secondly, restore-backup can brick routers in the same way (see issue #82).

APPROACH It seems sensible to integrate the two functions somewhat (preserve-settings-on-upgrade & restore-backup) so that both issues can be resolved (and maintained) in a single location.

It seems that there is a "default" list of files to back up/restore/preserve, plus a few exceptions for each upgrade that should NOT be restored/preserved because they are incompatible (and can brick the router).

It seems that there should be a preserve_exceptions file in each Gargoyle release, listing, for each Gargoyle version, the files NOT to be preserved when upgrading to that version. For example:

    1.6.0 /etc/config/fileA
    1.6.2
    1.8.0 /etc/config/fileB
    1.8.0 /etc/config/fileC
    1.8.0 /etc/config/fileD
    1.8.1
    1.9.0 /etc/config/fileE

It seems that each Gargoyle backup should be "stamped" with the Gargoyle version that generated the backup.
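A minimal sketch of what the version stamp could look like, written in Python purely for illustration (the actual Gargoyle scripts are shell). It assumes the backup is a tar archive and adds one small member whose name, `gargoyle_version`, is hypothetical; a backup without that member is treated as unstamped, so older backups can still be recognised.

```python
import io
import tarfile
import time

# Hypothetical member name for the version stamp inside the backup archive.
VERSION_MEMBER = "gargoyle_version"


def stamp_backup(tar: tarfile.TarFile, version: str) -> None:
    """Add a small text member recording which Gargoyle version made this backup."""
    data = (version + "\n").encode("utf-8")
    info = tarfile.TarInfo(name=VERSION_MEMBER)
    info.size = len(data)
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))


def read_backup_version(tar: tarfile.TarFile):
    """Return the stamped version string, or None for an older, unstamped backup."""
    try:
        member = tar.getmember(VERSION_MEMBER)
    except KeyError:
        return None
    stamp = tar.extractfile(member)
    return stamp.read().decode("utf-8").strip() if stamp else None


# Usage: stamp an in-memory backup, then read the stamp back.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    stamp_backup(tar, "1.8.0")
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    print(read_backup_version(tar))  # prints 1.8.0
```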

With these things in place, it would seem (relatively) straightforward during restore/preserve to parse the preserve_exceptions file from the current version up to the target version, and remove those exceptions from the default list of files to back up.
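A rough sketch of that filtering step, under a few assumptions not stated above: each line of preserve_exceptions is read as "version [path ...]" (a bare version means that release invalidated nothing), versions are purely numeric, and an exception applies when its version is after the current version and at or before the target version. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a purely numeric version such as "1.8.0" into (1, 8, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))


def parse_exceptions(text: str) -> list:
    """Parse 'version [path ...]' lines into (version_tuple, path) pairs.

    A line holding only a version means that release invalidated no files.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        version = parse_version(fields[0])
        entries.extend((version, path) for path in fields[1:])
    return entries


def files_to_preserve(default_files, exceptions_text, current, target):
    """Drop every file invalidated by a release after `current`, up to and including `target`."""
    low, high = parse_version(current), parse_version(target)
    excluded = {
        path
        for version, path in parse_exceptions(exceptions_text)
        if low < version <= high
    }
    return [path for path in default_files if path not in excluded]


EXCEPTIONS = """\
1.6.0 /etc/config/fileA
1.6.2
1.8.0 /etc/config/fileB
1.8.0 /etc/config/fileC
1.8.0 /etc/config/fileD
1.8.1
1.9.0 /etc/config/fileE
"""

defaults = ["/etc/config/fileA", "/etc/config/fileB", "/etc/config/fileE", "/etc/config/network"]
print(files_to_preserve(defaults, EXCEPTIONS, "1.6.2", "1.8.1"))
# prints ['/etc/config/fileA', '/etc/config/fileE', '/etc/config/network']
```

Because the file is cumulative, skipping several releases in one upgrade simply collects every exception in between; comparing tuples rather than strings keeps "1.10.0" correctly ordered after "1.9.0".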

QUESTIONS Does this approach seem sensible/doable? Is this approach sufficiently simple and general? What have I missed?

lantis1008 commented 8 years ago

I propose a simpler version.

Backup still works the same. Add a "Selective restore" function: the user uploads the conventional backup tar, we extract it into RAM and list all the files inside, and the user checks or unchecks the files they want to include or exclude.

It would make sense here to pre-select DHCP and ethers for the user, as well as the QoS configs etc.
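A minimal Python sketch of the selective-restore flow, assuming the uploaded backup is a tar archive. The pre-selected paths (`etc/config/dhcp`, `etc/ethers`, `etc/config/qos_gargoyle`) and all function names are illustrative assumptions about the backup layout, not Gargoyle's actual code.

```python
import tarfile

# Hypothetical set of files to pre-check, following the suggestion above
# (DHCP, ethers, QoS); the exact paths inside a Gargoyle backup are assumptions.
PRESELECTED = {"etc/config/dhcp", "etc/ethers", "etc/config/qos_gargoyle"}


def normalize(name: str) -> str:
    """Strip a leading './' or '/' so 'etc/x', './etc/x' and '/etc/x' compare equal."""
    while name.startswith("./"):
        name = name[2:]
    return name.lstrip("/")


def backup_checklist(members):
    """Return (member name, pre-checked?) pairs for the regular files in a backup."""
    return [(m.name, normalize(m.name) in PRESELECTED) for m in members if m.isfile()]


def restore_selected(tar: tarfile.TarFile, selected: set, dest: str) -> list:
    """Extract only the members the user left checked; return their names."""
    chosen = [m for m in tar.getmembers() if m.isfile() and m.name in selected]
    tar.extractall(dest, members=chosen)
    return [m.name for m in chosen]
```

In use, the web page would call `backup_checklist(tar.getmembers())` on the uploaded archive to render the checkboxes, and `restore_selected` only after the user has confirmed the "this can brick your router" warning.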

The onus is still on the user to figure out what they should and shouldn't include, but they can always enquire on the forum. Attempt to preserve settings should be left as is.

ericpaulbishop commented 8 years ago

I agree with Lantis above.

In an absolutely perfect world the solution nworbnhoj proposes would be the right one, but this would be a world where we had lots of development/testing resources. This COULD be done, it's possible, but getting it right would be hard, and much worse, it wouldn't be a one-time thing. Every time we update the system, we would have to update the migration scripts to exclude what couldn't be migrated safely. Maintaining that system would be a nightmare. Not impossible, just a huge, huge pain. Also, the consequences of getting it wrong would be severe: if there was a bug, we would be giving users a false sense of security that the upgrade would go smoothly.

The selective restore with a list of checkboxes gives the user more control, and thus it's easier to blame them if they don't know what they are doing, but it makes it possible to preserve the most important files for those who do know what they are doing. This system should DEFINITELY come with a warning: be careful, this can brick your router.

ericpaulbishop commented 8 years ago

Although... one thing that might make sense: keep a list somewhere of generally incompatible version upgrades (e.g. for when we switch to a different OpenWRT branch), and disable the preserve settings checkbox when users upgrade between incompatible versions. This is sort of the opposite of the initial issue: it prevents users from shooting themselves in the foot, rather than giving them an opportunity to preserve certain files but not all of them (which gives them a BIGGER opportunity to shoot themselves in the foot). But if we're looking at the upgrade system, it's something to consider too.
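One possible shape for that check, sketched in Python: a list of "boundary" releases (the version numbers here are hypothetical placeholders), and the preserve settings checkbox is greyed out whenever an upgrade crosses any of them. Treating downgrades as unsafe is an added assumption.

```python
# Hypothetical releases at which the base OpenWrt branch changed; settings
# saved by any earlier release are assumed unsafe to carry across them.
INCOMPATIBLE_BOUNDARIES = ("1.8.0", "1.10.0")


def parse_version(version: str) -> tuple:
    """Turn a purely numeric version such as "1.10.0" into (1, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def can_preserve_settings(current: str, target: str, boundaries=INCOMPATIBLE_BOUNDARIES) -> bool:
    """Return False (grey out the checkbox) if the upgrade crosses any boundary."""
    low, high = parse_version(current), parse_version(target)
    if high < low:
        return False  # assumption: treat downgrades as unsafe too
    return not any(low < parse_version(b) <= high for b in boundaries)


print(can_preserve_settings("1.8.0", "1.8.1"))  # True
print(can_preserve_settings("1.6.2", "1.8.0"))  # False
```

A boundary release counts as "crossed" when upgrading to it, but not when upgrading away from it, so point releases on the new branch keep the checkbox enabled.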

nworbnhoj commented 8 years ago

Some of these comments make me a little sad.

I think that Gargoyle is a really good bit of gear because it fills a very common household need AND it is OpenSource. Gargoyle has the potential to become broadly known and used far beyond geeks and brave wanna-be geeks. Gargoyle COULD BE the household router software of choice. BUT Gargoyle has to be easier to use and bullet-proof. We OpenSource geeks need to step-up to the mark and prove that OpenSource is both better AND trustworthy. If we fail in this - then we will all slip into the expensive, insecure, closed-source quagmire that serves corporates, governments and legals. End of rant.

I think that we developers need to take responsibility for the integrity of the restore/preserve process because we are best positioned to do so. The approach that I described above will be a little fiddly to implement initially, but the ongoing maintenance should be easy enough. If a developer invalidates a config file then they simply add a single line to the preserve_exceptions file for the new Gargoyle version. The preserve_exceptions file is a cumulative list of config files that have been invalidated from one version to the next.

Ultimately I think Gargoyle needs to go much further than this. Gargoyle should update itself automagically with a signed image from a secure server on a regular basis and (of course) preserve settings.

ericpaulbishop commented 8 years ago

I agree with you that we need to take responsibility for the integrity of the restore/preserve process. I think my comment about blaming users if they screw up didn't quite come across as intended.

Let's start from that point of agreement -- that we as developers need to take responsibility for the upgrade process. Given that this is our responsibility my TOP priority is to prevent users from shooting themselves in the foot, and bricking their router when they upgrade. However it happens, that is a disaster, and that means we have failed.

I also agree that ideally we could engineer a seamless and automatic upgrade experience. It is certainly possible, given enough resources. However, our capacity for destruction goes up in direct proportion to how automated this is. If the process is totally automatic, that means one mistake could brick thousands or tens of thousands of routers which would be crazy, insanely bad.

So... instead of saying "no, let's not even try", which, it's true, may be far too pessimistic, let me turn it back to you (or anyone else who would care to respond): how can we move in this direction while minimizing the risk that one mistake causes widespread destruction?

Maybe the real question is whether we can implement an automated testing suite and make sure it is run and passed before releasing a build. Because the tests depend on the router's network environment, this may be quite tricky, and that's why I haven't tried to write much in the way of automated tests so far. However, that may be the real solution to this issue: if every build is run through an automated test suite before release, we can be more confident that things aren't going to blow up.

lantis1008 commented 8 years ago

Good discussion here. My 2 cents on auto-upgrading: no. Or at the very least, make it opt-in/opt-out.

I used to hate it when my ISP would take remote control and push updates to the modem. In one case, they broke my network because I was using a non-default subnet (which was a supported option, no trickery).

ericpaulbishop commented 8 years ago

Lantis: I absolutely agree that if it's implemented it should be opt-in, though if -- and it's a BIG if -- we can do it and do it well it would be worth setting as an option on the firstboot screen, to encourage people to select that option.

However, it's very very hard to do well. It's certainly not something that can be done soon, but that doesn't necessarily mean it's not possible and it's not a worthy goal.

The more I think about it, the more I think that the key to doing this right is having a testing suite so that we can verify that everything is working properly and there won't be any problems. Rather than focusing on how to implement the upgrade, the first step in doing this is figuring out how to properly implement a test suite.

nworbnhoj commented 8 years ago

Great discussion :-) And it is quickly outgrowing the topic, which is very great. Could I suggest that we host this larger discussion more formally amongst Gargoyle developers and construct a Vision for Gargoyle (e.g. "to be the OpenSource home router software of choice for non-technical users" or some such)? Then we will be in a position to imagine what Gargoyle needs to look like to play such a role (e.g. auto updates, config wizard, modern GUI, etc.) and a path to getting there (e.g. 10 active devs, auto-builds, test regime, etc.). I don't mean to be corporate, but setting a vision can be a powerful thing for a volunteer group.

More specifically, on preserving settings and testing regimes: I think we should consider approaching the problem in a crowd-sourced way rather than a corporate, centralized, command-and-control way. We all know how well stackoverflow.com works by using the community to vote up the best answer, and by the role of individual reputation. What if we had a crowd of Gargoyle testers whose individual reputation depended on the reliability of their approvals? What if Gargoyle users could choose to upgrade automatically once the depth of crowd testing exceeds a given threshold? Perhaps such an infrastructure is already out there?

ericpaulbishop commented 8 years ago

These seem to be two separate points: (1) the general vision/direction of Gargoyle development and (2) the best way to handle testing.

Let's stay on topic and address (2): while I think crowd-sourcing is great, I'm not willing to trust it completely, as it's easy to miss something. Also, this assumes that whenever we want to do a release there will be enough trustworthy volunteers willing to test. If we're going to go for reliable upgrades, we really need a way to automate testing, at least for the most common cases and for truly catastrophic failures. Wireless testing in particular will be tricky, but if we focus on basic wired-only testing to start with, we may be able to build an x86 target that works well in a VM and set up some automated tests on that. Also, if we "brick" a VM there's no major loss, and recovering it isn't an issue.

Now, on to point (1). If I were to sum up the general goal of Gargoyle it would probably be along the lines of: "Open Source software with advanced tools for monitoring, securing and controlling your home network that is even easier to use than any default router firmware."

That's a bit of a mouthful and comes dangerously close to being a run-on sentence, I know. The key is that I want to offer advanced, sophisticated features while keeping it as easy to use as possible.

Defining a desired number of developers seems silly for two reasons. First, it doesn't matter how many there are; it only matters what the quality of the developers is, and the total quality of contributions. Second, how many quality contributions do we want? MORE! It doesn't matter how many we get, that's always the answer for every open source project across the board.

What does perhaps make sense is enumerating new desired features. Let me enumerate three really huge features I would like to see, each of which is a lot of work. Really I've had two in mind for a while, but I'm adding the test suite plus better upgrades (a smooth, well-tested upgrade path for preserving settings) to make three:

(1) Create an automated test suite for Gargoyle, and use it to implement and test smooth upgrades that preserve settings between major future versions.
(2) Full IPv6 support.
(3) Implement an optional captive portal. If/when it is active, we can match users not just by their MAC or IP, but by a username, which they must submit along with their password before being granted internet access. This is the ultimate way of controlling individuals on a network: by username rather than by device.

Those (well, 2 & 3 really) are the really big, ambitious features I've been meaning to work on for a while, though I have ended up mostly working on small deficiencies, bug fixes, and upgrading the base distribution.

nworbnhoj commented 8 years ago

I have roughed out a basic GitHub wiki home page as a basis for planning Gargoyle development effort.

It would be good to flesh out the steps involved in each of @ericpaulbishop's three major development areas.

nworbnhoj commented 8 years ago

Question @ericpaulbishop: Do you personally wish to implement these three major development areas, or are you looking for help with them? If you are looking for help, then what role do you wish to play? Architect? Code sign-off?

ericpaulbishop commented 8 years ago

I don't have to be the one to implement them, but I will (as always) get final say as to when something is ready to merge. Anyone who wants to take a crack at them is more than welcome, and very encouraged. The only catch is these are all hard, ambitious problems. I hope to find time to start work on IPv6 & captive portal soon, though I've been hoping to find time for that for a while.

These are large enough that, when work on them starts, it may make sense to create a separate collaborative branch for each feature that multiple people can contribute to.

The key with both IPv6 & captive portal is that both need to fully integrate into what is there. Both of these will require extensive work on the iptables kernel modules. I am happy to provide descriptions and guidance if anyone wants to do that before I get to it... but this is ugly C code, just to warn you.

eric commented 8 years ago

Wrong eric.

nworbnhoj commented 8 years ago

So @ericpaulbishop, if you were to set up a captive-portal branch (and merge master into it every now and again), @lantis1008 and I could start poking around at it.