SensorsIot / IOTstack

Docker stack for getting started on IOT on the Raspberry PI
GNU General Public License v3.0
1.42k stars 303 forks source link

2023-10-24 replacement for install.sh script - master branch #740

Closed Paraphraser closed 6 months ago

Paraphraser commented 8 months ago

One of the most surprising things about #729 (at least to me) was how it implied that someone had actually used the install.sh script.

I say this with the utmost respect for the original author and subsequent contributors (who did all the work while I just sat on my hands) but, after studying the existing script, I reached the conclusion that the wisest course of action was to start from scratch. There were seemed to be so many issues in the existing install.sh that it was difficult to know how much could be salvaged:

What this replacement script attempts to do is:

  1. Use absolute paths throughout so there is no ambiguity about where files/folders are located. Although this replacement script defaults to ~/IOTstack the default can be overridden by prepending the correct path, as in:

    $ IOTSTACK="$HOME/TestIOTstack" ./install.sh
  2. Use a rational method of version checking (specifically dpkg --compare-versions) which can actually handle the cases of one, two, three or more SEMVER fields correctly and which isn't fazed by weird suffixes.

  3. Install the minimum set of dependencies needed by IOTstack. This is on the assumption that install.sh may be being used on either a green-fields system or an existing system. PiBuilder excels at green-fields but could prove problematic were it to be used on an already-highly-customised working system. My own view is that a clean slate plus PiBuilder produces a better outcome but there is definitely a case to be made for supporting adding IOTstack to an existing system.

  4. Install docker and docker-compose-plugin correctly so both docker-compose (with hyphen) and docker compose (without hyphen) are the same binary and produce the same result. One side effect of correct installation is that both docker and docker-compose-plugin are updated by apt.

  5. Adds the user to the required groups (once!).

  6. Installs Python dependencies in a Bookworm-friendly manner. And, yes, I do realise using --break-system-packages is suboptimal but that's something that can be addressed by people with Python expertise (and, given no PRs have been submitted to attend to this, those people are probably a bit thin on the ground). In my view a dash of sub-optimal is better than not working at all on Bookworm.

  7. Sets Raspberry Pi cmdline options on a per-option basis, rather than assuming the options will always appear in the same order.

This replacement script is also specifically designed to be run multiple times without doing any harm. It is also designed so that it can be run safely after a PiBuilder run. This serves four purposes:

  1. If the script is being run on an existing system where, say, an obsolete version of docker is installed, the script will explain how to remove docker, after which the script should be re-run. This basic approach of check, explain how to recover, re-try, continues until the script completes normally.

  2. Ultimately, my intention is to propose another PR to remove all "installation" and version-checking tasks from the menu. To that end, this replacement script writes its exit status to ~/IOTstack/.new_install. Eventually, I see the menu behaving like this:

    • if .new_install is not present or is present and contains a non-zero exit code, the menu will prompt the user to run the (replacement) install.sh.
  3. As a guided-migration tool. Once the menu has been changed as above, the mostly likely situation a user will encounter after the subsequent pull from GitHub is the menu recommending that (the new) install.sh should be run. If the script finds the user's environment is obsolete (eg ancient pinned versions of docker or docker-compose) the script will guide the user through the upgrade. Rinse, repeat and eventually the script will complete normally, after which the user's system will be fully up-to-date and the menu will just get on with the job of being the menu.

  4. As a general-purpose "fixup" tool. Anyone reporting problems with IOTstack which implicate anything this replacement script is designed to handle can be instructed to run install.sh and see what happens.

This replacement script updates minimum version numbers to something more recent:

I have tested this script on:

  1. Raspberry Pi 4B running Bullseye and Bookworm.

  2. Debian on Proxmox, running Bullseye and Bookworm.

  3. Multiple runs of this script on each of the above (to ensure second or subsequent runs do no harm).

  4. After running PiBuilder on clean installs of all four test platforms (ie 1+2 above), also to ensure a run does no harm.

  5. After running the existing install.sh on all four test platforms, to ensure any damage (eg pinned obsolete versions) is discovered and reported, and that by following the repair instructions and re-running the new install.sh ultimately gives the platform a clean bill of health.

Documentation will be added to #737 shortly.

Noschvie commented 8 months ago

Great work, thanks, will help with tests this weekend.

Paraphraser commented 8 months ago

If you find any issues in your own testing, please report them and I will do my best to fix them quickly.

Noschvie commented 8 months ago

Hi Phill @Paraphraser for testing I want to add a Nexcloud container to an existing IOTstack (Node-Red, Mosquitto, Portainer) and get the error below

Error running PreBuildHook on 'nextcloud'
Traceback (most recent call last):
  File "./scripts/buildstack_menu.py", line 460, in runPrebuildHook
    exec(code, execGlobals, execLocals)                         
  File "./.templates//nextcloud/build.py", line 429, in <module>
    main()                                                      
  File "./.templates//nextcloud/build.py", line 419, in main    
    eval(toRun)()                                               
  File "./.templates//nextcloud/build.py", line 105, in preBuild
    os.makedirs(serviceVolume, exist_ok=True)
  File "/usr/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: './volumes/nextcloud'
Press Enter to continue...

I don't know this error is related to. I assume this PR isn't merged. I have to switch my IOTstack to your branch, isn't it?

Paraphraser commented 8 months ago

I have no idea. Those "build.py" scripts are a black box.

There's a complete working docker-compose.yml here. You probably won't need the network bits at the top but if you just copy/paste the two service definitions and change the passwords (two of them have to be the same) you should be up and running in a trice.

Paraphraser commented 8 months ago

After a bit more digging around, I have been able to replicate the problem.

If we assume "green fields" such as IOTstack just cloned, the volumes folder will not exist. If you run the menu, choose NextCloud, hit the right-arrow to set the password options (which is what causes build.py to run), and then let the menu generate your compose file, a side-effect of build.py is the equivalent of:

$ mkdir -p ./volumes/nextcloud/html

in which case volumes, nextcloud and html are created and owned by $USER.

When you "up" the stack, both docker-compose and NextCloud seem to leave that mostly alone, although NextCloud does change the ownership on html to be www-data:www-data.

If, after having done all that, you do nothing else except run the menu, put the cursor on NextCloud, hit right-arrow and choose an option that implies a password change, build.py will try to do:

$ chmod -R 0770 ./volumes/nextcloud/html

which will spit out a ton of permission errors for that path and everything inside it which is owned by www-data.

Now I'll move the goal-posts slightly. If I:

  1. Nuke both the volumes folder and the compose file;
  2. Run the menu;
  3. Choose NextCloud but do not hit right-arrow (which means build.py does not run, and the compose file gets the placeholder defaults in the password fields); and
  4. Generate the compose file,

then, at that point, no part of ./volumes/nextcloud/html will exist.

If I "up" the stack, what docker-compose does by default is the equivalent of:

$ sudo mkdir -p ./volumes/nextcloud/html

in which case all three folders in the path are created and owned by root.

If I then go back into the menu, put the cursor on NextCloud, hit right-arrow and do something with the passwords, that's when build.sh attempts to do:

$ mkdir -p ./volumes/nextcloud/html

which goes splat with the error you reported, because the nextcloud folder is owned by root:

PermissionError: [Errno 13] Permission denied: './volumes/nextcloud'

Do you reckon this explanation fits the pattern in terms of your knowledge of the order in which you did things on your system, or do you think there might still be more to it?

The thing is, absolutely none of this mkdir and chmod stuff is necessary. docker-compose does the right thing, and the container does the right thing on both first launch and subsequent launches. All build.sh is actually achieving is to get in the way and confuse the issue.

It's on my to-do list to come up with a better mechanism for placeholder replacement. None of it will involve futzing about in volumes (I reckon that should be considered private to docker-compose and the containers).

Anyway, bottom line, this is a build.py problem, not a problem with install.sh.

Noschvie commented 8 months ago

Hi Phill, a first test to choose NextCloud, but without hit the right-arrow to set the password options, leads to the san error message. I assume that Node-Red and Mosquitto are "part of the stack" but not touched during this run. Will add NextCloud as suggested in your previous post. Thanks again.

Slyke commented 6 months ago

I believe in earlier versions of NextCloud it was required to update the permissions and ownership so that it had write access, but it's possible it isn't needed anymore.

Paraphraser commented 6 months ago

I believe in earlier versions of NextCloud it was required to update the permissions and ownership so that it had write access, but it's possible it isn't needed anymore.

Out of the box NextCloud works fine "from scratch". I mean that in the sense of doing a copy/paste of the service definition from the IOTstack template. No permission issues.