Closed CaptClaude closed 4 years ago
Hi,
I think the problem might turn out to be a first-run issue, in the sense of not having done the first run steps properly (which I realise can be a loaded statement but please bear with me).
Things like this have come up a couple of times in other posts so I will flesh out what I think is best practice (others may disagree, of course).
I know the README.md says something slightly different but the way I have always done a clean install is:
$ cd ~
$ git clone https://github.com/SensorsIot/IOTstack.git IOTstack
$ cd ~/IOTstack
$ ./menu.sh
then, I choose the first option to "Install Docker".
That ends with the requirement for a reboot.
After the Pi comes back:
$ cd ~/IOTstack
$ ./menu.sh
and I choose the second option "Build Stack".
I work through to the end of that process, and the result is docker-compose.yml. Then:
$ cd ~/IOTstack
$ docker-compose up -d
Once before, a problem of the kind you seem to be having turned out to be caused by the person having installed docker via sudo apt install docker
(or words to that effect).
If you take a peek inside menu.sh and search for "install" you'll wind up at:
if command_exists docker; then
    echo "docker already installed"
else
    echo "Install Docker"
    curl -fsSL https://get.docker.com | sh
    sudo usermod -aG docker $USER
fi
if command_exists docker-compose; then
    echo "docker-compose already installed"
else
    echo "Install docker-compose"
    sudo apt install -y docker-compose
fi
While apt
is used to install docker-compose there's a curl piped to shell to install docker. Last time, the person reporting the problem did another clean install and let menu.sh do the work, and the problem was cured.
If that doesn't work and if this were my problem, I'd step through those installation commands by hand to see where things break.
Note:
command_exists
is a function defined inside menu.sh which expands to command -v docker
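For anyone who wants to try that check outside menu.sh, here is a minimal sketch of such a helper. It is written from the description above, not copied from menu.sh, so the real definition may differ in detail.

```shell
# Sketch of a helper like the one described above (not copied from menu.sh).
command_exists() {
    # "command -v" succeeds only if the named command can be found.
    command -v "$1" >/dev/null 2>&1
}

# Report on the two commands that the install step looks for.
for cmd in docker docker-compose; do
    if command_exists "$cmd"; then
        echo "$cmd found at $(command -v "$cmd")"
    else
        echo "$cmd not installed"
    fi
done
```

Note that this only says whether a command is on the PATH. It cannot tell you how docker got there, which is the distinction that mattered in the earlier case.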
Another possibility might be the "Got permission denied" and other permission issues. That might be traced back to the sudo usermod
in menu.sh.
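A quick way to see whether that usermod took effect is to check group membership directly. This is a sketch only; the function name user_in_group is made up for illustration.

```shell
# Succeeds if user $1 is listed as a member of group $2.
user_in_group() {
    # "id -nG <user>" prints the user's group names separated by spaces.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

me="$(id -un)"
if user_in_group "$me" docker; then
    echo "$me is in the docker group"
else
    echo "$me is not in the docker group (or the group does not exist)"
fi
```

A new group membership only applies to login sessions started after the change, so a "Got permission denied" message can persist until you log out and back in (or reboot), even when this check passes.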
While on this topic of things I do that might not be "strictly according to Hoyle", I do NOT let menu.sh be responsible for project updates. Before running menu.sh I always:
$ cd ~/IOTstack
$ git pull origin master
and then I study the output to see what has changed so that I have some idea of what might need to be re-built to pick up, say, a template change.
Taking a slightly longer way around has worked for me on both "gcgarner" and "sensorsIOT" versions of this project. The habits are now so deeply-ingrained that shortcuts like cd ~/IOTstack && bash ./menu.sh
feel wrong.
Yes, come the day I change my shell to something other than
bash
I may need to up my game a bit. I know. I know!
I hope something in this helps you.
I truly appreciate your taking time to document the process and I am absolutely sure that I was having that particular problem because I failed to follow the process (I installed docker first manually instead of letting the script do it for me, correctly).
However, I am stymied by dropped connections to the Pi that cause the setup process to fail.
I was able to successfully let menu.sh install docker and reboot.
I ran the script to build the stack, and that ran to an apparently successful conclusion.
I ran docker-compose up -d
and it chugged along until I got
packet_write_wait: Connection to 192.168.0.20 port 22: Broken pipe
I logged back in, re-ran the build-the-stack step, and it chugged along until ERROR: Service 'nodered' failed to build. I ran the docker command "Delete all stopped containers and docker volumes", then rebuilt the stack again, and it completed without error. I re-ran docker-compose up -d and it started by building node-red. It ran through a bunch of other pulls until it got to telegraf, when the connection was dropped. Note that I was not sitting watching it at the time, because these things take a fair amount of time. So I logged back in again, rebuilt the stack, ran docker-compose up -d yet again and got a completely new error: ERROR: readlink /var/lib/docker/overlay2: invalid argument. I rebuilt the stack again: same error. I deleted the stopped containers again and rebuilt the stack: again, same error. Clearly I broke something. docker ps tells me docker is running with no containers.
I am gutted. What on earth am I doing wrong?
[edit]
Is there a way to stop docker, uninstall it and re-install it? There must be, and although I first worked with UNIX in 1975 (University), I am not so deep into it that I sweat bash commands upon demand.
The reason it's best to let IOTstack install docker is that it also adds the current user to the docker group, so that docker commands can be run without sudo.
Broken pipe is a bit weird though, that usually indicates the Pi rebooted, or that the sshd service suddenly died, or something like that.
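Whatever the cause of the dropped connection, one way to keep a long pull or build going when SSH dies is to detach it from the session. A minimal sketch using the standard nohup utility; the function name run_detached and the log file name in the usage note are inventions for illustration.

```shell
# Run a command so that it survives the loss of the SSH session.
# $1 is a log file; the remaining arguments are the command to run.
run_detached() {
    log="$1"
    shift
    # nohup makes the command ignore the hangup signal sent when the session ends.
    # Output goes to the log file because the terminal may no longer exist.
    nohup "$@" >"$log" 2>&1 &
    echo "started PID $! (output in $log)"
}
```

From inside ~/IOTstack, something like run_detached up.log docker-compose up -d would start the stack, and tail -f up.log would follow its progress. This does not help if the Pi itself is rebooting; in that case the job dies along with everything else.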
Is your Pi getting adequate power? It's possible it's getting brownouts and these issues are surfacing. If you have it plugged into a monitor, watch for a rainbow square or lightning bolt on the screen (even in cli mode it will do this) while these commands are executing. Low voltage warnings also appear in the logs. I have seen this type of weird behavior with a Pi browning out, even so far as skipping lines in bash scripts randomly (this took a lot of hair pulling to figure out since I was ssh'ed in).
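If no monitor is attached, the firmware can be asked about under-voltage over SSH. The sketch below decodes four of the flag bits reported by vcgencmd get_throttled (bit 0 and bit 16 for under-voltage, bit 2 and bit 18 for throttling); the function name decode_throttled is made up, and other bits are ignored.

```shell
# Decode the value reported by "vcgencmd get_throttled".
# Accepts "throttled=0x50005" or a bare "0x50005".
decode_throttled() {
    value="${1#throttled=}"
    case "$value" in
        0x*) ;;
        *) echo "unexpected value: $1" >&2; return 1 ;;
    esac
    flags=$(( ${value} ))
    if [ "$flags" -eq 0 ]; then
        echo "no under-voltage or throttling flags set"
        return 0
    fi
    if [ $(( flags & 0x1 )) -ne 0 ]; then echo "under-voltage detected right now"; fi
    if [ $(( flags & 0x4 )) -ne 0 ]; then echo "currently throttled"; fi
    if [ $(( flags & 0x10000 )) -ne 0 ]; then echo "under-voltage has occurred since boot"; fi
    if [ $(( flags & 0x40000 )) -ne 0 ]; then echo "throttling has occurred since boot"; fi
    return 0
}

# Only query the firmware if the tool is present (it ships with Raspberry Pi OS).
if command -v vcgencmd >/dev/null 2>&1; then
    decode_throttled "$(vcgencmd get_throttled)" || true
fi
```

Running this once after a long docker-compose up -d should show whether the supply sagged at any point since boot, even if nobody was watching at the time.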
I would probably also try just reimaging the card. It may be quicker than troubleshooting the cause. It's hard to know what state the system is in if you installed, then uninstalled docker, then installed it again with the script, plus any changes made in attempting to troubleshoot/fix the issue.
@Slyke you may be on to something (and thanks for responding).
I am on holiday in a rental house in the mountains of Arkansas (USA) and have been powering the Pi off of a thick USB battery pack which may not be suitable.
Changing directions, I once again re-imaged the SD and went through the process with a proper RPi power supply (because you always take an extra Pi power supply on holiday, right?). As of this moment, I am chugging through the docker-compose up -d
part and shall see what happens.
As we say, "Film at 11".
[edit]
Sigh. Power might have been an issue before but now I have a Python build error that seems strangely familiar:
Step 1/5 : FROM python:3
3: Pulling from library/python
33f1e205e6f9: Pull complete
83963556ddab: Pull complete
e52e6e35f16e: Pull complete
cf172e11e265: Pull complete
ef3d123dcffe: Pull complete
ff1b28ed1a23: Pull complete
bbffd6608c5d: Pull complete
d97ae8cfec53: Pull complete
004d4db5a630: Pull complete
Digest: sha256:6fcd27ebeb1a5b4fd289ff15cb666e619c060c7b76f5a1b1a99d7cddb6de337a
Status: Downloaded newer image for python:3
---> 4eebd1ea17da
Step 2/5 : WORKDIR /usr/src/app
---> Running in 3d71e16e5c61
Removing intermediate container 3d71e16e5c61
---> cf73feb5a925
Step 3/5 : COPY requirements.txt ./
ERROR: Service 'python' failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder721293025/requirements.txt: no such file or directory
docker ps
shows (yet again) docker is running but no containers.
Rebuild the stack again but leave out Python 3 and everything comes up:
pi@old-trout:~/IOTstack $ docker-compose up -d
Creating portainer ... done
Creating mosquitto ... done
Creating grafana ... done
Creating pihole ... done
Creating nodered ... done
Creating influxdb ... done
Creating telegraf ... done
pi@old-trout:~/IOTstack $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f761014f48d6 telegraf "/entrypoint.sh tele…" 4 minutes ago Up 4 minutes 8092/udp, 8125/udp, 8094/tcp telegraf
f6221865b00e influxdb:latest "/entrypoint.sh infl…" 5 minutes ago Up 4 minutes 0.0.0.0:2003->2003/tcp, 0.0.0.0:8083->8083/tcp, 0.0.0.0:8086->8086/tcp influxdb
35e57bdc0597 grafana/grafana "/run.sh" 5 minutes ago Up 4 minutes 0.0.0.0:3000->3000/tcp grafana
0bcc8d3a8148 iotstack_nodered "npm start --cache /…" 5 minutes ago Up 4 minutes (healthy) 0.0.0.0:1880->1880/tcp nodered
07ffe42c8b78 portainer/portainer "/portainer" 5 minutes ago Up 4 minutes 0.0.0.0:9000->9000/tcp portainer
8b4e3851923f pihole/pihole:latest "/s6-init" 5 minutes ago Up 4 minutes (healthy) 0.0.0.0:53->53/udp, 0.0.0.0:53->53/tcp, 0.0.0.0:67->67/udp, 443/tcp, 0.0.0.0:8089->80/tcp pihole
b12d2e7b9976 eclipse-mosquitto "/docker-entrypoint.…" 5 minutes ago Up 4 minutes 0.0.0.0:1883->1883/tcp, 0.0.0.0:9001->9001/tcp mosquitto
I think I can claim conditional success and will test things when the sun comes up again. The question comes to mind: Do I need a Python 3 container? In the last year, all of my Python has been 3 and I am sure that I will need it, but do I need a container with it? Thanks!
@Slyke responded while I was preparing this answer so some of this might be out-of-order.
To answer your last question first, the times I have wanted to get myself a clean slate I have either:
Blown it away in situ:
$ sudo apt purge docker-compose
$ sudo apt purge docker
$ cd ~
$ sudo rm -rf "IOTstack"
always reminding myself to pause before hitting return on that last command. I typically enclose the target(s) of an rm -rf
in quotes in case I've been dumb enough to have a space-introducing keyboard-stutter while typing the directory name. If I'm being really careful, I'll do an ls -R "target"
first to preview what is going to get clobbered.
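The preview-then-delete habit described above can be wrapped in a small function. This is only a sketch; the name confirm_rm is made up for illustration.

```shell
# Show what would be deleted, then delete only after an explicit "yes".
confirm_rm() {
    target="$1"
    if [ ! -e "$target" ]; then
        echo "no such path: $target" >&2
        return 1
    fi
    # Preview everything underneath the target.
    ls -R -- "$target"
    printf 'Delete "%s"? Type yes to confirm: ' "$target"
    read -r answer
    if [ "$answer" = "yes" ]; then
        rm -rf -- "$target"
    else
        echo "left untouched"
        return 1
    fi
}
```

Usage would be along the lines of confirm_rm "$HOME/IOTstack". Unlike the command above, it does not use sudo, so it may fail on files that are owned by root.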
It might not surprise you to learn that the reason for all this caution is an experience 27 years ago where I managed to kill an entire Novell Netware system by being, ahem, insufficiently cautious. Adding insult to injury was Novell being smart enough to ship Netware without a recursive delete. I had created the app myself to plug that gap. Talk about psyche-scarring!
I did a reasonable number of complete RPi rebuilds when I was getting started but that was more because I wanted to be sure that some new problem B wasn't the product of some prior cock-up A that had not been completely fixed. I was also making sure that the process in the above gist worked.
I've done several "blow it away in situ" and that seems to work. My rationale for -compose first, then docker is that that is the inverse order of installation but I don't know if it is necessary. My rationale for purge
vs remove
is that I don't really know whether IOTstack does any "customisations" so I'm just playing it safe.
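Before purging, it may be worth listing which Docker-related packages are actually present, because the package names depend on how docker was installed. As far as I know the get.docker.com script installs packages with names such as docker-ce rather than docker, but treat that as an assumption and check. A minimal sketch; the function name docker_related is made up.

```shell
# Filter a list of package names (one per line on stdin) down to Docker-related ones.
docker_related() {
    grep -E '^(docker|containerd)' || true
}

# On a Debian-based system, list the matching package names known to dpkg.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package}\n' | docker_related
fi
```

Whatever names turn up are the ones to hand to sudo apt purge.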
Now to the broken pipe. I have two questions:
Of the four answer combinations, I have seen what I suspect you might be experiencing multiple times on RPi4B/Wifi, a mere handful of times on RPi4B/Ethernet, but never on an RPi3B+. I don't know whether that means it's peculiar to the RPi4B, or if I just never use my RPi3B+ in anger for long enough to hit the problem.
I use a Mac and I tend to have several tabs open in Terminal all the time, one with an SSH session to my "live" system; ditto the "test" system. I'd come back in the morning and often find the "test" system tab on the end of a broken pipe.
At the time, the boxes were identical at the hardware level and had been built-alike as per the gist, both with the same version of IOTstack. The only obvious differences were:
The simplest thing to do was to switch the "test" system to Ethernet. That helped a lot but was only 99.9% effective.
A few weeks later, my "live" system stopped responding on Ethernet. All my IoT devices lit up like proverbial christmas trees, flashing alarm LEDs and going into reboot loops. Pinging the Ethernet interface wouldn't respond but I was able to get in via WiFi. I tried to reboot but it hung on the way down. I had to pull the power to get it back.
One of the downstream IoT devices sends an MQTT packet every 10 seconds so I could query the Influx database to get a 10-second time window when things had gone awry. That led me to suspect these lines in the log:
Mar 14 12:48:16 iot-hub avahi-daemon[361]: Withdrawing address record for 192.168.132.60 on eth0.
Mar 14 12:48:16 iot-hub avahi-daemon[361]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.132.60.
Mar 14 12:48:16 iot-hub avahi-daemon[361]: Interface eth0.IPv4 no longer relevant for mDNS.
and, from there, and after a few false starts, to a magic incantation which is a modification of this:
Using sudo
and your favourite text editor, create /usr/bin/isc-dhcp-fix.sh
and give it the following contents:
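#!/bin/bash
# Watchdog: once a second, check that eth0 still has an IPv4 address and,
# if it has lost it, ask the DHCP client to obtain a new lease.

logger "isc-dhcp-fix launched"

Card0()
{
    # grep succeeds only if ifconfig reports an "inet <address>" for eth0
    if ifconfig eth0 | grep -Po '(?<=inet )[\d.]+' &> /dev/null; then
        sleep 1
    else
        logger "isc-dhcp-fix resetting eth0"
        sudo dhclient eth0
    fi
}

while true; do
    Card0
done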
Make sure the file is owned by root and has execute permissions:
$ sudo chown root:root /usr/bin/isc-dhcp-fix.sh
$ sudo chmod u+x /usr/bin/isc-dhcp-fix.sh
It's a bit hard to test that script "in anger" because I don't know how to make an interface die on command but you can at least make sure that it doesn't chuck up any obvious warnings by doing:
$ sudo /usr/bin/isc-dhcp-fix.sh
Count to 10, then press Control+C to clobber it. Then:
$ grep "isc-dhcp-fix" /var/log/syslog
You should expect to see at least one line of output containing isc-dhcp-fix
Make a backup of /etc/rc.local:
$ sudo cp /etc/rc.local /etc/rc.local.bak
Using sudo
and your favourite text editor, edit /etc/rc.local
so that the last few lines look like this:
/usr/bin/isc-dhcp-fix.sh &
exit 0
Double-check the trailing " &" which means "run this in the background". I don't know for a fact that your Pi will hang on reboot if you get it wrong but I, for one, don't want to find out.
Incidentally, it's that kind of problem I had in mind when I developed the gist about how to boot from SSD while always retaining the capability to revert to the SD. The SD with a known-good running system is always there so getting back into the SSD to fix things like a hosed rc.local is a cheap alternative to a complete rebuild.
Reboot.
Repeat the grep
command above. I usually see two lines around the time of the reboot, sometimes three.
Since I did that, I've had no more broken pipes on my SSH sessions. I added the above grep
command to my .profile and, occasionally, when I start a new SSH session I will see a "resetting eth0" message that was not associated with a reboot (you always seem to get one or two on a reboot).
You'll note that, unlike the script it is based on, my script isn't checking WiFi. That's for no other reason than I had both of mine connected to Ethernet so WiFi didn't seem important. Mod it for WiFi if you want but, as I'm no longer routinely getting into my Pis via WiFi, I have no idea whether it works.
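For anyone modifying the script for WiFi, the address test at its heart can be tried on its own without touching any interface. The sketch below is untested on real WiFi hardware; the function name has_ipv4 is made up, and wlan0 is an assumed name for the WiFi interface.

```shell
# Succeeds if the ifconfig-style text on stdin contains an IPv4 address.
has_ipv4() {
    # Matches "inet 192.168.1.2" and the older "inet addr:192.168.1.2" layout,
    # but not "inet6 ..." lines.
    grep -Eq 'inet (addr:)?[0-9]+(\.[0-9]+){3}'
}

# Report any of the listed interfaces that currently has no IPv4 address.
# "wlan0" is an assumption; check the real name with "ifconfig -a".
for iface in eth0 wlan0; do
    if command -v ifconfig >/dev/null 2>&1; then
        if ! ifconfig "$iface" 2>/dev/null | has_ipv4; then
            echo "$iface has no IPv4 address"
        fi
    fi
done
```

An interface that is deliberately left unconfigured will always be reported, so a watchdog built on this should only list interfaces that are meant to be up.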
Anyway, I hope this either helps or gives you some ideas on what to try next.
@CaptClaude There's a fix for that Python error on the experimental
branch. Just doing some more testing and it'll be merged into master. You can switch to it manually right now by following these instructions: https://github.com/SensorsIot/IOTstack/tree/experimental#experimental-features
You probably don't need the Python container, but it's there if you want it :).
The advice received here has been invaluable, thanks.
I have been exploring and everything seems to be working: Node-Red, Grafana, Portainer... so far so good.
Using Portainer, I see I have 2 NR images: iotstack_nodered:latest
and nodered/node-red:latest
, the latter being marked unused. There were a couple of other unused volumes (Python, for instance) and they could be removed with a click. Trying to remove the unused NR image yields this message:
Failure
conflict: unable to delete 777225a60b3b (cannot be forced) - image has dependent child images
Is this normal or not important enough to worry about?
Apropos instability running off the USB power bank: after careful consideration, I think the problem might actually have been a cheap microUSB charging cable I was using (a free 100mm one). Might retry running with a proper cable.
Two Node-Red images are normal. The "unused" one is a base which is customised to produce the "running" image. Graham Garner said he had found that necessary but eventually intended to find a way around it. Just hasn't happened yet. In the meantime, leave things as they are. If you delete the "unused" one, it just has to be downloaded again.
On the flip side, the only way I have found of upgrading Node-Red is to explicitly clobber both images. My magic incantation for upgrading all containers is:
$ cd ~/IOTstack
$ docker-compose down
$ docker rmi "iotstack_nodered" "nodered/node-red"
$ docker-compose pull
$ docker-compose up --build -d
Maybe my lack of knowledge will assist the improvement of documentation (I would help if I could -- and knew how). It is certainly educational. On the subject of updates: I just noticed that Portainer had an update. I ran the menu, went to Docker commands and ran update all containers. It stopped and removed all containers, got the latest images, rebuilt Node-Red, restarted the stack and two strange things happened, one of which may just be "one of those things". First, this with mosquitto:
Creating mosquitto ...
ERROR: for mosquitto UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: for mosquitto UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
Consider running prune-images to free up space
Suspecting an issue with Mosquitto:
pi@old-trout:~/IOTstack $ docker ps
CONTAINER ID IMAGE COMMAND
785d42a6186f eclipse-mosquitto "/docker-entrypoint.…"
d0fed45398a4 iotstack_nodered "npm start --cache /…"
ae7d035545c7 portainer/portainer "/portainer"
4d02ac773c3e pihole/pihole:latest "/s6-init"
71195bc3bd86 grafana/grafana "/run.sh"
af2702bc8785 influxdb:latest "/entrypoint.sh infl…"
I saw that telegraf was not running. (A quick test of Mosquitto (with NR) shows that it is working.) There is an image for telegraf, but it was not started after the update. I resolved this (and maybe this is the way to do it anyway) by running menu and Build the stack, selecting do not overwrite each time.
pi@old-trout:~/IOTstack $ docker-compose up -d
nodered is up-to-date
mosquitto is up-to-date
pihole is up-to-date
influxdb is up-to-date
grafana is up-to-date
portainer is up-to-date
Creating telegraf ... done
Telegraf is now running. There is a learning curve. Thanks again for your help.
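Spotting a service that failed to start, as telegraf did here, can be done by comparing the services defined in docker-compose.yml with the containers that are running. A minimal sketch, assuming (as the docker ps output above suggests) that each container is named after its service; the function name missing_services is made up.

```shell
# Print every name in the first list that is absent from the second.
# $1: expected names, one per line. $2: running names, one per line.
missing_services() {
    printf '%s\n' "$1" | while IFS= read -r name; do
        [ -n "$name" ] || continue
        # -x matches whole lines only, -F treats the name as plain text.
        printf '%s\n' "$2" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}
```

From inside ~/IOTstack it could be called as missing_services "$(docker-compose config --services)" "$(docker ps --format '{{.Names}}')". Anything it prints is defined in the stack but not running, and a plain docker-compose up -d should then create it.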
Given that my original issue was more than resolved, we can call this one closed.
Freshly downloaded Buster image. Installed git, no issues. Installed IOTstack, no issues. Followed instructions, saw docker needed an update:
Updated docker:
pi@old-trout:~ $ sudo apt upgrade docker docker-compose
Finished without error. Restarted IOTstack, same issues: Basically, no matter what I tried, this was the result:
Suggestions?