SensorsIot / IOTstack

Docker stack for getting started on IOT on the Raspberry PI
GNU General Public License v3.0

ERROR: Couldn't connect to Docker daemon at http+docker://localhost #59

Closed CaptClaude closed 4 years ago

CaptClaude commented 4 years ago

Freshly downloaded Buster image. Installed git, no issues. Installed IOTstack, no issues. Followed instructions, saw docker needed an update:

pi@old-trout:~ $ cd ~/IOTstack && bash ./menu.sh
~/IOTstack ~/IOTstack
checking for project update
From https://github.com/SensorsIot/IOTstack
 * branch            master     -> FETCH_HEAD
Project is up to date
checking docker version
./menu.sh: line 197: docker: command not found
./menu.sh: line 202: [: : integer expression expected

Docker version less than 18.02.0 consider upgrading or you may experience issues
Upgrade by typing: 'sudo apt upgrade docker docker-compose'
~/IOTstack

Updated docker:

pi@old-trout:~ $ sudo apt upgrade docker docker-compose

Finished without error. Restarted IOTstack, same issues:

pi@old-trout:~ $ cd ~/IOTstack && bash ./menu.sh
~/IOTstack ~/IOTstack
checking for project update
From https://github.com/SensorsIot/IOTstack
 * branch            master     -> FETCH_HEAD
Project is up to date
checking docker version
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.39/version: dial unix /var/run/docker.sock: connect: permission denied
./menu.sh: line 202: [: : integer expression expected

Docker version less than 18.02.0 consider upgrading or you may experience issues
Upgrade by typing: 'sudo apt upgrade docker docker-compose'
~/IOTstack

Basically, no matter what I tried, this was the result:

pi@old-trout:~/IOTstack $ docker-compose up -d
ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?

If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.
pi@old-trout:~/IOTstack $ ps -aux | grep docker
root     12250  0.9  6.1 974360 54672 ?        Ssl  21:19   0:04 /usr/sbin/dockerd -H fd://
root     12259  0.5  2.3 943644 21328 ?        Ssl  21:19   0:02 docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level info
pi       19816  0.0  0.0   7348   472 pts/0    S+   21:26   0:00 grep --color=auto docker
pi@old-trout:~/IOTstack $ service docker restart
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to restart 'docker.service'.
Authenticating as: root
Password: 
polkit-agent-helper-1: pam_authenticate failed: Authentication failure
==== AUTHENTICATION FAILED ===
Failed to restart docker.service: Access denied
See system logs and 'systemctl status docker.service' for details.
pi@old-trout:~/IOTstack $ docker-compose up -d
ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?

If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.

Suggestions?

Paraphraser commented 4 years ago

Hi,

I think the problem might turn out to be a first-run issue, in the sense of not having done the first run steps properly (which I realise can be a loaded statement but please bear with me).

Things like this have come up a couple of times in other posts so I will flesh out what I think is best practice (others may disagree, of course).

I know the README.md says something slightly different but the way I have always done a clean install is:

$ cd ~
$ git clone https://github.com/SensorsIot/IOTstack.git IOTstack
$ cd ~/IOTstack
$ ./menu.sh

then, I choose the first option to "Install Docker".

That ends with the requirement for a reboot.

After the Pi comes back:

$ cd ~/IOTstack
$ ./menu.sh

and I choose the second option "Build Stack".

I work through to the end of that process, and the result is docker-compose.yml. Then:

$ cd ~/IOTstack
$ docker-compose up -d

Once before, a problem of the kind you seem to be having turned out to be caused by the person having installed docker via sudo apt install docker (or words to that effect).

If you take a peek inside menu.sh and search for "install" you'll wind up at:

        if command_exists docker; then
                echo "docker already installed"
        else
                echo "Install Docker"
                curl -fsSL https://get.docker.com | sh
                sudo usermod -aG docker $USER
        fi

        if command_exists docker-compose; then
                echo "docker-compose already installed"
        else
                echo "Install docker-compose"
                sudo apt install -y docker-compose
        fi

While apt is used to install docker-compose there's a curl piped to shell to install docker. Last time, the person reporting the problem did another clean install and let menu.sh do the work, and the problem was cured.

If that doesn't work and if this were my problem, I'd step through those installation commands by hand to see where things break.

Note: command_exists is a function defined inside menu.sh which expands to command -v docker
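For readers who have not met that idiom, a minimal sketch of such a helper follows. It is a re-creation for illustration rather than a copy of the function in menu.sh, and the redirection to /dev/null is an assumption.

```shell
# Illustrative re-creation of a command_exists helper (assumption: the
# real function in menu.sh may differ in detail).
command_exists() {
    # "command -v NAME" succeeds only if NAME resolves to something runnable.
    command -v "$1" > /dev/null 2>&1
}

for tool in docker docker-compose; do
    if command_exists "$tool"; then
        echo "$tool already installed"
    else
        echo "$tool not found"
    fi
done
```

Running this before and after the "Install Docker" menu option is a quick way to see which of the two branches in menu.sh would be taken.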

Another possibility might be the "Got permission denied" and other permission issues. That might be traced back to the sudo usermod in menu.sh.
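One way to test that theory is to check whether the current login session actually carries the docker group. The sketch below is an assumption-laden illustration: list_has_group is a hypothetical helper name, and it relies on the fact that group changes made by usermod only take effect at the next login (which is one reason the install step ends with a reboot).

```shell
# list_has_group GROUP "LIST": succeed if GROUP is one of the
# space-separated names in LIST. (Hypothetical helper, not part of menu.sh.)
list_has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Groups of the current login session (stale until the next login).
current_groups=$(id -nG 2> /dev/null)

if list_has_group docker "$current_groups"; then
    echo "this session is in the docker group"
else
    echo "not in the docker group; try: sudo usermod -aG docker \$USER, then log in again"
fi
```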

While on this topic of things I do that might not be "strictly according to Hoyle", I do NOT let menu.sh be responsible for project updates. Before running menu.sh I always:

$ cd ~/IOTstack
$ git pull origin master

and then I study the output to see what has changed so that I have some idea of what might need to be re-built to pick up, say, a template change.
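A variation on that habit, sketched below, previews the incoming changes before anything is merged. Note that it swaps git pull for git fetch, so it is not the exact command above; preview_update is a hypothetical function name and the master branch is taken from the transcript.

```shell
# preview_update REPO_DIR [BRANCH]: fetch BRANCH from origin without
# merging, then list the incoming commits and the files they touch.
preview_update() {
    repo=$1
    branch=${2:-master}
    git -C "$repo" fetch --quiet origin "$branch" || return 1
    # Commits on the remote branch that the local checkout lacks.
    git -C "$repo" log --oneline HEAD..FETCH_HEAD
    # Files changed by those commits.
    git -C "$repo" diff --stat HEAD...FETCH_HEAD
}

if [ -d "$HOME/IOTstack/.git" ]; then
    preview_update "$HOME/IOTstack" master
fi
```

If the preview shows a change to a service template, that service is a candidate for a rebuild once the changes have been pulled.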

Taking a slightly longer way around has worked for me on both "gcgarner" and "sensorsIOT" versions of this project. The habits are now so deeply-ingrained that shortcuts like cd ~/IOTstack && bash ./menu.sh feel wrong.

Yes, come the day I change my shell to something other than bash I may need to up my game a bit. I know. I know!

I hope something in this helps you.

CaptClaude commented 4 years ago

I truly appreciate your taking time to document the process and I am absolutely sure that I was having that particular problem because I failed to follow the process (I installed docker first manually instead of letting the script do it for me, correctly). However, I am stymied by dropped connections to the Pi that cause the setup process to fail.

I was able to successfully let menu.sh install docker and reboot. Ran the script to build the stack. That ran to an apparently successful conclusion. Ran docker-compose up -d and it chugged along until I got packet_write_wait: Connection to 192.168.0.20 port 22: Broken pipe.

Logged back in, re-ran build the stack and it chugged along until ERROR: Service 'nodered' failed to build. Ran the docker command Delete all stopped containers and docker volumes. Rebuilt the stack again and it completed w/o error. Re-ran docker-compose up -d and it started by building node-red. It ran through a bunch of other pulls until it got to telegraf when the connection was dropped. Note that I was not sitting watching it at the time, because these things take a fair amount of time.

So I logged back in again, rebuilt the stack, ran docker-compose up -d yet again and got a completely new error: ERROR: readlink /var/lib/docker/overlay2: invalid argument. Rebuilt the stack again, same error. Deleted the stopped containers again, rebuilt the stack and again, same error. Clearly I broke something. docker ps tells me docker is running with no containers. I am gutted. What on earth am I doing wrong?

[edit] Is there a way to stop docker, uninstall it and re-install it? There must be, and although I first worked with UNIX in 1975 (University), I am not so deep into it that I sweat bash commands upon demand.

Slyke commented 4 years ago

The reason it's best to let IOTstack install docker is that it also adds the current user to the docker group, so you can execute docker commands without needing sudo.

Broken pipe is a bit weird though, that usually indicates the Pi rebooted, or that the sshd service suddenly died, or something like that.

Is your Pi getting adequate power? It's possible it's getting brownouts and these issues are surfacing. If you have it plugged into a monitor, watch for a rainbow square or lightning bolt on the screen (even in cli mode it will do this) while these commands are executing. Low voltage warnings also appear in the logs. I have seen this type of weird behavior with a Pi browning out, even so far as skipping lines in bash scripts randomly (this took a lot of hair pulling to figure out since I was ssh'ed in).
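When no monitor is attached, the firmware's throttling flags offer another way to look for brownouts. The sketch below decodes only the two under-voltage bits of the value printed by vcgencmd get_throttled (bit 0 for "now", bit 16 for "since boot"); decode_throttled is a hypothetical helper name and the sample value 0x50005 is purely illustrative.

```shell
# decode_throttled VALUE: interpret the under-voltage bits of the value
# reported by "vcgencmd get_throttled" (for example 0x50005).
decode_throttled() {
    value=$(( $1 ))
    flagged=no
    if [ $(( value & 0x1 )) -ne 0 ]; then
        echo "under-voltage detected right now"
        flagged=yes
    fi
    if [ $(( value & 0x10000 )) -ne 0 ]; then
        echo "under-voltage has occurred since boot"
        flagged=yes
    fi
    if [ "$flagged" = no ]; then
        echo "no under-voltage flags set"
    fi
}

if command -v vcgencmd > /dev/null 2>&1; then
    raw=$(vcgencmd get_throttled)      # prints something like throttled=0x0
    decode_throttled "${raw#throttled=}"
else
    echo "vcgencmd not available; this check only works on a Raspberry Pi"
fi
```

Running it over SSH right after a failed docker-compose up -d would show whether the supply dipped at any point since the last boot.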

I would probably also try just reimaging the card. It may be quicker than troubleshooting the cause. It's hard to know what state the system is in if you installed, then uninstalled docker, then installed it again with the script, plus any changes made in attempting to troubleshoot/fix the issue.

CaptClaude commented 4 years ago

@Slyke you may be on to something (and thanks for responding). I am on holiday in a rental house in the mountains of Arkansas (USA) and have been powering the Pi off of a thick USB battery pack which may not be suitable. Changing directions, I once again re-imaged the SD and went through the process with a proper RPi power supply (because you always take an extra Pi power supply on holiday, right?). As of this moment, I am chugging through the docker-compose up -d part and shall see what happens. As we say, "Film at 11".

[edit] Sigh. Power might have been an issue before but now I have a Python build error that seems strangely familiar:

Step 1/5 : FROM python:3
3: Pulling from library/python
33f1e205e6f9: Pull complete
83963556ddab: Pull complete
e52e6e35f16e: Pull complete
cf172e11e265: Pull complete
ef3d123dcffe: Pull complete
ff1b28ed1a23: Pull complete
bbffd6608c5d: Pull complete
d97ae8cfec53: Pull complete
004d4db5a630: Pull complete
Digest: sha256:6fcd27ebeb1a5b4fd289ff15cb666e619c060c7b76f5a1b1a99d7cddb6de337a
Status: Downloaded newer image for python:3
 ---> 4eebd1ea17da
Step 2/5 : WORKDIR /usr/src/app
 ---> Running in 3d71e16e5c61
Removing intermediate container 3d71e16e5c61
 ---> cf73feb5a925
Step 3/5 : COPY requirements.txt ./
ERROR: Service 'python' failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder721293025/requirements.txt: no such file or directory

docker ps shows (yet again) docker is running but no containers.
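The COPY step in the log above fails because requirements.txt is not present in the directory docker uses as the build context. A minimal sketch of a pre-flight check follows; missing_from_context is a hypothetical helper, and the default path ./services/python is an assumption about where the python service's build directory lives, so pass the real directory as the first argument if it differs.

```shell
# missing_from_context DIR FILE...: print each FILE that is absent from DIR.
missing_from_context() {
    dir=$1
    shift
    for f in "$@"; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Hypothetical location of the python service's build directory.
context_dir=${1:-./services/python}

missing=$(missing_from_context "$context_dir" Dockerfile requirements.txt)
if [ -n "$missing" ]; then
    echo "missing from $context_dir:"
    echo "$missing"
else
    echo "build context looks complete"
fi
```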

Rebuilt the stack again but left out Python 3 and everything came up:

pi@old-trout:~/IOTstack $ docker-compose up -d
Creating portainer ... done
Creating mosquitto ... done
Creating grafana   ... done
Creating pihole    ... done
Creating nodered   ... done
Creating influxdb  ... done
Creating telegraf  ... done
pi@old-trout:~/IOTstack $ docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS                   PORTS                                                                                       NAMES
f761014f48d6        telegraf               "/entrypoint.sh tele…"   4 minutes ago       Up 4 minutes             8092/udp, 8125/udp, 8094/tcp                                                                telegraf
f6221865b00e        influxdb:latest        "/entrypoint.sh infl…"   5 minutes ago       Up 4 minutes             0.0.0.0:2003->2003/tcp, 0.0.0.0:8083->8083/tcp, 0.0.0.0:8086->8086/tcp                      influxdb
35e57bdc0597        grafana/grafana        "/run.sh"                5 minutes ago       Up 4 minutes             0.0.0.0:3000->3000/tcp                                                                      grafana
0bcc8d3a8148        iotstack_nodered       "npm start --cache /…"   5 minutes ago       Up 4 minutes (healthy)   0.0.0.0:1880->1880/tcp                                                                      nodered
07ffe42c8b78        portainer/portainer    "/portainer"             5 minutes ago       Up 4 minutes             0.0.0.0:9000->9000/tcp                                                                      portainer
8b4e3851923f        pihole/pihole:latest   "/s6-init"               5 minutes ago       Up 4 minutes (healthy)   0.0.0.0:53->53/udp, 0.0.0.0:53->53/tcp, 0.0.0.0:67->67/udp, 443/tcp, 0.0.0.0:8089->80/tcp   pihole
b12d2e7b9976        eclipse-mosquitto      "/docker-entrypoint.…"   5 minutes ago       Up 4 minutes             0.0.0.0:1883->1883/tcp, 0.0.0.0:9001->9001/tcp                                              mosquitto

I think I can claim conditional success and will test things when the sun comes up again. The question comes to mind: Do I need a Python 3 container? In the last year, all of my Python has been 3 and I am sure that I will need it, but do I need a container with it? Thanks!

Paraphraser commented 4 years ago

@Slyke responded while I was preparing this answer so some of this might be out-of-order.

To answer your last question first, the times I have wanted to get myself a clean slate I have either:

I did a reasonable number of complete RPi rebuilds when I was getting started but that was more because I wanted to be sure that some new problem B wasn't the product of some prior cock-up A that had not been completely fixed. I was also making sure that the process in the above gist worked.

I've done several "blow it away in situ" and that seems to work. My rationale for -compose first, then docker is that that is the inverse order of installation but I don't know if it is necessary. My rationale for purge vs remove is that I don't really know whether IOTstack does any "customisations" so I'm just playing it safe.


Now to the broken pipe. I have two questions:

  1. are you connecting to the Pi via WiFi or Ethernet?
  2. are you using an RPi4?

Of the four answer combinations, I have seen what I suspect you might be experiencing multiple times on RPi4B/WiFi, a mere handful of times on RPi4B/Ethernet, but never on an RPi3B+. I don't know whether that means it's peculiar to the RPi4B, or if I just never use my RPi3B+ in anger for long enough to hit the problem.

I use a Mac and I tend to have several tabs open in Terminal all the time, one with an SSH session to my "live" system; ditto the "test" system. I'd come back in the morning and often find the "test" system tab on the end of a broken pipe.

At the time, the boxes were identical at the hardware level and had been built-alike as per the gist, both with the same version of IOTstack. The only obvious differences were:

The simplest thing to do was to switch the "test" system to Ethernet. That helped a lot but was only 99.9% effective.

A few weeks later, my "live" system stopped responding on Ethernet. All my IoT devices lit up like proverbial christmas trees, flashing alarm LEDs and going into reboot loops. Pinging the Ethernet interface wouldn't respond but I was able to get in via WiFi. I tried to reboot but it hung on the way down. I had to pull the power to get it back.

One of the downstream IoT devices sends an MQTT packet every 10 seconds so I could query the Influx database to get a 10-second time window when things had gone awry. That led me to suspect these lines in the log:

Mar 14 12:48:16 iot-hub avahi-daemon[361]: Withdrawing address record for 192.168.132.60 on eth0.
Mar 14 12:48:16 iot-hub avahi-daemon[361]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.132.60.
Mar 14 12:48:16 iot-hub avahi-daemon[361]: Interface eth0.IPv4 no longer relevant for mDNS.

and, from there, and after a few false starts, to a magic incantation which is a modification of this:

  1. Using sudo and your favourite text editor, create /usr/bin/isc-dhcp-fix.sh and give it the following contents:

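    #!/bin/bash
    # Watchdog: request a new DHCP lease whenever eth0 has no IPv4 address.

    logger "isc-dhcp-fix launched"

    Card0()
    {
        # Succeeds only if ifconfig reports an IPv4 ("inet") address on eth0.
        if ifconfig eth0 | grep -Pq '(?<=inet )[\d.]+'; then
            sleep 1
        else
            logger "isc-dhcp-fix resetting eth0"
            sudo dhclient eth0
        fi
    }

    # Poll forever; the script is meant to be run in the background.
    while true; do
        Card0
    done

  2. Make sure the file is owned by root and has execute permissions: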

    $ sudo chown root:root /usr/bin/isc-dhcp-fix.sh
    $ sudo chmod u+x /usr/bin/isc-dhcp-fix.sh

  3. It's a bit hard to test that script "in anger" because I don't know how to make an interface die on command but you can at least make sure that it doesn't chuck up any obvious warnings by doing:

    $ sudo /usr/bin/isc-dhcp-fix.sh

    Count to 10, then press Control+C to clobber it. Then:

    $ grep "isc-dhcp-fix" /var/log/syslog

    You should expect to see at least one line of output containing isc-dhcp-fix.

  4. Make a backup of /etc/rc.local

    $ sudo cp /etc/rc.local /etc/rc.local.bak

  5. Using sudo and your favourite text editor, edit /etc/rc.local so that the last few lines look like this:

    /usr/bin/isc-dhcp-fix.sh &
    
    exit 0

    Double-check the trailing " &" which means "run this in the background". I don't know for a fact that your Pi will hang on reboot if you get it wrong but I, for one, don't want to find out.

    Incidentally, it's that kind of problem I had in mind when I developed the gist about how to boot from SSD while always retaining the capability to revert to the SD. The SD with a known-good running system is always there so getting back into the SSD to fix things like a hosed rc.local is a cheap alternative to a complete rebuild.

  6. Reboot.

  7. Repeat the grep command above. I usually see two lines around the time of the reboot, sometimes three.

Since I did that, I've had no more broken pipes on my SSH sessions. I added the above grep command to my .profile and, occasionally, when I start a new SSH session I will see a "resetting eth0" message that was not associated with a reboot (you always seem to get one or two on a reboot).

You'll note that, unlike the script it is based on, my script isn't checking WiFi. That's for no other reason than I had both of mine connected to Ethernet so WiFi didn't seem important. Mod it for WiFi if you want but, as I'm no longer routinely getting into my Pis via WiFi, I have no idea whether it works.
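For anyone who does want to mod it, here is a minimal sketch of the same check with the interface name as a parameter. It is untested on WiFi, like the caveat above; has_ipv4 and reset_if_down are hypothetical names, wlan0 is assumed to be the WiFi interface, and the loop in the closing comment is meant to run as root.

```shell
# has_ipv4: read "ifconfig IFACE" output on stdin and succeed if it
# contains an IPv4 address line.
has_ipv4() {
    grep -Eq 'inet (addr:)?[0-9]+(\.[0-9]+){3}'
}

# reset_if_down IFACE: ask for a new DHCP lease when IFACE has no
# IPv4 address; do nothing when it already has one.
reset_if_down() {
    iface=$1
    if ifconfig "$iface" 2> /dev/null | has_ipv4; then
        return 0
    fi
    logger "isc-dhcp-fix resetting $iface"
    dhclient "$iface"
}

# Intended use (as root, in the background), checking both interfaces:
#   while true; do reset_if_down eth0; reset_if_down wlan0; sleep 1; done
```

Keeping the address test separate from the reset makes it possible to try has_ipv4 against saved ifconfig output without touching a live interface.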

Anyway, I hope this either helps or gives you some ideas on what to try next.

Slyke commented 4 years ago

@CaptClaude There's a fix for that Python error on the experimental branch. Just doing some more testing and it'll be merged into master. You can switch to it manually right now by following these instructions: https://github.com/SensorsIot/IOTstack/tree/experimental#experimental-features

You probably don't need the Python container, but it's there if you want it :).

CaptClaude commented 4 years ago

The advice received here has been invaluable, thanks. I have been exploring and everything seems to be working: Node-Red, Grafana, Portainer... so far so good. Using Portainer, I see I have 2 NR images: iotstack_nodered:latest and nodered/node-red:latest, the latter being marked unused. There were a couple of other unused volumes (Python, for instance) and they could be removed with a click. Trying to remove the unused NR volume yields this message:

Failure
conflict: unable to delete 777225a60b3b (cannot be forced) - image has dependent child images

Is this normal or not important enough to worry about?

Apropos instability running off the USB power bank: after careful consideration, I think the problem might actually have been a cheap microUSB charging cable I was using (a free 100mm one). Might retry running with a proper cable.

Paraphraser commented 4 years ago

Two Node-Red images are normal. The "unused" one is a base which is customised to produce the "running" image. Graham Garner said he had found that necessary but eventually intended to find a way around it. Just hasn't happened yet. In the meantime, leave things as they are. If you delete the "unused" one, it just has to be downloaded again.

On the flip side, the only way I have found of upgrading Node-Red is to explicitly clobber both images. My magic incantation for upgrading all containers is:

$ cd ~/IOTstack
$ docker-compose down
$ docker rmi "iotstack_nodered" "nodered/node-red"
$ docker-compose pull
$ docker-compose up --build -d

CaptClaude commented 4 years ago

Maybe my lack of knowledge will assist the improvement of documentation (I would help if I could -- and knew how). It is certainly educational. On the subject of updates: I just noticed that Portainer had an update. I ran the menu, went to Docker commands and ran update all containers. It stopped and removed all containers, got the latest images, rebuilt Node-Red, restarted the stack and two strange things happened, one of which may just be "one of those things". First, this with mosquitto:

Creating mosquitto ... 

ERROR: for mosquitto  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for mosquitto  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
Consider running prune-images to free up space
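
The hint in that output can be followed by exporting COMPOSE_HTTP_TIMEOUT before starting the stack. A minimal sketch, in which the value of 200 seconds is an arbitrary assumption and ~/IOTstack is the directory used throughout this thread:

```shell
# Raise the docker-compose HTTP timeout from its default of 60 seconds.
# 200 is an arbitrary illustrative value, not a recommendation.
COMPOSE_HTTP_TIMEOUT=200
export COMPOSE_HTTP_TIMEOUT

stack_dir="$HOME/IOTstack"
if command -v docker-compose > /dev/null 2>&1 && [ -f "$stack_dir/docker-compose.yml" ]; then
    (cd "$stack_dir" && docker-compose up -d)
else
    echo "docker-compose or $stack_dir/docker-compose.yml not found; nothing started"
fi
```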

Suspecting an issue with Mosquitto:

pi@old-trout:~/IOTstack $ docker ps
CONTAINER ID        IMAGE                  COMMAND                
785d42a6186f        eclipse-mosquitto      "/docker-entrypoint.…" 
d0fed45398a4        iotstack_nodered       "npm start --cache /…" 
ae7d035545c7        portainer/portainer    "/portainer"           
4d02ac773c3e        pihole/pihole:latest   "/s6-init"             
71195bc3bd86        grafana/grafana        "/run.sh"              
af2702bc8785        influxdb:latest        "/entrypoint.sh infl…" 

I saw that telegraf was not running. (A quick test of Mosquitto (with NR) shows that it is working.) There is an image for telegraf, but it was not started after the update. I resolved this (and maybe this is the way to do it anyway) by running menu and Build the stack, selecting do not overwrite each time.

pi@old-trout:~/IOTstack $ docker-compose up -d
nodered is up-to-date
mosquitto is up-to-date
pihole is up-to-date
influxdb is up-to-date
grafana is up-to-date
portainer is up-to-date
Creating telegraf ... done

Telegraf is now running. There is a learning curve. Thanks again for your help.
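
A container that failed to start, like telegraf here, can also be spotted by comparing the services named in docker-compose.yml with what docker ps reports. The sketch below assumes each container name matches its service name, as in the listings in this thread; not_running is a hypothetical helper name.

```shell
# not_running "EXPECTED" "RUNNING": print each name in the first
# whitespace-separated list that is absent from the second.
not_running() {
    for name in $1; do
        case " $(echo $2) " in
            *" $name "*) ;;
            *) echo "$name" ;;
        esac
    done
}

stack_dir="$HOME/IOTstack"
if command -v docker-compose > /dev/null 2>&1 && [ -f "$stack_dir/docker-compose.yml" ]; then
    expected=$(cd "$stack_dir" && docker-compose config --services)
    running=$(docker ps --format '{{.Names}}')
    not_running "$expected" "$running"
fi
```

Any name it prints is a service that is defined in the stack but has no running container, and docker-compose up -d should then create it.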

CaptClaude commented 4 years ago

Given that my original issue was more than resolved, we can call this one closed.