USF-IMARS / erddap-config

Content dir for docker-erddap incl setup.xml & dataset.xml

ERDDAP container does not restart after hypervisor power outage #15

Open 7yl4r opened 3 years ago

7yl4r commented 3 years ago

I found the ERDDAP container unexpectedly down after a power outage on the hypervisor (dune).

[root@dune erddap-config]# docker container ls -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED       STATUS                    PORTS                                                 NAMES
55f9d4f95018   axiom/docker-erddap    "/entrypoint.sh cata…"   3 weeks ago   Exited (255) 4 days ago   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp             erddap
eb42e6974ea1   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up 4 days (healthy)       0.0.0.0:8888->8080/tcp, :::8888->8080/tcp             mbon-dashboard-server_airflow-webserver_1
9c17ae6debac   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up About an hour          8080/tcp                                              mbon-dashboard-server_airflow-worker_1
714e01bb151f   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up 4 days (healthy)       0.0.0.0:5555->5555/tcp, :::5555->5555/tcp, 8080/tcp   mbon-dashboard-server_flower_1
463faf47c645   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up 4 days                 8080/tcp                                              mbon-dashboard-server_airflow-scheduler_1
f9f450e639b4   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Exited (0) 4 weeks ago                                                          mbon-dashboard-server_airflow-init_1
1d1eb66f64b4   postgres:13            "docker-entrypoint.s…"   4 weeks ago   Up 4 days (healthy)       5432/tcp                                              mbon-dashboard-server_postgres_1
ae8d44f3dd8b   redis:latest           "docker-entrypoint.s…"   4 weeks ago   Up 4 days (healthy)       0.0.0.0:6379->6379/tcp, :::6379->6379/tcp             mbon-dashboard-server_redis_1

Nothing interesting in the output of docker logs erddap. Trying to start the container back up, I see this error:

[root@dune erddap-config]# docker-compose up -d --build
Starting erddap ... 

ERROR: for erddap  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for erddap  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

LAN & WAN seem fine:

[root@dune erddap-config]# ping google.com
PING google.com (172.217.2.142) 56(84) bytes of data.
64 bytes from yyz08s14-in-f142.1e100.net (172.217.2.142): icmp_seq=1 ttl=118 time=8.63 ms
64 bytes from yyz08s14-in-f142.1e100.net (172.217.2.142): icmp_seq=2 ttl=118 time=8.70 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 8.634/8.666/8.698/0.032 ms
[root@dune erddap-config]# ping yin
PING yinmaster (192.168.1.203) 56(84) bytes of data.
64 bytes from yinmaster (192.168.1.203): icmp_seq=1 ttl=64 time=0.324 ms
64 bytes from yinmaster (192.168.1.203): icmp_seq=2 ttl=64 time=0.193 ms
^C
--- yinmaster ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 15ms
rtt min/avg/max/mdev = 0.193/0.258/0.324/0.067 ms

Tried docker-compose up again and got the same error, so I rebooted dune. After the reboot:

[root@dune ~]# docker container ls -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED       STATUS                    PORTS                                                 NAMES
55f9d4f95018   axiom/docker-erddap    "/entrypoint.sh cata…"   3 weeks ago   Exited (255) 5 days ago   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp             erddap
eb42e6974ea1   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up 58 minutes (healthy)   0.0.0.0:8888->8080/tcp, :::8888->8080/tcp             mbon-dashboard-server_airflow-webserver_1
9c17ae6debac   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up 58 minutes             8080/tcp                                              mbon-dashboard-server_airflow-worker_1
714e01bb151f   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up 58 minutes (healthy)   0.0.0.0:5555->5555/tcp, :::5555->5555/tcp, 8080/tcp   mbon-dashboard-server_flower_1
463faf47c645   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Up 58 minutes             8080/tcp                                              mbon-dashboard-server_airflow-scheduler_1
f9f450e639b4   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Exited (0) 4 weeks ago                                                          mbon-dashboard-server_airflow-init_1
1d1eb66f64b4   postgres:13            "docker-entrypoint.s…"   4 weeks ago   Up 58 minutes (healthy)   5432/tcp                                              mbon-dashboard-server_postgres_1
ae8d44f3dd8b   redis:latest           "docker-entrypoint.s…"   4 weeks ago   Up 58 minutes (healthy)   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp             mbon-dashboard-server_redis_1

[root@dune erddap-config]# docker-compose up -d --build
Starting erddap ... 
Starting erddap ... error

ERROR: for erddap  Cannot start service erddap: driver failed programming external connectivity on endpoint erddap (b5bdc77641d2ba4a71a75443f323944e1f9951697e00e47b5268fad6a6990cb6): Bind for 0.0.0.0:8080 failed: port is already allocated

ERROR: for erddap  Cannot start service erddap: driver failed programming external connectivity on endpoint erddap (b5bdc77641d2ba4a71a75443f323944e1f9951697e00e47b5268fad6a6990cb6): Bind for 0.0.0.0:8080 failed: port is already allocated
ERROR: Encountered errors while bringing up the project.

Well, that is different. Looking at it more closely, there might indeed be some port conflicts with the airflow containers. Looking at those: we're not using them, and there isn't a docker-compose.yml for them. Let's clean this up:

[root@dune ~]# docker container stop mbon-dashboard-server_airflow-webserver_1 mbon-dashboard-server_airflow-worker_1 mbon-dashboard-server_flower_1 mbon-dashboard-server_airflow-scheduler_1 mbon-dashboard-server_postgres_1 mbon-dashboard-server_redis_1

[root@dune erddap-config]# docker container ls -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED       STATUS                      PORTS     NAMES
eb42e6974ea1   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Exited (0) 15 seconds ago             mbon-dashboard-server_airflow-webserver_1
9c17ae6debac   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Exited (0) 15 seconds ago             mbon-dashboard-server_airflow-worker_1
714e01bb151f   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Exited (0) 16 seconds ago             mbon-dashboard-server_flower_1
463faf47c645   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Exited (1) 15 seconds ago             mbon-dashboard-server_airflow-scheduler_1
f9f450e639b4   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago   Exited (0) 4 weeks ago                mbon-dashboard-server_airflow-init_1
1d1eb66f64b4   postgres:13            "docker-entrypoint.s…"   4 weeks ago   Exited (0) 16 seconds ago             mbon-dashboard-server_postgres_1
ae8d44f3dd8b   redis:latest           "docker-entrypoint.s…"   4 weeks ago   Exited (0) 17 seconds ago             mbon-dashboard-server_redis_1

Try again:

[root@dune erddap-config]# docker-compose up -d --build
Creating network "erddap-config_default" with the default driver
Creating erddap ... 
Creating erddap ... error

ERROR: for erddap  Cannot start service erddap: driver failed programming external connectivity on endpoint erddap (ae39dce980c8d428c9a5c3a3a989e5e1c8cf47338466fab847143bbd0cd33b82): Bind for 0.0.0.0:8080 failed: port is already allocated

ERROR: for erddap  Cannot start service erddap: driver failed programming external connectivity on endpoint erddap (ae39dce980c8d428c9a5c3a3a989e5e1c8cf47338466fab847143bbd0cd33b82): Bind for 0.0.0.0:8080 failed: port is already allocated
ERROR: Encountered errors while bringing up the project.

hmm

[root@dune erddap-config]# lsof -i -P -n | grep 8080
docker-pr  2551    root    4u  IPv4   3301      0t0  TCP *:8080 (LISTEN)
docker-pr  2559    root    4u  IPv6  28007      0t0  TCP *:8080 (LISTEN)
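
The grep above matches any line containing 8080, including PIDs or ports such as 18080. A narrower check, sketched here with hypothetical helper names, asks lsof for TCP listeners on exactly that port and then looks for running containers whose port mapping binds that host port. It assumes lsof and the docker CLI are available on the host.

```shell
# Hypothetical helper: list processes listening on the given TCP port.
port_listeners() {
    # -n and -P skip host and port name lookups; -sTCP:LISTEN keeps only listeners
    lsof -nP -iTCP:"$1" -sTCP:LISTEN
}

# Hypothetical helper: names of running containers that bind the given HOST port.
# Matches the ":<port>->" part of mappings such as 0.0.0.0:8080->8080/tcp,
# so a mapping like 0.0.0.0:8888->8080/tcp is not reported for port 8080.
host_port_containers() {
    docker ps --format '{{.Names}} {{.Ports}}' | grep -F ":$1->" | cut -d' ' -f1
}
```

For example, port_listeners 8080 followed by host_port_containers 8080. If the first reports docker-proxy processes while the second prints nothing, the listener does not belong to any running container, which matches what the listings in this thread suggest.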

[root@dune erddap-config]# systemctl stop docker
[root@dune erddap-config]# lsof -i -P -n | grep 8080
[root@dune erddap-config]# systemctl start docker
[root@dune erddap-config]# docker-compose up -d --build
Starting erddap ... 

ERROR: for erddap  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for erddap  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

uhhhh...

[root@dune erddap-config]# lsof -i -P -n | grep 8080
docker-pr 52624    root    4u  IPv4 245188      0t0  TCP *:8080 (LISTEN)
docker-pr 52631    root    4u  IPv6 245192      0t0  TCP *:8080 (LISTEN)
[root@dune erddap-config]# docker container ls -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS                   PORTS                                                 NAMES
2089de3635d0   axiom/docker-erddap    "/entrypoint.sh cata…"   3 minutes ago   Created                  0.0.0.0:8080->8080/tcp, :::8080->8080/tcp             erddap
eb42e6974ea1   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago     Up 2 minutes (healthy)   0.0.0.0:8888->8080/tcp, :::8888->8080/tcp             mbon-dashboard-server_airflow-webserver_1
9c17ae6debac   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago     Up 2 minutes             8080/tcp                                              mbon-dashboard-server_airflow-worker_1
714e01bb151f   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago     Up 2 minutes (healthy)   0.0.0.0:5555->5555/tcp, :::5555->5555/tcp, 8080/tcp   mbon-dashboard-server_flower_1
463faf47c645   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago     Up 2 minutes             8080/tcp                                              mbon-dashboard-server_airflow-scheduler_1
f9f450e639b4   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   4 weeks ago     Exited (0) 4 weeks ago                                                         mbon-dashboard-server_airflow-init_1
1d1eb66f64b4   postgres:13            "docker-entrypoint.s…"   4 weeks ago     Up 2 minutes (healthy)   5432/tcp                                              mbon-dashboard-server_postgres_1
ae8d44f3dd8b   redis:latest           "docker-entrypoint.s…"   4 weeks ago     Up 2 minutes (healthy)   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp             mbon-dashboard-server_redis_1

WHAT?!?

docker-compose.yml only has the ERDDAP container in it. Why did those airflow containers turn back on? Let's try again but be less nice to them:

[root@dune erddap-config]# docker container stop mbon-dashboard-server_airflow-webserver_1 mbon-dashboard-server_airflow-worker_1 mbon-dashboard-server_flower_1 mbon-dashboard-server_airflow-scheduler_1 mbon-dashboard-server_postgres_1 mbon-dashboard-server_redis_1
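
One possible reason the airflow containers came back after the daemon restart is a restart policy such as always on those containers. Nothing in this thread confirms that, so treat it as an assumption to verify. The sketch below (helper names are hypothetical) prints each container's policy with docker inspect and clears it with docker update.

```shell
# Hypothetical helper: print "<name> <restart policy>" for each container given.
show_restart_policy() {
    for name in "$@"; do
        policy=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name")
        printf '%s %s\n' "$name" "$policy"
    done
}

# Hypothetical helper: tell the daemon not to restart the given containers.
disable_restart() {
    docker update --restart=no "$@"
}
```

For example, show_restart_policy erddap mbon-dashboard-server_redis_1. If erddap reports no, that would be consistent with it staying down after the outage, although the missing mount source found later in this thread could also have kept it from starting.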

[root@dune erddap-config]# docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
2089de3635d017a251205da3ad3c5f7c4e27e44b27714ace73d913b8e3d235d4
eb42e6974ea131887214dddd8e53472720be510c42a86220ab83cebd146cac18
9c17ae6debacad82a405d6139bbc90b6bdf7c7631b11fa29672310210d89ce04
714e01bb151fe2e9f0d26a8fcbed2690a3d43ba0201c0291b07f6f602eaedaea
463faf47c645748b6cc0a8eb3dc2f32ae9ace22eb37274cdb9941ad8a858ce93
f9f450e639b41670e20042ff37f0f5d39d1c5034991252791f98fc144ac79f6a
1d1eb66f64b431be214ecbda37fec865eec9650876e6983bac40b2d329c4d431
ae8d44f3dd8b8d10bfc98eb44a0844ff7c7c6091bec8836ddd9b7cc9c3020f6d

Total reclaimed space: 154.3MB
[root@dune erddap-config]# docker container ls -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

[root@dune erddap-config]# docker container ls -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
[root@dune erddap-config]# lsof -i -P -n | grep 8080
docker-pr 56599    root    4u  IPv4 297088      0t0  TCP *:8080 (LISTEN)
docker-pr 56606    root    4u  IPv6 297997      0t0  TCP *:8080 (LISTEN)
[root@dune erddap-config]# systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
[root@dune erddap-config]# lsof -i -P -n | grep 8080
[root@dune erddap-config]# systemctl start docker
[root@dune erddap-config]# lsof -i -P -n | grep 8080
[root@dune erddap-config]# docker container ls -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
[root@dune erddap-config]# docker-compose up -d --build
Creating erddap ... error

ERROR: for erddap  Cannot start service erddap: error while creating mount source path '/srv/imars-objects/modis_aqua_fk': mkdir /srv/imars-objects/modis_aqua_fk: permission denied

ERROR: for erddap  Cannot start service erddap: error while creating mount source path '/srv/imars-objects/modis_aqua_fk': mkdir /srv/imars-objects/modis_aqua_fk: permission denied
ERROR: Encountered errors while bringing up the project.

Okay, so that error occurs because thing2 is down (https://github.com/USF-IMARS/server-status/issues/166). I brought thing2 back up, and ERDDAP then started without issue.
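
Since the last failure came from a bind-mount source that was missing while thing2 was down, a pre-flight check before docker-compose up could surface that more directly. This is a minimal sketch: the helper name check_mount_sources is hypothetical, and the path in the usage line is the one from the error message above.

```shell
# Hypothetical helper: succeed only if every given path is an existing directory.
check_mount_sources() {
    status=0
    for path in "$@"; do
        if [ ! -d "$path" ]; then
            # Report each missing source on stderr and remember the failure
            echo "missing mount source: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage would look like: check_mount_sources /srv/imars-objects/modis_aqua_fk && docker-compose up -d --build. The check only tests that the directory exists; it does not verify that the remote share behind it is actually mounted.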

[root@dune erddap-config]# docker container ls -a
CONTAINER ID   IMAGE                 COMMAND                  CREATED          STATUS         PORTS                                       NAMES
5bdcd91cfd9a   axiom/docker-erddap   "/entrypoint.sh cata…"   55 minutes ago   Up 2 minutes   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   erddap