Main db1000n network is unreachable when ovpn container is restarted

palianycia123 commented 2 years ago

I noticed that the ovpn service of the docker-compose app is sometimes restarted, which is fine because there may be a connectivity issue with the VPN server. The autoheal container works perfectly here: it restarts the ovpn container. But it doesn't restart the main db1000n container, because that container has no health check endpoint. The result is a false positive for db1000n status: the container is up and running, but packets are not being transmitted.

Expected Behavior

It would be good to add a health check endpoint to the main db1000n program that does, e.g., nslookup google.com and returns 200 on success and non-200 on failure. We could then be confident in the network setup by calling the health check endpoint periodically via docker-compose.

Actual Behavior

The network is unreachable in the main db1000n container after the ovpn container is restarted.

Steps to Reproduce the Problem

docker-compose -f examples/docker/static-docker-compose.yml up -d (to see the network is unreachable error in the main db1000n container, add LOG_LEVEL:DEBUG to the environment in static-docker-compose.yml)
Turn the host network off to simulate a network connectivity issue.
Observe the ovpn container logs and wait until the container is restarted: docker logs docker_ovpn_1 -f
```
2022-05-02 08:45:43 SIGTERM[hard,] received, process exiting
Exiting.
```
Turn the host network on.
Observe that the ovpn container logs show it has connected successfully.
```
2022-05-02 13:45:07 Initialization Sequence Completed
```

Observe the main program logs:

error sending packet    {"error": "dial tcp 146.120.90.38:80: connect: network is unreachable", "args": {"address":"146.120.90.38:80","body":"{{ random_payload 10 }}","connection":{"args":{"address":"146.120.90.38:80","protocol":"tcp","proxy_urls":"","timeout":null},"type":"net"},"interval_ms":1,"packet":{"payload":{"data":{"payload":"{{ random_payload 10 }}"},"type":"raw"}}}}
error sending packet    {"error": "dial tcp 146.120.90.247:443: connect: network is unreachable", "args": {"address":"146.120.90.247:443","body":"{{ random_payload 10 }}","connection":{"args":{"address":"146.120.90.247:443","protocol":"tcp","proxy_urls":"","timeout":null},"type":"net"},"interval_ms":1,"packet":{"payload":{"data":{"payload":"{{ random_payload 10 }}"},"type":"raw"}}}}
error sending packet    {"error": "dial tcp 146.120.90.42:8080: connect: network is unreachable", "args": {"address":"146.120.90.42:8080","body":"{{ random_payload 10 }}","connection":{"args":{"address":"146.120.90.42:8080","protocol":"tcp","proxy_urls":"","timeout":null},"type":"net"},

Note: When I restart the main db1000n container manually, the network issue is resolved.

Specifications

Version: v0.8.33
Platform: macOS and Ubuntu
Subsystem: docker-compose

arriven commented 2 years ago

I'm considering different ways to do it, but it feels like this and #525 could be implemented in the same way. Or rather, implementing that one would make it very easy to implement this one.

palianycia123 commented 2 years ago

Well, simply crashing/exiting the app on network issues might not resolve this issue. According to the autoheal doc, a health check is a dependency for the autoheal container. Note: You must apply HEALTHCHECK to your docker images first

arriven commented 2 years ago

I'm not completely sure, but I assume autoheal would just restart the container if the healthcheck fails. Crashing the main process would make the container restart anyway if a correct restart policy is provided.
