dashpay / dashcore-node

Full node with extended capabilities using Dashcore and Dash Core (dashd)
Other
31 stars 75 forks source link

dashd not gracefully shutdown when used with docker-compose #36

Open c0deright opened 5 years ago

c0deright commented 5 years ago

I've built a docker image consisting of dashcore-node that spawns the dashd process.

When started via docker run -it image-name the current shell keeps being attached to the docker process so that later pressing key combo CTRL+C results in dashd gracefully shutting down. This can be confirmed by looking at dashd's debug.log which contains

Shutdown: done

as the last entry.

When using docker-compose however upon stopping the container dashd is SIGKILLED, debug.log has no entry about shutting down and the berkeleydb directory "database" is still present in dashd's datadir which is a clear indication that no graceful shutdown happened.

I can only guess that dashcore-node has to handle SIGNALs here for a graceful shutdown of the spawned dashd.

nmarley commented 5 years ago

@c0deright How are you shutting down via docker-compose? Are you using docker-compose down or just Ctrl+C? Docker-compose will try to gracefully shutdown by sending a SIGTERM and wait 10 seconds before sending SIGKILL (if process doesn't terminate by then).

Are you pushing Ctrl+C twice b/c the first one doesn't react? That will force the kill signal early, which could result in the behaviour that you see.

Docker-compose behaviour is documented here: https://docs.docker.com/compose/faq/#can-i-control-service-startup-order

c0deright commented 5 years ago

docker-compose in background

I start docker-compose with -d flag so it goes to background right away.

Shutdown via docker-compose down

export TZ=UTC; date -R ; docker-compose down ; date -R
Mon, 26 Nov 2018 11:08:38 +0000
Stopping foo_dash_1 ... done
Mon, 26 Nov 2018 11:08:49 +0000

Processes on the machine running docker-engine:

root      2520  0.0  0.1 208000  3548 ?        Sl   10:45   0:00  |   \_ docker-containerd-shim adef6eac6a31e5766a2b4c2b0cd96054371fd330b05dbb1e844baac622efbf2d /var/run/docker/libcontainerd/adef6eac6a31e5766a2b4c2b0cd96054371fd330b05dbb1e
844baac622efbf2d docker-runc
root      2539  0.0  0.0   4340   712 ?        Ss   10:45   0:00  |       \_ /bin/sh -c dashcore-node start
root      2586  0.1  3.0 1061112 62276 ?       Sl   10:45   0:01  |           \_ dashcore

While docker-compose down was running both processes with PID 2539 and 2586 didn't change until 10 seconds passed, then both processes were SIGKILLED.

debug.log stops abruptly, no clean shutdown logged, berkeleyDB directory database still present in datadir after container was stopped → unclean shutdown

Same when using docker-compose stop dash.

docker-compose in foreground

I start docker-compose without -d flag so it stays in foreground.

Shutdown via CTRL + C (pressed only once)

dash_1  | [2018-11-26T11:16:16.401Z] info: Dash Height: 977266 Percentage: 100.00
dash_1  | [2018-11-26T11:20:05.195Z] info: Dash Height: 977267 Percentage: 100.00
^CGracefully stopping... (press Ctrl+C again to force)
Stopping foo_dash_1 ... done

Same behaviour: debug.log has no indication of an initiated shutdown, dashd just got SIGKILLED, database directory still present → unclean shutdown

docker run in foreground

I start via: docker run --rm -it -v $PWD/dash-data:/root/dashcore-node/data my-dashcore-image

Shutdown via CTRL + C (pressed only once)

More than one instance of dashcore-lib found. Please make sure that you are not mixing instances of classes of the different versions of dashcore.
[2018-11-26T11:29:32.978Z] info: Using config: /root/dashcore-node/dashcore-node.json
[2018-11-26T11:29:32.978Z] info: Using network: livenet
[2018-11-26T11:29:32.979Z] info: Starting dashd
[2018-11-26T11:29:32.981Z] info: Using Dash config file: /root/dashcore-node/data/dash.conf
[2018-11-26T11:29:32.984Z] warn: Stopping existing spawned dash process with pid: 16
[2018-11-26T11:29:32.984Z] warn: Unclean dash process shutdown, process not found with pid: 16
[2018-11-26T11:29:32.984Z] info: Starting dash process
[2018-11-26T11:29:38.013Z] warn: Loading block index...
[2018-11-26T11:29:43.030Z] info: Dash Height: 977265 Percentage: 100.00
[2018-11-26T11:29:43.035Z] info: Dash Daemon Ready
[2018-11-26T11:29:43.035Z] info: Starting web
[2018-11-26T11:29:43.044Z] info: Starting @dashevo/insight-api
[2018-11-26T11:29:43.045Z] info: Dashcore Node ready
[2018-11-26T11:29:43.519Z] warn: ZMQ connection delay: tcp://127.0.0.1:28332
[2018-11-26T11:29:43.520Z] info: ZMQ connected to: tcp://127.0.0.1:28332
[2018-11-26T11:29:51.520Z] info: Dash Height: 977266 Percentage: 100.00
[2018-11-26T11:29:52.521Z] info: Dash Height: 977268 Percentage: 100.00
^C
[2018-11-26T11:30:12.863Z] error: uncaught exception: SIGINT
[2018-11-26T11:30:12.864Z] info: Beginning shutdown
[2018-11-26T11:30:12.864Z] info: Stopping @dashevo/insight-api
[2018-11-26T11:30:12.864Z] info: Beginning shutdown
[2018-11-26T11:30:12.865Z] info: Stopping @dashevo/insight-api
[2018-11-26T11:30:12.865Z] info: Stopping web
[2018-11-26T11:30:12.865Z] info: Stopping web
[2018-11-26T11:30:12.865Z] info: Stopping dashd
[2018-11-26T11:30:12.866Z] info: Stopping dashd
[2018-11-26T11:30:14.073Z] warn: ZMQ disconnect: tcp://127.0.0.1:28332
[2018-11-26T11:30:14.074Z] warn: ZMQ connection delay: tcp://127.0.0.1:28332
[2018-11-26T11:30:14.074Z] warn: ZMQ connection delay: tcp://127.0.0.1:28332
[2018-11-26T11:30:14.074Z] warn: ZMQ connection delay: tcp://127.0.0.1:28332

debug.log:

2018-11-26 11:30:11 CMasternodeSync::ProcessTick -- nTick 31 nRequestedMasternodeAssets 1 nRequestedMasternodeAttempt 0 nSyncProgress 0.000000
2018-11-26 11:30:12 tor: Thread interrupt
2018-11-26 11:30:12 addcon thread exit
2018-11-26 11:30:12 mncon thread exit
2018-11-26 11:30:12 torcontrol thread exit
2018-11-26 11:30:12 opencon thread exit
2018-11-26 11:30:12 scheduler thread interrupt
2018-11-26 11:30:12 PrepareShutdown: In progress...
[..]
2018-11-26 11:30:13 Dumped mempool: 7e-06s to copy, 0.04917s to dump
2018-11-26 11:30:13 Shutdown: done

→ clean shutdown

Conclusion

There is no way dashd is shutting down gracefully when docker-compose is involved. Only when running docker manually in foreground a clean shutdown can be initialted via ctrl+c.

docker-compose -v
docker-compose version 1.22.0, build f46880fe
nmarley commented 5 years ago

I've built a docker image consisting of dashcore-node that spawns the dashd process

Yep, I think I see what's going on. I'm mostly guessing here, but it seems that your dashcore-node (Node.js) process which spawns dashd is the main Docker process, and once that's died (via SIGTERM), the Docker daemon kills all other processes and shuts down the container. It's possible the Node.js process is handling SIGINT but not SIGTERM in the same way.

But really this is not super relevant, because your Node.js process shouldn't be spawning dashd anyway. The Docker way is to use one process per container. If your Docker container used only dashd and controlled the dashd process (without the Node.js layer), this would not be an issue. Docker is meant to manage these processes itself, not delegate thru another layer.

This docker-compose will do what you expect, using the reference DashCore Docker image. I just verified this myself.

version: '3'

services:
  core:
    image: dashpay/dashd
    restart: always
    ports:
      - 127.0.0.1:9998:9998
      - 9999:9999
    volumes:
      - dashd-data:/dash
    command: "dashd -rpcallowip=::/0 -printtoconsole=0"

volumes:
  dashd-data:

I disabled printing to console here so that you can inspect the debug.log in the volume after the fact and see that it's shutting down properly:

root@79c9a2b9b785:~# tail -n3 /data/.dashcore/debug.log
2018-11-26 13:22:54 netfulfilled.dat dump finished  0ms
2018-11-26 13:22:54 Dumped mempool: 1.4e-05s to copy, 0.001667s to dump
2018-11-26 13:22:54 Shutdown: done

We also removed support for dashcore-node spawning a dashd process and only support an external dashd install at any rate. The fact that the Node.js was downloading and managing it's own dashcore instance led to tight coupling of components and really is a terrible architecture.

At the time it didn't exist (at least with such good support), but Docker is actually the correct way to do what dashcore-node (actually parent project bitcore-node) is trying to do -- manage necessary dependencies and wire them together properly.

I'm currently making an effort to remove this repo entirely and not rely on this architecture.

nmarley commented 5 years ago

Note: I've added a gist w/this docker-compose file and instructions for verifying proper DashCore shutdown behaviour: https://gist.github.com/nmarley/c154596915071960ce3a09d8d8513b26

Please feel free to run this using the instructions and see if it works for you.

c0deright commented 5 years ago

Hi,

I just followed the documentation to get insight-api running with dashcore and that's using letting dashcore spawn the dashd process. I'm pretty sure, as you said, if I would seperate the dashd process from the nodejs app (both in their own docker-container) things will work as intended.

I might follow that path, yes, but since I'm not using the wallet (disablewallet=1 in dash.conf) for the insight-api there is currently no downside in dashd being KILLED and not being shutdown gracefully.

Just wanted to raise that issue here so others stumbling over it will find some resources.

c0deright commented 5 years ago

We also removed support for dashcore-node spawning a dashd process and only support an external dashd install at any rate.

In 4.0.0 'spawn' is still supported for service dashd.

I'm currently making an effort to remove this repo entirely and not rely on this architecture.

Isn't this repo still needed even when spawn is deprecated in favor of connect?

nmarley commented 5 years ago

No, this should never have been implemented in this manner at all really. The purpose of dashcore-node is to do what Docker does now, only this project does it terribly and is a bad architecture. We need to instead set up Insight to not need this and be able to run as its own service, instead of being require()'d by some other project in order to start the API service.

We'll have to do a release soon to merge develop branch into master and push a new version, but for now, feel free to check the develop branch for current recommendations.