ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.82k stars 896 forks source link

[Feature request] Wait a while for Bitcoin Core to start instead of crashing immediately #4355

Open NicolasDorier opened 3 years ago

NicolasDorier commented 3 years ago

At startup, c-lightning should wait a bit for Bitcoin core to start and not crash (https://github.com/ElementsProject/lightning/issues/4352)

NicolasDorier commented 3 years ago

Workaround we use in our fork: https://github.com/btcpayserver/lightning/commit/2ef9f6f0dd6872bac9e7b5a18a40386ac86d3b2a

laanwj commented 3 years ago

I have had a similar issue. You might want to look into bitcoin/bitcoin#21007 as for fixing this from the Bitcoin Core side.

NicolasDorier commented 3 years ago

I do not think this is related, I am not running bitcoind as a deamon. The error is that connection to bitcoind is refused when clightning try to connect to it.

In my case, bitcoind stays in the foreground in its container, and docker is the one starting services depending on it (clightning)

I think docker is using some shaky heuristic to know when it is the right time to launch a dependant services. Somehow, the fact that I need to use bitcoin-wallet create in bitcoin core's container before starting bitcoin core, made clightning starts too early. (before rpc bind to the port)

laanwj commented 3 years ago

It's related in the sense that it makes for an easy way to know when something dependent on bitcoind can be started. The -daemonwait process will only return when either the initialization is complete and RPC can process commands, or when initialization failed.

I don't know anything about docker though.

darosior commented 3 years ago

We do wait for bitcoind on startup indefinitely, if it its at least on warmup: https://github.com/ElementsProject/lightning/blob/483579f8b6bee8d48768b9cbf5b11e0c732d5a38/plugins/bcli.c#L817-L871

From your error in https://github.com/ElementsProject/lightning/issues/4352#issue-794033606 it looks like bitcoind has not even be started yet, and i'm not sure it's reasonable to wait with no clue that a bitcoind is actually going to start behind us at some point?

NicolasDorier commented 3 years ago

@darosior it would be nice to at least wait for 5-10 sec. (actually even 1 would be enough)

That's what our clightning fork is doing https://github.com/btcpayserver/lightning/commit/2ef9f6f0dd6872bac9e7b5a18a40386ac86d3b2a

As I said, the problem is that even if we configure c-lightning to depends on bitcoind at docker level, the fact that in bitcoin 0.21.0 I need to call bitcoin-create make docker start c-lightning start too early.

There is ways to work around that (like my commit above), or by having some boilerplate code in the docker entrypoint, but then waiting for the warmup become useless, as it is done in the bash script anyway.

NicolasDorier commented 3 years ago

@laanwj sadly, docker does not care about when bitcoind returns before starting the dependant services. And actually, bitcoind must run in foreground if hosted in container, so I can't even use -daemonwait

cdecker commented 3 years ago

One option might be to expose the -rpcwait command line option inside of the bcli plugin. That way we'd be waiting both for the RPC to startup, and for the warmup to complete.

-rpcwait internally polls the RPC interface ever 1 second indefinitely, and doesn't fail if the destination is unreachable, which is the issue at hand if I see correctly. So starting docker containers in any order would just have the lightningd container wait for the bitcoind container to become available.

My main concern is that it might mask some of the errors indefinitely, like a misconfigured RPC location or credentials, would no longer report an error, rather we'd just sit there and wait forever. Might be a good idea to propose an optional numeric argument to -rpcwait indicating the number of seconds to wait before reporting failure. So -rpcwait=10 would try 10 times, and after 10 seconds, we'd receive the error message which we could then report, alerting the operator of the issue.

@laanwj do you think that such a proposal would be welcome in bitcoin/bitcoin, or is there prior discussion on this?

laanwj commented 3 years ago

@laanwj do you think that such a proposal would be welcome in bitcoin/bitcoin, or is there prior discussion on this?

I'm not aware of any prior discussion of this. But sounds good to me.

NicolasDorier commented 3 years ago

@cdecker I tried to add -rpcwait, but this did not seem to solve the issue in my case. I think rpcwait does not work if the server refuse connection, it just returns instantly.

My main concern is that it might mask some of the errors indefinitely, like a misconfigured RPC location or credentials, would no longer report an error, rather we'd just sit there and wait forever.

Just try for 20 sec. You can detect whether rpc is loading or if it fails to connect. If rpc loading -> wait indefinitely. If fails to connect -> fails after 20sec or so.

cdecker commented 3 years ago

Are you sure about that? I just tried with bitcoin-cli pointing at a non-existent server (port that isn't being served on my local machine) and it hung indefinitely. So unless there is some weirdness with bitcoind binding to the port but not accepting connections I don't see how that could fail.

Furthermore if I then start a bitcoind on that port it works as expected. Can you retry in a dockerized setting?

cdecker commented 3 years ago

Well, I got curious myself what'd happen, so here's a gist that does a minimal recreation: https://gist.github.com/cdecker/4bdcf21192855022a02bd895d06fd18f

It starts bitcoind in one container, starts bitcoin-cli in another called lightningd and then attempts to call echo from the latter. If I remove the -rpcwait argument in the bitcoin-cli invokation it indeed complains, if I add it, all is good. To exacerbate the issue I also added a 10 second delay to the bitcoind startup, again no problem.

So my proposal would be to file a PR adding the max wait time to -rpcwait and then pass rpcwait through to the bcli plugin.

NicolasDorier commented 3 years ago

@cdecker I am not sure. I tried to use rpcwait, locally and as you pointed out, it was hanging. However, before trying my current solution, I tried inserting rpcwait directly in the gather_args function, and it did not worked.

Maybe it is because the bitcoin-cli we are using in the docker is rather old (0.18). Though I checked it had support for rpcwait.

NicolasDorier commented 3 years ago

The docker-compose we use is here:

https://github.com/btcpayserver/BTCPayServer.Lightning/blob/master/tests/docker-compose.yml

btcpayserver/lightning:v0.9.3-1-dev is with my workaround.

You can easily test what happen by:

  1. Make your change locally
  2. Create a dummy commit (you need to commit because the docker build is cloning the repo)
  3. docker build -t btcpayserver/lightning:v0.9.3-1-dev .
  4. docker-compose down -v
  5. docker-compose up -d
  6. docker logs

the docker-compose down -v make sure that the bug will be reproduced, as it makes sure bitcoind will call bitcoin-wallet create when starting, which exposed the bug.

Note that sometimes it works, so you need another round of down/up.