kroese closed this issue 5 months ago
Can you check whether your bitcoin instance has the block 00000000000000000005f7a06bd4efe545999aba00eeff9a49747a3cd1f3c9df? For pruning we have better alternatives, like https://github.com/clightning4j/btcli4j or the other backends listed in https://github.com/lightningd/plugins
In addition, I think the two problems you have are related; in particular, in the last year the blockchain has grown by more than 100 GB.
@vincenzopalazzo No, this block is from 2018, and I only have blocks from the past year.
I am just trying to understand:
Why does it need this block? I have zero channels, so it is not needed for my own channels. Does it need to verify the opening transaction for EVERY channel in the graph?
Why does it keep requesting it for hours? It should just fail once and ignore the channel, I suppose.
I would rather not switch to another backend. I thought pruning was fully supported as long as you make sure C-Lightning does not get too far behind.
I did some more research, and it seems the mistake was indeed to wait until the IBD was completed before starting C-Lightning. I should have let them run together while syncing. But this introduces other problems, as C-Lightning syncs slower than bitcoind and can get too far behind.
The best solution would be if C-Lightning just implemented the getblockfrompeer RPC call that was recently added to Bitcoin.
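For illustration, here is a minimal sketch of what such a fallback could look like against bitcoind's JSON-RPC. The `rpc` callable and the stubbed demo below are hypothetical; `getblockfrompeer` is a real Bitcoin Core RPC (added in 23.0) that takes a block hash and a peer id, but it returns immediately, so real code would have to poll `getblock` until the block actually arrives.

```python
def fetch_block_with_fallback(rpc, block_hash):
    """Try getblock first; if the block was pruned locally, ask each
    connected peer for it via getblockfrompeer, then retry getblock."""
    try:
        return rpc("getblock", [block_hash, 0])
    except RuntimeError:
        pass  # likely pruned locally; fall through to the peer fallback
    for peer in rpc("getpeerinfo", []):
        try:
            # getblockfrompeer(hash, peer_id) asks that single peer for
            # the block; real code would poll getblock until it arrives
            rpc("getblockfrompeer", [block_hash, peer["id"]])
            return rpc("getblock", [block_hash, 0])
        except RuntimeError:
            continue  # this peer may be pruned too; try the next one
    raise RuntimeError("no peer could supply block " + block_hash)


# Demo against a stubbed RPC: peer 0 is pruned, peer 1 has the block.
_store = {}

def fake_rpc(method, params):
    if method == "getpeerinfo":
        return [{"id": 0}, {"id": 1}]
    if method == "getblockfrompeer":
        if params[1] == 1:
            _store[params[0]] = "rawblockhex"
            return {}
        raise RuntimeError("peer does not have the block")
    if method == "getblock":
        if params[0] in _store:
            return _store[params[0]]
        raise RuntimeError("Block not available (pruned data)")
    raise RuntimeError("unknown method " + method)

print(fetch_block_with_fallback(fake_rpc, "00" * 32))  # prints rawblockhex
```

The loop over `getpeerinfo` also covers the case (raised later in this thread) where the first peer happens to be pruned as well.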
So now my only option is to connect C-Lightning to an external full node (without pruning) to let it validate all the channels in the graph.
That leads me to the final question:
Is it safe to switch C-Lightning from the unpruned node back to the pruned node after it has validated all the channels? And how do I know it has finished validating every channel in the graph, so that I have the guarantee it will never need an old block again?
The best solution would be if C-Lightning just implemented the getblockfrompeer RPC call that was recently added to Bitcoin
this is an interesting idea as a fallback to getblock. cln on pruned nodes has always been a huge pain.
this is an interesting idea as a fallback to getblock. cln on pruned nodes has always been a huge pain.
Working on translating it into a compiled language (a really compiled one)
Is it safe to switch C-Lightning from the unpruned node back to the pruned node after it has validated all the channels? And how do I know it has finished validating every channel in the graph, so that I have the guarantee it will never need an old block again?
I think if you have old channels you need to verify them, so a channel that is 10 years old can be a problem. However, I'm not 100% sure about that.
cc @cdecker
So is my mistake that I should have already started CLN while Bitcoin was still syncing the chain? That way CLN would have had access to the blocks from 2018 that are now pruned. Or is there no solution?
My approach is to start bitcoind and then CLN while it is still syncing. I noticed that bitcoind prunes faster than CLN processes blocks, so as a workaround I keep this script constantly running (it has locking, so just add `* * * * * /home/cln/cln-prune-protector.sh 10000 >> /home/cln/cln-prune-protector.log 2>&1` to crontab); it will temporarily disable bitcoind network activity if CLN falls too far behind. https://github.com/kristapsk/cln-scripts/blob/master/cln-prune-protector.sh
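The core check such a watchdog performs can be sketched as follows. This is a simplified illustration: the threshold here is expressed in blocks, whereas bitcoind's `prune=` setting is actually in MiB, and the real script shells out to `bitcoin-cli setnetworkactive` rather than returning a boolean.

```python
def should_pause_network(bitcoind_height, cln_height, prune_window,
                         safety_margin=1000):
    """True when CLN has fallen so far behind that bitcoind is about
    to prune blocks CLN has not processed yet."""
    lag = bitcoind_height - cln_height
    return lag > prune_window - safety_margin

# bitcoind keeps roughly the last 10000 blocks; CLN is 20000 behind:
print(should_pause_network(800_000, 780_000, 10_000))  # True
print(should_pause_network(800_000, 799_500, 10_000))  # False
```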
The best solution would be if C-Lightning just implemented the getblockfrompeer RPC call that was recently added to Bitcoin.
Kinda sounds right, but from my experience it will make CLN sync a lot slower, as for most of the sync time it will ask for every block that way.
Slow is better than broken. There have been ideas thrown around in the past about using keep-blocks, but then you run into disk-space back pressure, which might run out. I see your script is turning the network on and off... seems a bit extreme, but it's an interesting approach.
@kristapsk Yes, I saw your script and really liked it. But since I am running both Bitcoin and C-Lightning in separate docker containers, I would need to heavily modify the script to be able to use it from the host machine.
Also, I am not sure the script makes the process 100% watertight, because it would require a guarantee that CLN has received all channel gossip before reaching the related blocks. If it receives an additional old channel after that, it will still fail to get the block. I don't know if there is a way to be sure you have received gossip about every channel ever created. And even if there is, there is always the possibility that someone broadcasts a new channel with a very old funding transaction.
I would need to heavily modify the script to be able to use it from the host machine.
Not sure about that. What you need is both bitcoin-cli and lightning-cli working on the CLN container. And CLN itself depends on a working bitcoin-cli, right?
The script was actively turning the network on and off during IBD; afterwards it hasn't turned it off (but it would if, for example, the CLN service were not running). I have prune=20000 in bitcoin.conf on that specific VPS where I use it. prune=anynumber is unsafe with CLN for the reasons you identified above.
```
$ bitcoin-cli help pruneblockchain
pruneblockchain height

Arguments:
1. height    (numeric, required) The block height to prune up to. May be set to a discrete height,
             or to a UNIX epoch time to prune blocks whose block time is at least 2 hours older
             than the provided timestamp.

Result:
n    (numeric) Height of the last block pruned

Examples:
> bitcoin-cli pruneblockchain 1000
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id": "curltest", "method": "pruneblockchain", "params": [1000]}' -H 'content-type: text/plain;' http://127.0.0.1:8332/
```
The dependent app (in this case CLN) should instead be driving bitcoind's pruning with this RPC. With CLN in control of pruning you are never at risk of bitcoind pruning too far ahead.
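A sketch of that idea, assuming the consumer tracks its own processed height (the helper name and the `keep_recent` cushion are illustrative, not anything CLN actually implements):

```python
def safe_prune_height(consumer_height, keep_recent=288):
    """Height the consumer (CLN in this discussion) can safely let
    bitcoind prune to: never beyond what it has already processed,
    minus a cushion of recent blocks (~2 days here)."""
    return max(0, consumer_height - keep_recent)

# The consumer would then periodically run something like:
#   bitcoin-cli pruneblockchain <safe_prune_height(cln_height)>
print(safe_prune_height(800_000))  # 799712
print(safe_prune_height(100))      # 0, nothing safe to prune yet
```

With the consumer driving `pruneblockchain` this way, bitcoind can never discard a block the consumer still needs.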
You are right. But even though I made the mistake of letting Bitcoin sync first, it is still a bug that CLN tried to request the same block for hours in a loop.
It would have made much more sense to skip the block and ignore the related channels, instead of going into a death loop.
I've been running in pruned mode successfully, but periodically it hits this bug. It's weird, and possibly appears because of malicious gossip, because it always references a block from years ago, when lightning channels were only a glimmer in a nerd's eye.
I've found a workaround, because on its own it seems to get stuck in a loop requesting a block that doesn't exist, and all other node activity slows down. There are a couple of plugins that are meant to make running on a pruned node more reliable. Although I have never been able to get btcli4j properly configured to sync (it seems unable to fetch blocks), just starting up clightning with that plugin clears the queue on fetching that block and then allows me to start up normally again.
Sounds like a redundant fallback lookup for old blocks would be a perfect plugin.
https://github.com/clightning4j/btcli4j/tree/ecacb049d41e2282c5595e84a6f9db6a601c3bc3
I get "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."
https://github.com/clightning4j/btcli4j/tree/ecacb049d41e2282c5595e84a6f9db6a601c3bc3
I get "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."
It is outside this repository - it is from the list of community plugins : https://github.com/lightningd/plugins
@kristapsk @AutonomousOrganization Just use the master branch https://github.com/clightning4j/btcli4j
There are a couple of plugins that are meant to make running on a pruned node more reliable. Although I have never been able to get btcli4j properly configured to sync (it seems unable to fetch blocks)
I put a lot of effort into keeping my tool alive and maintained, but I cannot anticipate every bug people run into. If you open an issue, I can help you configure it.
Disclaimer: there isn't really any configuration :) just a flag to run in pruning mode
Just noticed the same issue on my new pruned node. Is there a reason why the official docs on pruning don't mention this issue? It looks like a common situation with non-negligible negative consequences.
Is it considered a bug or wontfix? Is it safe to ignore it, assuming bitcoind and lightningd agree on the current block height and it's up to date?
Just to add to the last comment... most comments in this thread suggest you should start lightningd when you start bitcoind. Is starting a lightning node simply not an option for someone who already has a bitcoin client up and running? I tried starting both daemons at the same time but ran into the issue anyway, for reasons I think have already been discussed. Additionally, for running a pruned node I found I could just download a prune snapshot and start from that, rather than waiting to sync the entire blockchain. In both cases, it seems lightningd needs to work around this issue.
One year has passed, and this issue is still not fixed :(
Every few weeks I still run into this endless loop of getblock calls, and restarting does not fix it. I have to point clightning to a non-pruned node to fetch that block, and revert back to the pruned node immediately afterwards.
I think what happens is that sometimes it hears about a very old block through gossip, which triggers the endless loop.
It would be so easy to fix this: just ignore blocks that fail to fetch after X tries. Or otherwise add an option to specify a maximum block age and not even try to fetch older ones. Or, the best solution: use the getblockfrompeer RPC call to automatically fetch the missing block from a peer when getblock fails.
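The first option, giving up on a block after X failed tries, could be sketched as a small retry budget. The class and limit below are illustrative, not CLN's actual internals:

```python
from collections import Counter

class BlockFetchGuard:
    """Give up on a block hash after max_tries failed getblock attempts,
    so one unfetchable (pruned) block cannot wedge the whole sync loop."""

    def __init__(self, max_tries=10):
        self.max_tries = max_tries
        self.failures = Counter()  # block hash -> failed attempt count

    def should_try(self, block_hash):
        return self.failures[block_hash] < self.max_tries

    def record_failure(self, block_hash):
        self.failures[block_hash] += 1

guard = BlockFetchGuard(max_tries=3)
h = "00000000000000000005f7a06bd4efe545999aba00eeff9a49747a3cd1f3c9df"
for _ in range(3):
    guard.record_failure(h)
print(guard.should_try(h))  # False: stop retrying, skip the channel
```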
@vincenzopalazzo @rustyrussell @cdecker Can one of you please look into this, because either of these three solutions are simple to implement, and would solve this issue.
Or, the best solution: use the getblockfrompeer RPC call to automatically fetch the missing block from a peer when getblock fails.
I will look into this, thanks
Can one of you please look into this, because either of these three solutions are simple to implement, and would solve this issue.
I will!
The problem with this RPC call is that you have to specify the index of the peer (for example the first peer) and you cannot say that you want it from ANY peer (in case the first peer is also pruned just like yourself). So either you have to implement logic to try all peers, or gamble that your first peer is not pruned.
But even if it just tries the first peer, I would be happy already, because in 90 percent of the cases it will work fine.
EDIT: I added a feature request ( https://github.com/bitcoin/bitcoin/issues/27652 ) to make this possible, but until that is implemented just trying the first peer would be fine.
My intention is to add something experimental to the plugin https://github.com/coffee-tools/folgore. Once we reach a consensus, we can try to integrate it with CLN. The plugin is a good place to experiment.
The original idea is to completely bypass Bitcoin Core if the block is out of range, and fetch the block directly from the network.
There are already multiple plugins that can work around this issue (like your own btcli4j, for example), but I use CLN via a prebuilt Docker container, so I cannot install any plugins.
So my hope was that the issue could be fixed in CLN itself, not by using a different backend through a plugin.
Because it is basically a possible DoS attack: someone can deliberately send me a channel funding message that refers to a very old block and bring my node into an endless loop. Switching to a different backend is more like avoiding the problem than fixing it.
Ran into this issue again today, really getting tired of it..
I really don't understand why this doesn't get fixed.
And I really appreciate that @vincenzopalazzo is willing to look into this, but bypassing Bitcoin Core via a plugin seems to be complete overkill.
Yep, pruned mode shouldn't be advertised if it isn't working; it gives users the wrong expectations and breaks their nodes.
I worked around this issue with CLN and a pruned bitcoind in a Docker environment by using btc-rpc-proxy, which is available as the Docker image blockstream/btc-rpc-proxy:latest.
I exposed the btc-rpc-proxy Docker port 8331 and mounted a config directory at /data with the following config.toml in it:
```toml
bitcoind_user = "hello"
bitcoind_password = "world"
bind_address = "0.0.0.0"
bind_port = 8331
bitcoind_address = "192.168.1.160"
bitcoind_port = 8332

[user.clnuser]
password = "clnpassword"
allowed_calls = [
    "createrawtransaction", "decoderawtransaction", "decodescript", "echo",
    "estimatefee", "estimatepriority", "estimatesmartfee", "estimatesmartpriority",
    "getbestblockhash", "getblock", "getblockchaininfo", "getblockcount",
    "getblockhash", "getblockheader", "getchaintips", "getdifficulty", "getinfo",
    "getmempoolinfo", "getnetworkinfo", "getrawmempool", "getrawtransaction",
    "gettxout", "gettxoutproof", "gettxoutsetinfo", "sendrawtransaction",
    "verifytxoutproof"
]
```
You can then update the CLN to point to this proxy on 8331 with the clnuser and clnpassword, and it should work with your pruned node while pulling blocks p2p when required.
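For reference, pointing CLN at such a proxy would then look roughly like this in lightningd's config file. The host is a placeholder for wherever the proxy container is reachable; the user and password match the proxy example above:

```
bitcoin-rpcconnect=<proxy-host>
bitcoin-rpcport=8331
bitcoin-rpcuser=clnuser
bitcoin-rpcpassword=clnpassword
```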
Is the bcli plugin active by default? If so, shall we close this issue, since https://github.com/ElementsProject/lightning/pull/7240 has been merged?
Correct @bubelov
Fixed by https://github.com/ElementsProject/lightning/pull/7240
I did a fresh installation of bitcoind with pruning set to 100 GB. I waited until it was completely synced. Then I installed CLN and connected it to the bitcoind node.
The problem is that for the last hour it keeps requesting the same block from 2018 every second in a loop:
UNUSUAL plugin-bcli: /usr/bin/bitcoin-cli -datadir=/data/.bitcoin -rpcconnect=172.17.0.2 -rpcport=8332 -rpcuser=... -rpcpassword=... getblock 00000000000000000005f7a06bd4efe545999aba00eeff9a49747a3cd1f3c9df 0 exited with status 1
I don't understand why it keeps on trying this same block, since it should realize it's not available after trying only once. I think it heard of this block through channel gossip, since I don't have any channels yet myself.
Besides the problem with CLN getting stuck on this block, I think I will have another problem.
Namely, that my graph will miss all channels created more than a year ago? I thought a pruned node would be fully functional, but missing all the old channels is a big downside.
So is my mistake that I should have already started CLN while Bitcoin was still syncing the chain? That way CLN would have had access to the blocks from 2018 that are now pruned. Or is there no solution?