ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance

Revocations not being recorded in db #4217

Closed (billygarrison closed this issue 3 years ago)

billygarrison commented 3 years ago

c-lightning v0.9.1, Bitcoin Core v0.20.1

I am running the Eye of Satoshi watchtower plugin and came across something unusual. Whenever there is a revocation in c-lightning, the commitment-revocation hook triggers the plugin to add an "appointment" (a revoked tx to watch for on-chain) to the watchtower. Batches of six appointments keep getting added every so often (roughly every 10-60 minutes), even though I can't see anything in the lightningd database that shows a change to the channels. Nothing in forwarded_payments, invoices, payments, etc.
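For reference, the hook the plugin rides looks roughly like this; a minimal sketch using pyln-client, based on my reading of doc/PLUGINS.md (the exact payload fields and expected response are assumptions on my part):

```python
#!/usr/bin/env python3
# Minimal sketch of a plugin riding the commitment_revocation hook.
# Field names follow doc/PLUGINS.md; the response format is an assumption.
from pyln.client import Plugin

plugin = Plugin()

@plugin.hook("commitment_revocation")
def on_commitment_revocation(plugin, commitment_txid, penalty_tx, **kwargs):
    # A real watchtower client would hand penalty_tx to the tower here.
    plugin.log("revoked commitment %s" % commitment_txid)
    return {"result": "continue"}

plugin.run()
```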

I tried working through the issue with @sr-gi but we can't figure out the exact cause. He says it's normal for appointments to get added in pairs (one for the incoming and one for the outgoing channel during a forward). But we aren't sure why they keep arriving in sixes, and with nothing showing in the db.

Here is an excerpt of the c-lightning log file showing everything that happened in the 1-2 seconds when this took place.

Here is the result of listpeers for one of the peers involved (not sure if there are other peers; Sergi pulled this peer_id from my logs).

I am trying to figure out:

  1. Why these revocations are happening in batches of six
  2. Why they aren't recorded in the database
ZmnSCPxj commented 3 years ago

We never actually record any revocation transactions; we only store revocation keys. Revocation keys are designed to be stored in structures called "shachains", which @rustyrussell designed and built during the "getting Lightning off the ground" phase (he had some drunken blog posts about it). A shachain is an O(1)-space structure, so a revocation does not INSERT new entries into the database; it just overwrites the shachain BLOB in the shachain_known table with an UPDATE. So try looking for UPDATE queries on the shachain_known table. You can write a simple plugin that hooks db_write and records the database queries plus timings.
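Something like this, say (a rough pyln-client sketch; the log path and the filtering are arbitrary choices of mine):

```python
#!/usr/bin/env python3
# Rough sketch of a db_write logger: appends each query touching the
# shachain_known table to a file, with a timestamp.
import time
from pyln.client import Plugin

plugin = Plugin()

@plugin.hook("db_write")
def on_db_write(plugin, writes, **kwargs):
    with open("/tmp/db_writes.log", "a") as f:
        for q in writes:
            if "shachain_known" in q:
                f.write("%.3f %s\n" % (time.time(), q))
    # Tell lightningd to go ahead and perform the write.
    return {"result": "continue"}

plugin.run()
```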

HTLCs are also recorded, since we need to know enough about them to revoke them. Are you not seeing changes to the database for those?
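(To check directly against the default sqlite3 backend, something like the below; the database path and the column list are assumptions about the schema:)

```python
#!/usr/bin/env python3
# Peek at the most recent rows of channel_htlcs; the path and the column
# list are assumptions about the default sqlite3 schema.
import sqlite3

db = sqlite3.connect("/home/user/.lightning/bitcoin/lightningd.sqlite3")
for row in db.execute(
        "SELECT id, channel_id, direction, msatoshi, hstate "
        "FROM channel_htlcs ORDER BY id DESC LIMIT 20"):
    print(row)
```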

During actual revocation on the C-Lightning side, we just grind to find which revocation key matches the output. Various root pubkeys stored in the database let us derive the actual pubkeys for each commitment tx; these are not stored on the watchtower, since we do not want an identifying string that lets the watchtower correlate updates to actual channels. Without the root pubkeys, the watchtower cannot do this grinding, so it cannot use the O(1) shachain structure either (and must be given revocation transactions, which are built solely for the commitment_revocation hook; we otherwise never generate the revocation transactions until actual revocation).
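(For the curious, the derivation being ground over is the standard one from BOLT #3, quoting the spec's notation:)

```
revocationpubkey = revocation_basepoint * SHA256(revocation_basepoint || per_commitment_point)
                 + per_commitment_point * SHA256(per_commitment_point || revocation_basepoint)
```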

Not certain why they are in "batches of 6" though. My own count is that it would be 4 revokes for each forward -- 2 for the incoming (add HTLC and remove HTLC), and 2 for the outgoing (add HTLC and remove HTLC). If the HTLC itself were also included in the revocation hook, that would be the 5th and 6th, but the doc/PLUGINS.md claims that the second-stage htlc_success/htlc_failure txes are not sent in the hook. The doc could be wrong though.

sr-gi commented 3 years ago

> If the HTLC itself were also included in the revocation hook, that would be the 5th and 6th, but the doc/PLUGINS.md claims that the second-stage htlc_success/htlc_failure txes are not sent in the hook. The doc could be wrong though.

They were not, AFAICT, at least not in the initial design. @cdecker preferred to keep it simple for the first approach, which I kind of seconded during the discussion.

I think the batches could come from multiple channels being updated close together in time. I never checked the tower logs from @billygarrison, so correct me if I'm wrong, but I agree with @ZmnSCPxj that pairs and groups of four are the usual pattern (for payments ending at the node / payments forwarded by the node).

One of the weird things is that the payments were not showing up on @billygarrison's node through listforwards, even though we could see debug traces of commitment revocations in the node's log. Might we have missed something?

ZmnSCPxj commented 3 years ago

One thing we might have done would be to just reuse the same hook for the HTLC second-stage, with the second-stage HTLC txes as the commitment_txid, even though they are technically not commitments. But substantially the same mechanism is used to revoke them anyway, so riding that hook would be better.

Not sure if we ever implemented it that way; maybe @cdecker can say.

billygarrison commented 3 years ago

@ZmnSCPxj I'm unsure where to look for the HTLCs, but the channel_htlcs table does not have HTLCs matching the revocations that are triggering the commitment-revocation hook.

A lot of this is above my skill level, but I'm happy to help work through the issue with guidance.

Here's a plot that shows the timing of the revocations:

[scatter plot: teos-add-appointments]

I did `cat teos.log | grep 'Adding appointment'` (watchtower log), numbered the appointments from 1-549, and plotted them as a scatter plot. Note that doing `cat log.log | grep 'got revoke'` on the c-lightning logs gives me the same plot, so I guess it isn't an issue with the watchtower.
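The plotting step was roughly this (a sketch; it assumes each matching line starts with an ISO-8601 timestamp, so adjust for the actual log format):

```python
#!/usr/bin/env python3
# Scatter-plot appointment times from the watchtower log.
# Assumes the first whitespace-separated token of each matching
# line is an ISO-8601 timestamp.
from datetime import datetime
import matplotlib.pyplot as plt

times = []
with open("teos.log") as f:
    for line in f:
        if "Adding appointment" in line:
            times.append(datetime.fromisoformat(line.split()[0]))

plt.scatter(times, range(1, len(times) + 1), s=4)
plt.xlabel("time")
plt.ylabel("appointment number")
plt.show()
```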

cdecker commented 3 years ago

TL;DR: don't worry, this is to be expected. HTLCs are not the only changes that require watchtower interactions.

As @ZmnSCPxj correctly pointed out, HTLC additions and removals are far from the only commitment operations that create a new state. Any new state needs to be reflected in the database, and the watchtower also needs to be informed, since these commitment operations revoke the prior state, making it punishable, even though no HTLC was added or removed.

Am I correct in assuming that the common stride of 6 appointments, corresponding to 6 db_writes, is a multiple of the number of open channels? That is most likely the on-chain feerate adjustment, which is based on your local feerate estimate and requires each channel to commit to a new feerate. For the strides in which you get fewer appointments than you have channels, it's most likely that the remote end updated its feerate estimate, which also changes the commitment and thus triggers a db_write and a subsequent appointment.
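One way to see this is to watch your own node's estimates change over time, e.g. by polling the feerates RPC (a sketch using pyln-client; the rpc socket path is an example, adjust for your lightning-dir):

```python
#!/usr/bin/env python3
# Poll the node's feerate estimates and print whenever they change.
# The socket path is an example; adjust for your lightning-dir.
import time
from pyln.client import LightningRpc

rpc = LightningRpc("/home/user/.lightning/bitcoin/lightning-rpc")
last = None
while True:
    est = rpc.call("feerates", {"style": "perkw"})["perkw"]
    if est != last:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), est)
        last = est
    time.sleep(60)
```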

So the fact that there are no incoming, outgoing, or forwarded payments corresponding to the appointments being created is not surprising and should not be alarming. But it is some good digging on your end, and either the watchtower project or we should take this concern seriously and document it accordingly.

billygarrison commented 3 years ago

Thank you for the clarification.

I have more than 6 channels, but not by much (8 or 10 channels total). I hadn't considered that updating the feerate would require a new state, but that makes sense. The snippet of the debug log in my OP shows that it begins with a feerate estimate.

Is it normal for the feerate to get updated that frequently? (i.e. every 30-60 minutes)

cdecker commented 3 years ago

The feerate is updated rather frequently (30-60 minute intervals is on the slow end), and we smooth the change over a couple of minutes as well (so we don't see a block, react to it in a jerky fashion, and spook our peer which hasn't seen the block yet).
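Illustratively (this is not our actual code), the smoothing is in the spirit of a time-based exponential filter; the two-minute time constant below is just a stand-in for "a couple of minutes":

```python
# Illustrative only: time-based exponential smoothing of a feerate
# estimate. The 120 s time constant is a stand-in, not lightningd's value.
import math

def smoothed_feerate(prev, estimate, dt, time_constant=120.0):
    """Blend a fresh estimate into the running smoothed value."""
    alpha = math.exp(-dt / time_constant)
    return alpha * prev + (1.0 - alpha) * estimate
```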