Closed jarret closed 4 years ago
Looking over my script: the last two hops have the same pubkey along with the same legacy payload, which is wrong. It is caused by a bug in my script when creating the final hop. This almost certainly makes the onion invalid in that specific way, which could be a clue to the crash.
{'payload': '000946fa000605000000000000000003e80000000a000000000000000000000000',
'pubkey': '035c77dc0a10fe60e1304ae5b57d8fef87751add5d016b896d854fb706be6fc96c',
'style': 'legacy'},
{'payload': '000946fa000605000000000000000003e80000000a000000000000000000000000',
'pubkey': '035c77dc0a10fe60e1304ae5b57d8fef87751add5d016b896d854fb706be6fc96c',
'style': 'legacy'}]
EDIT: the pubkey is in fact wrong on all hops for the onion due to the same bug. :S
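
To make the hop entries above easier to inspect, here is a minimal sketch that unpacks a legacy payload and flags consecutive hops that share a pubkey. It assumes the 33-byte layout from BOLT 4 legacy `hop_data` with a leading realm byte (realm, `short_channel_id`, `amt_to_forward`, `outgoing_cltv_value`, 12 bytes of padding). The helper names `decode_legacy_payload` and `repeated_hops` are hypothetical and not part of any lightningd API.

```python
import struct

# Assumed layout (BOLT 4 legacy hop_data, prefixed with the realm byte):
# realm (1) | short_channel_id (8) | amt_to_forward (8) |
# outgoing_cltv_value (4) | padding (12), all big-endian.
LEGACY_FORMAT = "!BQQL12s"


def decode_legacy_payload(payload_hex):
    """Unpack a hex-encoded legacy payload into its fields."""
    raw = bytes.fromhex(payload_hex)
    expected = struct.calcsize(LEGACY_FORMAT)
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    realm, scid, amt_msat, cltv, padding = struct.unpack(LEGACY_FORMAT, raw)
    return {
        "realm": realm,
        # short_channel_id packs block height, tx index and output index
        "short_channel_id": "%dx%dx%d" % (scid >> 40, (scid >> 16) & 0xFFFFFF, scid & 0xFFFF),
        "amt_to_forward": amt_msat,
        "outgoing_cltv_value": cltv,
        "padding_is_zero": padding == bytes(12),
    }


def repeated_hops(hops):
    """Return the indices of hops whose pubkey equals the previous hop's pubkey."""
    return [i for i in range(1, len(hops)) if hops[i]["pubkey"] == hops[i - 1]["pubkey"]]
```

Running `decode_legacy_payload` on the `payload` strings quoted above shows what each hop is actually being told. In particular, it shows whether `outgoing_cltv_value` holds an absolute block height, which is what BOLT 4 describes, or only a small relative delay.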
I was able to get the node booted again by unilaterally closing the channel of the first hop from the other side - a node which I control.
I did a sqlite3 dump and found the onion as the last entry in the `channel_htlcs` table. I am not confident that it would be safe to just drop that entry to recover that way, so just closing the channel seemed safer.
```
INSERT INTO channel_htlcs VALUES(12212,3,3849,1,NULL,3158,609547,X'84c622e9c59c3c935a2dfa89ed512213e5861a0327caf9d405918b1c88f4fe3a',NULL,X'00030385d9de0d0c6ac609758cd5f6ad9167cc70279c026801ddeed01588f64aff6009a8fac98216974bf64c50cf6486308e643a9934aa5a9b285cbd5f82af318a690733b5fc88f1b7759fc39728ea72822b6b118b22fd71108c24fcdb6a2f6bf9f6ff3fb6d0cca6b728aab9cbb732f096c65af6045d2f259e7a35650012fcf95e1a130703fa948cdd98f9227bd8a77b355eae59c30826fb81bf0e6a80666f4b348c3ad74ca2669eb7f31f6edbaf32feb929e13295fc27c1b7db150442805bfb800e23a4ec2e8707a36e0044d369b0c9959cb33c9c582db53cfb3a347fea99df1d96f4f01cdd9680a9f51adc8b935583ee8a8aecf078d6b54ab43e93ba4e79fb73b94984aa4343b4ffca2a1c66dcf45d2f122a271777fbaaadb57127cd623341dec36edd8b7ef5a26c81bd3c19315c118e8b515c7a2d78f47514a864e26cd3223f5c210d56f95a8079d26a80b2c4eaa821bcceb5506d526c1a008aaacc1121bafcb43e8169d1861bc9206470792bc1e0f8e570cdedaef7ad222befc1c59fe5fa51ff88547810fc2bc6b54a822d8ee3838191e585a14a4b02b75891603353ab548034aecf703c1619a1b5b024ba51dec43828038895ada82344b65c60c4251dfd39ebfe7951b775c8cf712cacde06b369e952de2eef92e18a2e8c2900abfeec32c70a907942152354bcef3ba49c98e4253f2a4ed2a446e942035f238194b86175ef7b6e231a60d5c0fd53a11c37cfb0b7d0947c46e25ffe83b67a16f1b08f64ebe3de1de86440aa3e7dfea43dea023cf538f1bbc32c0fb3d8a764cea30ec98b8b67e9e4106529511f9e6e70a3fcf5c84b6ebdfd50075a449f4205def633d3e573dcb31ee4da38e4d8f3a09c2f329b7ef621b800afa561b47c82f51feb6ae3ac5d2007ec109a5caa06dddcdc8e99e1c8307ca127015ccfb02048221459bfbd14f8780f9d6b618bbd0ea4dae82e793dbdf4f1b17e3a5e198d4dfc78f900beb5679d0b849e87da86b69a62fb9afd7b6b109d6f24e1c2784af6562ec3a9c9dce6664e9310c66413d8ce0e81c00cf41e60d35308eff5e97bd21afa7069a69c6a9bc989cc8a488513f662982d337d3e4233b2f001530a4025b98d4504b015fdd907d8f84e745c61a17096a253fc41f623a180e8917dffc072fb25fb27af3bce1fabbfcb3dc23aae5bd64bf746a12e8aa2772f37d56394ef04ec4649f3a50f78d42531df037ca5565b95b1a251096edac31411ff6f88e052682908b55419685bb63cd5e1b54261da7a4850e72f734ee6a99147d6c31fefcbf4ed0e98340008bb137c4749d2d0
1f979017a4806609d9de32572f4b375d4ec320da554c7c5859877e11cc343f33be9efe36dd7b5eacb499bb57ced276bc8af2eef624aa208b864ce5310dcad7279b3d8451d259a5385eb568c85bff1ee0ace7f6c2d4fd6039465d1a9fd37aea6644d13af0af46b5e39f78c7248ccdd5124e75451bf07bb6631b15383d71c370244a5818f0554867d03b6e0fbae20f9c765c771987fa7f157483f4abb5161a2c00e8d7508d9c38ebc039a48d39557cd11f771c6f90ab8b9a978a0e50b06419dd8feff0a3ebd626808165e6d2643aa034b5200b6c5376ff44c95c9ee59b03c1e36c2e543ed1521cd94dcb6547045df564f5057fc534f402b841fdda49d3ab25e19854644454c4acd94ae19e3fabff185e113931e0523e99b7d97bec08c07aae27a154c9972d52f4daad110cbbf47531b026b04586846ad6680b2ab882424c4345254e7af63cbe238c1304802173cd424cad766511933f8c87aafe676fd304ecbf3f3d366ecf1ef110d87b84525d90e8513b8e3037bbe003cd4c9408379e4b17b707273b07fcc0d6ed752a7d7670b9efc81aaaf36c5fc6569a13ba21cc1c01abbf095d2942721478bab5f8587379d3fb6a1799e579f7',NULL,49157,8,NULL,NULL,0);```
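
For anyone wanting to look at the offending row without touching it, the following is a rough sketch of a read-only lookup using Python's standard `sqlite3` module. Only the table name `channel_htlcs` comes from the dump above. The function name `last_htlc_row` is hypothetical, and the path is assumed to point at a copy of the node's database taken while lightningd is stopped.

```python
import sqlite3
from pathlib import Path


def last_htlc_row(db_path):
    """Return the most recently inserted channel_htlcs row as a dict, or None."""
    # mode=ro opens the file read-only, so this cannot modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.row_factory = sqlite3.Row
        row = conn.execute(
            "SELECT * FROM channel_htlcs ORDER BY rowid DESC LIMIT 1"
        ).fetchone()
        return dict(row) if row is not None else None
    finally:
        conn.close()
```

This only reads the row. Whether deleting it is safe is exactly the open question in this thread, so the sketch deliberately does not attempt it.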
This looks like a real bug to me, but the crashing you are experiencing seems to be due to some kind of db corruption. The root cause is what caused the db corruption in the first place, and is what should be fixed I think. Do you happen to have any logs for the original crash?
Yes, the crash.log is attached in the middle of the first post, but here it is again: crash.log.20191223201609.gz (from Dec 15th up until the crash).
pretty sure this is fixed by https://github.com/ElementsProject/lightning/pull/3434 where I ran into a similar issue.
That's now merged into master; can you please test?
Thanks!
Going to close this for now; will reopen if it's still an issue.
Initial crash was with `v0.8.0rc2` with a small patch that adds an extra field to a plugin notification. (I don't believe this patch is related to the issue, unless something really gnarly is going on.) Reverting to unpatched `v0.8.0`, the same crash happens on boot.
I am writing scripts to use `createonion` and `sendonion`. Upon sending an onion, it crashed like this:
The crash log of the first crash: crash.log.20191223201609.gz
The crash log from rebooting: crash.log.20191223201748.gz
My script was creating an onion to attempt to send along a circular route:
1) I created an invoice:
2) I queried an outgoing route:
3) I queried a returning route:
4) I assembled the two routes into a circular route (using logic from the `sendinvoiceless` plugin):
5) I constructed the `hops` parameter, encoding the legacy payloads myself:
NOTE - I am not sure I did this entirely correctly; figuring that out is the purpose of this experiment. Particularly, I am not entirely certain how to set the `outgoing_cltv_value` in the legacy payload from the above route using the `delay` value. The documentation and example for `createonion` could possibly be clearer.
6) I passed the hops to `createonion` using the `payment_hash` dug out of the BOLT11 (via a `decodepay` call) as the `assocdata` parameter to get the onion and list of shared secrets:
7) I called `sendonion` with the onion, the first hop from the circular route, the `payment_hash` value, a randomly generated uuid4 label, and the `shared_secrets` array. It returned to my script:
However, at that point the node crashed.
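
Step 5 is where the hop bug crept in, so here is a hedged sketch of one way to build the `hops` list from a route. Several things are assumptions rather than anything confirmed in this thread: that each route entry carries `id`, `channel`, `msatoshi` and `delay` keys; that the payload for a node describes the channel, amount and CLTV of the next entry in the route (a one-hop shift); that `outgoing_cltv_value` is the current block height plus `delay`; and that the final hop uses an all-zero `short_channel_id`, which is my reading of BOLT 4. The hop keys `pubkey`, `payload` and `style` are copied from the hop entries quoted in this thread, and the helper names are hypothetical.

```python
import struct


def scid_to_int(scid):
    """Convert 'BLOCKxTXxOUT' into the packed 8-byte short_channel_id integer."""
    block, tx, out = (int(part) for part in scid.split("x"))
    return (block << 40) | (tx << 16) | out


def legacy_payload(scid_int, amount_msat, cltv):
    """Realm byte 0, then scid, amount and CLTV (big-endian), then 12 zero bytes."""
    return struct.pack("!BQQL", 0, scid_int, amount_msat, cltv).hex() + "00" * 12


def build_hops(route, blockheight):
    """Build one hop per route entry, each with its own pubkey."""
    hops = []
    for i, node in enumerate(route):
        is_last = i == len(route) - 1
        # Assumption: a forwarding node is told about the NEXT entry's channel,
        # amount and CLTV; the final node gets its own values and a zero scid.
        source = node if is_last else route[i + 1]
        scid = 0 if is_last else scid_to_int(source["channel"])
        hops.append({
            "pubkey": node["id"],
            "style": "legacy",
            "payload": legacy_payload(
                scid, source["msatoshi"], blockheight + source["delay"]
            ),
        })
    return hops
```

Because every hop takes its `pubkey` from its own route entry, this construction cannot emit the same pubkey twice in a row unless the route itself repeats a node back to back.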
The line prior to the failed assertion makes me believe there is something wrong with the HMAC on the onion:
At that point, I tried restarting and was met with the same crash
At that point, I reverted to an unmodified v0.8.0 branch and was met with the same crash.
(aside: Is there a clean way to get rid of this bad HTLC so my node boots?)
It's entirely possible I am misunderstanding or misusing something in this procedure, as this is literally the first time I have ever called `sendonion`, but I guess it shouldn't ever crash like this.