Roasbeef closed this issue 6 years ago.
Hmm, actually, if I try to decode just the raw error (instead of the error along with its framing), I'm able to decode it properly:
len: 4103
(*lnwire.FailTemporaryChannelFailure)(0xc42000e0b8)(TemporaryChannelFailure(update=(*lnwire.ChannelUpdate)(0xc420082370)({
Signature: (*btcec.Signature)(0xc4200107b0)({
R: (*big.Int)(0xc42000d060)(81490963346137529563603176940925655100903038764901586889101343776863355061597),
S: (*big.Int)(0xc42000d080)(47609384326767702852395815807248248468603241552974167823125523079358952585749)
}),
ChainHash: (chainhash.Hash) (len=32 cap=32) 000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943,
ShortChannelID: (lnwire.ShortChannelID) 1255882:98:0,
Timestamp: (uint32) 1515351384,
Flags: (lnwire.ChanUpdateFlag) 1,
TimeLockDelta: (uint16) 144,
HtlcMinimumMsat: (lnwire.MilliSatoshi) 10000 mSAT,
BaseFee: (uint32) 10000,
FeeRate: (uint32) 100
})
))
The odd thing here, though, is that the prepended length of the update was 4103 bytes, when the update itself is just 130 bytes.
We're missing the len field in our error message: we send 1007 | channelUpdate instead of 1007 | len | channelUpdate, so the first 2 bytes you interpret as the length are actually the first 2 bytes of the channel update's signature field.
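A minimal Go sketch of the framing described above, assuming the BOLT #4 layout for temporary_channel_failure: failure code 0x1007 (UPDATE|7), then a 2-byte big-endian len, then len bytes of channel_update. The function names are hypothetical and not lnd's lnwire API; the point is only to show how dropping len makes a decoder read the first two signature bytes as a length.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// codeTemporaryChannelFailure is UPDATE|7 from BOLT #4 (UPDATE = 0x1000).
const codeTemporaryChannelFailure uint16 = 0x1007

// encodeTempChanFailure (hypothetical helper) frames a serialized
// channel_update as failure_code | len | channel_update, big-endian.
func encodeTempChanFailure(update []byte) []byte {
	buf := make([]byte, 4, 4+len(update))
	binary.BigEndian.PutUint16(buf[0:2], codeTemporaryChannelFailure)
	binary.BigEndian.PutUint16(buf[2:4], uint16(len(update)))
	return append(buf, update...)
}

// decodeTempChanFailure (hypothetical helper) reverses the framing and
// rejects a declared length that does not fit in the remaining bytes.
func decodeTempChanFailure(msg []byte) ([]byte, error) {
	if len(msg) < 4 {
		return nil, errors.New("message too short")
	}
	if code := binary.BigEndian.Uint16(msg[0:2]); code != codeTemporaryChannelFailure {
		return nil, fmt.Errorf("unexpected failure code %#x", code)
	}
	n := int(binary.BigEndian.Uint16(msg[2:4]))
	if n > len(msg)-4 {
		return nil, fmt.Errorf("declared length %d exceeds %d remaining bytes", n, len(msg)-4)
	}
	return msg[4 : 4+n], nil
}

func main() {
	// Placeholder 128-byte channel_update body whose signature starts with
	// 0xb42a (the first signature bytes of the corrected encoding).
	update := make([]byte, 128)
	update[0], update[1] = 0xb4, 0x2a

	if body, err := decodeTempChanFailure(encodeTempChanFailure(update)); err == nil {
		fmt.Println("decoded update of", len(body), "bytes")
	}

	// Without len, 0xb42a (46122) is read as the length and decoding fails.
	bad := append([]byte{0x10, 0x07}, update...)
	if _, err := decodeTempChanFailure(bad); err != nil {
		fmt.Println("error:", err)
	}
}
```

Rejecting an oversized len, rather than slicing blindly, turns the framing mismatch into a readable error instead of a garbled update.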
The correct encoding should be:
10070080b42a4030b07a3f456093a3020b980c1f54a7b15d663d122360823091945ac55d6941f4e3c0bccc2339020993008ac4024c6af59d9a4a4074b3f12fb1fae93a1543497fd7f826957108f4a30fd9cec3aeba79972084e90ead01ea3309000000001329ca00006200005a526d580001009000000000000027100000271000000064
OK, with the correct encoding, lnd crashes when it receives the error (see the log below). The parsed temporary failure message looks good (i.e. it matches what was sent). I'm using lnd at https://github.com/lightningnetwork/lnd/commit/beeb75cb5fdd0f21d747fad64abdfd7ffbd08cb4
2018-01-10 16:40:56.082 [ERR] CRTR: Attempt to send payment abadffbc5695a4dc400a5a227b44da8231b7ee2b6f6bd9340b49fd8eb0eb3577 failed: TemporaryChannelFailure(update=(*lnwire.ChannelUpdate)(0xc420272dc0)({
Signature: (*btcec.Signature)(0xc420678470)({
R: (*big.Int)(0xc4205374a0)(49491959598261878261056428276442350416278241260864900079909248898946334365505),
S: (*big.Int)(0xc4205374c0)(13460190816797932915642861930206119487212107405213384283682323122815355044768)
}),
ChainHash: (chainhash.Hash) (len=32 cap=32) 0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206,
ShortChannelID: (lnwire.ShortChannelID) 592:1:0,
Timestamp: (uint32) 1515598737,
Flags: (lnwire.ChanUpdateFlag) 0,
TimeLockDelta: (uint16) 144,
HtlcMinimumMsat: (lnwire.MilliSatoshi) 1000 mSAT,
BaseFee: (uint32) 10000,
FeeRate: (uint32) 100
})
)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xace5cb]
goroutine 244 [running]:
github.com/lightningnetwork/lnd/routing.(*Route).nextHopChannel(0xc4205c9310, 0xc420471f20, 0x0, 0x0)
/home/fabrice/go/src/github.com/lightningnetwork/lnd/routing/pathfind.go:154 +0x8b
github.com/lightningnetwork/lnd/routing.(*ChannelRouter).SendPayment(0xc420214210, 0xc420233c00, 0x0, 0x0, 0x0, 0x0, 0xc420555796, 0xc420555790, 0x0)
/home/fabrice/go/src/github.com/lightningnetwork/lnd/routing/router.go:1651 +0xa52
main.(*rpcServer).SendPayment.func3(0xc42069a960, 0xc4205cb960, 0xc420232d40, 0xc4205cba20, 0xc42013cba0, 0x143f860, 0xc42014ca30, 0xc42069a900)
/home/fabrice/go/src/github.com/lightningnetwork/lnd/rpcserver.go:1802 +0x14b
created by main.(*rpcServer).SendPayment
/home/fabrice/go/src/github.com/lightningnetwork/lnd/rpcserver.go:1831 +0x5ad
If you run that test again with lnd on trace-level logging for the router (--debuglevel=CRTR=trace), can you paste the log dump? Thanks!
Working on a fix on our end. One question: this was triggered because we detected that the destination sent a temporary channel failure, but the code currently assumes that only intermediate nodes will ever send that error (so we then try to prune that node's outgoing channel).
In this case the destination is starblocks, so why is it sending a temporary channel failure if the HTLC reached it?
Or is it failing because it already has too many incoming/accepted HTLCs? https://github.com/ACINQ/eclair/blob/a3bdf52a2f8e5697b03b238ed1ca0581fe88381d/eclair-core/src/main/scala/fr/acinq/eclair/payment/Relayer.scala#L143
Closed by mistake...
The failure is sent by endurance, not starblocks.
Reason is that the channel is depleted (there are actually other non-depleted channels between endurance and starblocks)
I'm curious why you think that the destination sent the error?
That was just a hunch without the full logs.
Reason is that the channel is depleted (there are actually other non-depleted channels between endurance and starblocks)
If multiple channels are active and an incoming HTLC request comes along, will eclair not just go ahead and use an available channel, even if it wasn't the one specified in the onion payload?
There is a TODO for that in the relayer ;-)
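A rough Go sketch of the non-strict forwarding asked about above: try the channel named in the onion first, and if it is inactive or lacks balance, fall back to another active channel to the same peer. The channel type and selectOutgoing are hypothetical (eclair's relayer is written in Scala, and this is not its code); the sketch also ignores channel reserves, fees, and CLTV deltas, which a real relayer would have to check per channel.

```go
package main

import "fmt"

// channel is a hypothetical view of one local channel to a peer.
type channel struct {
	ShortID          uint64
	Peer             string
	LocalBalanceMsat uint64
	Active           bool
}

// usable reports whether c could carry amtMsat (reserves and fees ignored).
func usable(c channel, amtMsat uint64) bool {
	return c.Active && c.LocalBalanceMsat >= amtMsat
}

// selectOutgoing prefers the requested channel, then any other usable
// channel to the same peer. ok is false if nothing can carry the HTLC.
func selectOutgoing(chans []channel, requested, amtMsat uint64) (id uint64, ok bool) {
	peer := ""
	for _, c := range chans {
		if c.ShortID == requested {
			if usable(c, amtMsat) {
				return c.ShortID, true
			}
			peer = c.Peer
		}
	}
	if peer == "" {
		return 0, false // requested channel unknown
	}
	for _, c := range chans {
		if c.Peer == peer && c.ShortID != requested && usable(c, amtMsat) {
			return c.ShortID, true
		}
	}
	return 0, false
}

func main() {
	// Made-up numbers: channel 10 to starblocks is depleted, 11 is not.
	chans := []channel{
		{ShortID: 10, Peer: "starblocks", LocalBalanceMsat: 0, Active: true},
		{ShortID: 11, Peer: "starblocks", LocalBalanceMsat: 5000000, Active: true},
	}
	if id, ok := selectOutgoing(chans, 10, 1000000); ok {
		fmt.Println("forward over channel", id)
	}
}
```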
I'm curious why you think that the destination sent the error?
Looked into it a bit more, and the crash above happened because we detected that the destination sent the error. I say this because the only way the map lookup that led to the crash can fail is if the destination sent the error. Still investigating on my side, but adding a patch to address this edge case for now.
Yes, the crash above happens when it is the destination that sends the failure message, not when it's a relaying node. I've run the following tests locally:
eclair 1 -- eclair 2 -- lnd, eclair 1 fails the HTLC with a TemporaryChannelFailure
=> lnd crashes
eclair 1 -- eclair 2 -- lnd, eclair 2 fails the HTLC with a TemporaryChannelFailure
=> lnd logs the error and remains functional
In both cases, the failure message is logged properly.
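The edge case can be illustrated with a simplified route model: looking up the outgoing channel of the node that reported the failure only succeeds for the source or an intermediate hop, because the destination has no outgoing channel on the route. The route and hop types below are hypothetical, not lnd's routing.Route, and this is not the actual patch; it only shows a lookup that reports a miss explicitly instead of handing back a nil value to dereference.

```go
package main

import "fmt"

// hop is a hypothetical route hop: the node reached and the channel used.
type hop struct {
	NodeID    string
	ChannelID uint64
}

// route is a hypothetical route: the sender plus ordered hops.
type route struct {
	Source string
	Hops   []hop
}

// nextHopChannel returns the channel node forwards over on this route.
// The final hop (the destination) forwards nothing, so ok is false there.
func (r *route) nextHopChannel(node string) (chanID uint64, ok bool) {
	if node == r.Source && len(r.Hops) > 0 {
		return r.Hops[0].ChannelID, true
	}
	for i, h := range r.Hops {
		if h.NodeID == node && i+1 < len(r.Hops) {
			return r.Hops[i+1].ChannelID, true
		}
	}
	return 0, false
}

func main() {
	// lnd -- eclair 2 -- eclair 1, as in the tests above (IDs made up).
	r := &route{Source: "lnd", Hops: []hop{
		{NodeID: "eclair2", ChannelID: 1},
		{NodeID: "eclair1", ChannelID: 2},
	}}
	for _, src := range []string{"eclair2", "eclair1"} {
		if ch, ok := r.nextHopChannel(src); ok {
			fmt.Printf("%s failed: prune channel %d\n", src, ch)
		} else {
			fmt.Printf("%s failed: it is the destination, nothing to prune\n", src)
		}
	}
}
```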
Lately, when I've been trying to send to/through starblocks from some of my nodes, my HTLCs keep getting rejected. I haven't been able to determine exactly why, as I can't decode the error message sent by eclair: lnd determines it's a FailTemporaryChannelFailure, then tries to decode the message. Are y'all able to decode this?