Open roeierez opened 4 months ago
Yes, we used to treat these as a fallible pre-flight test, but since the error being returned was the usual "no route found" error, which did not really tell us much, we made it infallible, i.e., it is now required to pass.
The usual cause for these is a slow signer, a CLN <> VLS desync issue, or a slow peer.
Failing here is correct, by the way, as we do not stand a chance of completing the payment without enough capacity back online. It just happens to tell us more than the catch-all "no route found" error, so this is a symptom of us getting closer to the actual root cause :-)
Tracing through the logs we can reconstruct the following timeline:
0
connects
glclient.Node/Pay
start of pre-flight checks
Checking if channel 841259x3019x0 is ready
So from what we can see here, we either fail to detect a channel becoming active, or we are not re-activating the channel correctly.
I'll dig further.
If desired, we can make the pre-flight fallible again for this node and see if that helps. If it does, that'd mean that the pre-flight checks are not detecting activation correctly.
Longer term, we want to make the channel status (active / inactive) an explicit part of `listpeerchannels` so we can skip some of the guesswork.
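For illustration only, here is a rough Python sketch of the kind of pre-flight wait discussed above, with a switch between "required to pass" and "allowed to fail". It is not the actual Greenlight implementation. The names `wait_for_channel`, `is_ready`, `fetch_channels` and `PreflightError` are hypothetical, and so is the readiness heuristic: it assumes each channel entry carries `short_channel_id`, `state` and `peer_connected` fields, and it guesses that `CHANNELD_NORMAL` plus a connected peer means "active".

```python
import time
from typing import Callable, Dict, List

# Assumed shape of one channel entry (hypothetical, for this sketch only).
Channel = Dict[str, object]


class PreflightError(Exception):
    """Raised when a required pre-flight check does not pass in time."""


def is_ready(channel: Channel) -> bool:
    # Guesswork heuristic: channel is in its normal state and the peer is connected.
    return (
        channel.get("state") == "CHANNELD_NORMAL"
        and channel.get("peer_connected") is True
    )


def wait_for_channel(
    fetch_channels: Callable[[], List[Channel]],
    scid: str,
    timeout: float = 10.0,
    interval: float = 0.5,
    required: bool = True,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the channel `scid` looks ready or `timeout` seconds elapse.

    With required=True a timeout raises PreflightError (the check must pass).
    With required=False a timeout only returns False and the caller may
    still attempt the payment.
    """
    deadline = clock() + timeout
    while True:
        for channel in fetch_channels():
            if channel.get("short_channel_id") == scid and is_ready(channel):
                return True
        if clock() >= deadline:
            break
        sleep(interval)
    if required:
        raise PreflightError(f"channel {scid} not ready after {timeout}s")
    return False
```

Calling it with `required=False` for a single node would correspond to letting the check fail without aborting the payment, which is one way to test whether the readiness detection itself is the problem.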
@cdecker I am in favor of anything that will help us improve here. We can let Relai know that we want to do that for this node if you want to go for it.
A Relai user tries to send a payment that ends up pending. The logs show a timeout while waiting for the channel to be re-established. Full logs are attached.
1721292295613.app.log