ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.84k stars 901 forks source link

lightning-cli commands cause FATAL SIGNAL 6 on not fully confirmed channel #4451

Closed jsarenik closed 3 years ago

jsarenik commented 3 years ago

Issue and Steps to Reproduce

All is on testnet and this node

EDIT: It is related to channel waiting for block confirmations.

When I try to make an invoice using: lightning-cli --testnet invoice any "somethingrandom" "somethingelse"

I get a following backtrace on the lightningd side:

2021-03-24T16:37:10.869Z DEBUG   lightningd: Already connected via ey4gamgkhifhksqzhzbvqz3l2vbsdjnwlvqrkc5y35zl5kw6ztaxpoyd.onion:9735
2021-03-24T16:37:15.353Z DEBUG   gossipd: REPLY WIRE_GOSSIPD_GET_INCOMING_CHANNELS_REPLY with 0 fds
Assertion failed: a->c->rr_number < b->c->rr_number (lightningd/invoice.c: cmp_rr_number: 588)
lightningd: FATAL SIGNAL 6 (version v0.10.0rc1-1-g376fc36)
0x5579e1fb8ab6 send_backtrace
        common/daemon.c:39
0x5579e1fb8b5b crashdump
        common/daemon.c:52
0x7f14b584fc4a ???
        src/signal/x86_64/restore.s:1
2021-03-24T16:37:15.689Z **BROKEN** lightningd: FATAL SIGNAL 6 (version v0.10.0rc1-1-g376fc36)
2021-03-24T16:37:15.689Z **BROKEN** lightningd: backtrace: common/daemon.c:44 (send_backtrace) 0x5579e1fb8b0c
2021-03-24T16:37:15.689Z **BROKEN** lightningd: backtrace: common/daemon.c:52 (crashdump) 0x5579e1fb8b5b
2021-03-24T16:37:15.689Z **BROKEN** lightningd: backtrace: src/signal/x86_64/restore.s:1 ((null)) 0x7f14b584fc4a
Log dumped in crash.log.20210324163715
spenderp: Lost connection to the RPC socket.
pay: Lost connection to the RPC socket.
Aborted

getinfo output

{
   "id": "02ec72ede73cb2bd633267f1fb6654408b327194e9aacc1443b95eccf6d728136a",
   "alias": "ANGRYCHASER-0.10.0rc1-1-g376fc36",
   "color": "02ec72",
   "num_peers": 11,
   "num_pending_channels": 0,
   "num_active_channels": 11,
   "num_inactive_channels": 0,
   "address": [
      {
         "type": "torv3",
         "address": "5izkwusnbybcf4eqcoiv25one4cbld6k7l6mkplprsj7hjfus7mbfcyd.onion",
         "port": 31858
      }
   ],
   "binding": [
      {
         "type": "ipv4",
         "address": "127.0.0.1",
         "port": 31858
      }
   ],
   "version": "v0.10.0rc1-1-g376fc36",
   "blockheight": 1965098,
   "network": "testnet",
   "msatoshi_fees_collected": 0,
   "fees_collected_msat": "0msat",
   "lightning-dir": "/home/uname/mmmlightning/testnet"
}

Note that the node runs on Alpine Linux (i.e. musl-libc).

jsarenik commented 3 years ago

Now it actually went through.

jsarenik commented 3 years ago

Will reopen if needed.

jsarenik commented 3 years ago

This is real. Reopening. Has something to do with a fresh channel which does not have 6 confirmations yet.

Now I generated invoice on a node which has stable and 1-day-old channels, but want to pay it by another node which just opened a channel:

$ lightning-cli --testnet pay lntb3130n1ps9ccufpp59ptypf0pvg6hehwp3lgduccdz49gmk82t4v3pujw6vwxw2x5g4qqdqldajh27tfypnk2apqwdhk6efqwdshgucxqyjw5qcqp2sp535t6ag3a6en4lwuxjhalvvvlg7gvd9ec7rvpyyk55cxfkkd8nxssrzjqf69tthcg57e9arsddtqkc2j0npp0h03fkjpwu8ga4nqwxg2rpgms80c2gqqqqgqqqqqqqlgqqqqn3qq9q9qy9qsq0qyy208an2aleahd2wyaajv5u9kqhvv3wqmtvguzf90xvs5x4cgpy230d3f5nxtay26y3sa0zrthj05kn354gukar8hxvp3av629f5gqt6hr2h
...
2021-03-25T10:15:55.167Z DEBUG   plugin-bcli: Log pruned 3111 entries (mem 10501239 -> 3522900)
2021-03-25T10:15:56.526Z DEBUG   034fe52e98a0e9d3c21b767e1b371881265d8c7578c21f5afd6d6438da10348b36-gossipd: Received channel_update for channel 1289738x1052x0/1 now ACTIVE
2021-03-25T10:15:57.364Z INFO    plugin-pay: cmd 2791 partid 0: Payment fee constraint 1565msat is below exemption threshold, allowing a maximum fee of 5000msat
Assertion failed: node - map->node_arr < tal_count(map->node_arr) (common/gossmap.c: gossmap_node_idx: 177)
pay: FATAL SIGNAL 6 (version v0.10.0rc1-6-g6dae011)
0x563e1e2d2010 send_backtrace
        common/daemon.c:39
0x563e1e2d20b5 crashdump
        common/daemon.c:52
0x7fa4460f6c4a ???
        src/signal/x86_64/restore.s:1
pay: FATAL SIGNAL 11 (version v0.10.0rc1-6-g6dae011)
0x563e1e2d2010 send_backtrace
        common/daemon.c:39
0x563e1e2d20b5 crashdump
        common/daemon.c:52
0x7fa4460f6c4a ???
        src/signal/x86_64/restore.s:1
2021-03-25T10:15:57.494Z INFO    plugin-pay: Killing plugin: exited during normal operation
2021-03-25T10:15:57.494Z **BROKEN** plugin-pay: Plugin marked as important, shutting down lightningd!
2021-03-25T10:15:57.494Z DEBUG   connectd: Shutting down
2021-03-25T10:15:57.495Z DEBUG   034fe52e98a0e9d3c21b767e1b371881265d8c7578c21f5afd6d6438da10348b36-channeld-chan#1: Status closed, but not exited. Killing
2021-03-25T10:15:57.496Z DEBUG   gossipd: Shutting down
2021-03-25T10:16:07.496Z DEBUG   gossipd: Status closed, but not exited. Killing
2021-03-25T10:16:07.507Z DEBUG   hsmd: Shutting down
2021-03-25T10:16:07.507Z DEBUG   plugin-autoclean: Killing plugin: exited during normal operation
2021-03-25T10:16:07.508Z DEBUG   plugin-bcli: Killing plugin: exited during normal operation
2021-03-25T10:16:07.509Z DEBUG   plugin-keysend: Killing plugin: exited during normal operation
2021-03-25T10:16:07.509Z DEBUG   plugin-txprepare: Killing plugin: exited during normal operation
2021-03-25T10:16:07.509Z DEBUG   plugin-spenderp: Killing plugin: exited during normal operation

@rustyrussell @cdecker any idea?

This is v0.10.0rc1-6-g6dae011

jsarenik commented 3 years ago

Exactly as I thought. Once the channel is in CHANNEL_NORMAL state, the lightning-cli does not case the daemon to die.

jsarenik commented 3 years ago

Previously I got following after running lightning-cli --testnet pay:

lightning-cli: reading response: socket closed                                  
jsarenik commented 3 years ago

@niftynei please have a look as well. I hope it is not related to musl-libc and will maybe ring a bell with what is currently being done.

jsarenik commented 3 years ago

The tests are probably not testing half-way channels (so that only a few blocks would be generated on regtest after funding a channel and some operation like invoice and pay would be tried in such a half-baked state).

rustyrussell commented 3 years ago

These crashes are completely different... one is in invoice, the other in pay.

Any change you can run lightningd under valgrind and reproduce? (You can run the pay plugin under valgrind by mv pay pay.real and putting the follwing in pay:

#! /bin/sh
exec valgrind -q "$0".real "$@"

I've had no luck here testing with unannounced channels, but perhaps I've not hit the right combination?

The other possibility is a miscompile; maybe try make clean && make?

rustyrussell commented 3 years ago

Here's the test I've added to tests/test_pay.py (run with pytest tests/test_pay.py::test_pay_unconfirmed ):

def test_pay_unconfirmed(node_factory, bitcoind):
    l1, l2, l3 = node_factory.line_graph(3, wait_for_announce=True,
                                         fundamount=10**6)

    # unconfirmed channel l1->l3
    l1.connect(l3)
    addr = l1.rpc.newaddr()['bech32']
    amount = 10**6 * 2
    bitcoind.rpc.sendtoaddress(addr, (amount + 1000000) / 10**8)
    bitcoind.generate_block(1)

    # Wait for it to hit wallet
    wait_for(lambda: len(l1.rpc.listfunds()['outputs']) > 1)
    res = l1.rpc.fundchannel(l3.info['id'], amount)

    # Various combinations of invoice issuance
    l1.rpc.invoice(str(10**5) + "sat", 'test1', 'test1')
    l1.rpc.invoice(str(10**5) + "sat", 'test2', 'test2',
                   exposeprivatechannels=True)

    # Try paying (will fail, needs that large channel!)
    inv = l3.rpc.invoice(str(amount // 2) + "sat", 'test', 'test')['bolt11']
    with pytest.raises(RpcError):
        l1.rpc.pay(inv)

    # Now mine funding, though we it's still private. 
    bitcoind.generate_block(1, wait_for_mempool=res['txid'])
    sync_blockheight(bitcoind, [l1])

    # Various combinations of invoice issuance
    l1.rpc.invoice(str(10**5) + "sat", 'test3', 'test3')
    l1.rpc.invoice(str(10**5) + "sat", 'test4', 'test4',
                   exposeprivatechannels=True)

    # This should mpp across both
    inv = l3.rpc.invoice(str(amount // 2) + "msat", 'test2', 'test2')['bolt11']
    l1.rpc.pay(inv)
jsarenik commented 3 years ago

These crashes are completely different... one is in invoice, the other in pay.

Any change you can run lightningd under valgrind and reproduce? (You can run the pay plugin under valgrind by mv pay pay.real and putting the follwing in pay:

As that happens on Alpine Linux, there is no Valgrind option. In case it is non-reproducible on Ubuntu, it may be one of those musl-libc-only things.

jsarenik commented 3 years ago

Here's the test I've added to tests/test_pay.py (run with pytest tests/test_pay.py::test_pay_unconfirmed):

Thank you for the test, I have added it to that file, but so far it just hangs when run.

~/src/lightning$ pytest tests/test_pay.py::test_pay_unconfirmed
============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/nsm/src/lightning
plugins: flaky-3.7.0, test-groups-1.0.3, dynamicrerun-1.1.1, sentry-0.1.3, rerunfailures-9.1.1, forked-1.0.2, cov-2.6.1, testinfra-1.19.0, timeout-1.4.2, benchmark-3.2.2, xdist-2.2.1
collected 1 item                                                               

tests/test_pay.py 

This test is running on Ubuntu now.

rustyrussell commented 3 years ago

pkg add valgrind should work for Alpine?

jsarenik commented 3 years ago

Oh yes, excuse me, it seems I got stuck years ago when valgrind did not yet work on Alpine and have not re-tried until now. Will let you know when I have some results. (Hopefully more freshmen who are not constrained with such obsolete memories will join the team :)

rustyrussell commented 3 years ago

I ran the test on Alpine (after much wrestling!) but couldn't see anything. I suspect a build problem, or hardware? Have there been other errors on this box?

jsarenik commented 3 years ago

Thank you for checking @rustyrussell ! No errors on this box as far as I know (it is in use since 2012 I bought it in California, was my main laptop, now serves as a server). It could have been a miscompile as you suggested because on this box I also played with esplora plugin previously. Now the lightningd points either to a local running bitcoind (for testnet) or RPC endpoint of bitcoind full node on another machine in the local network.

Closing for now, unable to reproduce. Will add here if the problem appears again.