Blockstream / greenlight

Build apps using self-custodial lightning nodes in the cloud
https://blockstream.github.io/greenlight/getting-started/
MIT License

Slow node startup causing RPC call failures #492

Open Sjors opened 1 month ago

Sjors commented 1 month ago

I tried to access a node that I created a year ago.

First I manually installed from master @ b9d1ecb9ea7325b041d08ddc171486fdad646a63 (see also #491).

From the directory with my earlier configuration I call:

glcli getinfo

After about 10 seconds, this returns:

[2024-08-06 15:53:05,669 - INFO] Configuring client with device credentials (legacy)
Traceback (most recent call last):
  File "/Users/sjors/.pyenv/versions/3.12.1/bin/glcli", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/glcli/cli.py", line 372, in getinfo
    res = node.get_info()
          ^^^^^^^^^^^^^^^
  File "/Users/sjors/.pyenv/versions/3.12.1/lib/python3.12/site-packages/glclient/__init__.py", line 147, in get_info
    bytes(self.inner.call(uri, bytes(req)))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Error calling /cln.Node/Getinfo: status: Internal, message: "No such file or directory (os error 2)", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue, 06 Aug 2024 13:53:12 GMT", "content-length": "0"} }
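For triage, it can help to pull the gRPC status and message out of the exception text. The pieces that matter are the call URI, the status (`Internal`), and the message. A minimal sketch, assuming the error string always follows the shape of the single line above; `GrpcError` and `parse_error` are hypothetical names for illustration and are not part of glclient:

```python
import re
from typing import NamedTuple, Optional


class GrpcError(NamedTuple):
    """Fields pulled from the error text (hypothetical helper type)."""
    uri: str
    status: str
    message: str


# Assumption: the format matches the one traceback line in this report.
_PATTERN = re.compile(
    r'Error calling (?P<uri>\S+): '
    r'status: (?P<status>\w+), '
    r'message: "(?P<message>(?:[^"\\]|\\.)*)"'
)


def parse_error(text: str) -> Optional[GrpcError]:
    """Return the URI, status and message, or None if the text does not match."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return GrpcError(match["uri"], match["status"], match["message"])
```

Passing `str(exc)` from the caught `ValueError` to `parse_error` gives the three fields separately, which makes it easier to tell a status reported in the gRPC response apart from an error raised locally by the client.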

Node id 036387fa666993defcf79719d5cdb3e1243d694e4a71922878c3dc83c4e65872ab.

cdecker commented 1 month ago

This is symptomatic of a slow startup, potentially on a host that was dedicated to the background sync. Looking at the node now, it does look normal.

Background: we occasionally start the node in the background so it can sync up with the blockchain. That way the node does not have to sync hundreds of blocks first when the user starts it, presumably because they want to either send or receive. We have a separate pool of resources for these background syncs, and users may end up on those heavily loaded machines if they get unlucky. We should really preempt the node and reschedule it on the interactive pool. This may add a bit of startup time, but it would result in a much better user experience.
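If the failure is only a transient slow startup, a client could retry the call with growing pauses instead of failing on the first attempt. A minimal sketch, assuming the failure surfaces as a `ValueError` as in the traceback above; `call_with_backoff` and its parameters are hypothetical names for illustration, not part of glclient or glcli:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on ValueError with exponentially growing pauses."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ValueError:
            # Assumption: RPC failures arrive as ValueError, as in the traceback.
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            sleep(delay)
            delay *= factor
    raise AssertionError("unreachable")
```

With the `node` object from the traceback, usage would look like `call_with_backoff(node.get_info)`. This only papers over a slow start; it does not help if the node stays on an overloaded host and the error is persistent.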

Sjors commented 1 month ago

Just tried again and got the same error. Waiting 30 seconds and trying again didn't help either.

cdecker commented 1 month ago

Did you call stop? If not, the node will continue running for up to 15 minutes, and any issues due to its location will only be resolved after it has cycled once.

Sjors commented 1 month ago

I had not tried that.

% glcli stop
[2024-08-09 15:49:45,037 - INFO] Configuring client with device credentials (legacy)
Node shut down

But I'm still getting the same error when calling getinfo after that.

(I tried another stop and getinfo, but no luck.)