fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License

RPC is very slow when we are connected to an LND node without channels #4875

Open douglaz opened 5 months ago

douglaz commented 5 months ago

If the logs show something like:

INFO info: ln_gateway: LN node returned no route hints, trying again in 2s num_retries=23

This happens because LND is not fully up yet (it has zero channels).

In that state, any RPC call, like gateway-cli info, will take at least 30s to complete.

m1sterc001guy commented 5 months ago

What do you think the recommended behavior should be? If there are no channels, the gateway isn't really usable for outgoing or incoming payments. Maybe we should wait until a channel is set up before setting the gateway's state to Running.
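A minimal sketch of that idea in Rust, assuming the gateway can poll LND for its channel count: the state only becomes Running once at least one channel exists. GatewayState, its WaitingForChannels variant, and next_state are hypothetical names for illustration, not fedimint's actual types.

```rust
/// Hypothetical gateway states; not fedimint's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GatewayState {
    /// Connected to LND, but LND has no channels yet.
    WaitingForChannels,
    /// LND has at least one channel, so payments can be routed.
    Running,
}

/// Derive the gateway state from the channel count reported by LND.
fn next_state(num_channels: usize) -> GatewayState {
    if num_channels == 0 {
        GatewayState::WaitingForChannels
    } else {
        GatewayState::Running
    }
}

fn main() {
    // Simulated channel counts from successive polls of LND.
    for channels in [0, 0, 1] {
        println!("channels={channels} -> {:?}", next_state(channels));
    }
}
```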

douglaz commented 5 months ago

The response time should not change. If the operation is invalid in that situation, it should answer immediately with a proper error.
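One way to read this, sketched in Rust under the assumption that the handler can check readiness cheaply: check once and return an error right away instead of blocking until LND is ready. GatewayError::LightningNotReady and handle_operation are hypothetical names, not part of the gateway's real API.

```rust
use std::fmt;

/// Hypothetical error type for operations that need a ready LND node.
#[derive(Debug, PartialEq, Eq)]
enum GatewayError {
    /// LND is reachable but has no channels, so route hints are unavailable.
    LightningNotReady,
}

impl fmt::Display for GatewayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GatewayError::LightningNotReady => write!(f, "lightning node has no channels yet"),
        }
    }
}

impl std::error::Error for GatewayError {}

/// Check readiness once and return immediately; never wait for LND here.
fn handle_operation(num_channels: usize) -> Result<&'static str, GatewayError> {
    if num_channels == 0 {
        return Err(GatewayError::LightningNotReady);
    }
    Ok("ok")
}

fn main() {
    match handle_operation(0) {
        Ok(msg) => println!("{msg}"),
        Err(e) => println!("error: {e}"),
    }
}
```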

elsirion commented 5 months ago

I don't know whether it should crash just because LND isn't ready yet; that seems to be a philosophical question (and @dpc tweeted about that recently, iirc). But what we can certainly do is not block when we don't want to give extra route hints anyway. In that case we don't need to fetch them at all.
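A sketch of the "don't fetch if we won't use them" idea, assuming the gateway has a configured count of route hints to advertise (num_route_hints here is a hypothetical parameter name). The fetch is passed in as a closure standing in for the slow LND query, so it only runs when hints are actually wanted.

```rust
use std::cell::Cell;

/// Returns at most `num_route_hints` hints. `fetch` stands in for the
/// potentially slow call to LND and is only invoked when hints are wanted.
fn route_hints_for_registration<F>(num_route_hints: usize, fetch: F) -> Vec<String>
where
    F: FnOnce() -> Vec<String>,
{
    if num_route_hints == 0 {
        // No extra hints will be advertised, so there is nothing to wait for.
        return Vec::new();
    }
    let mut hints = fetch();
    hints.truncate(num_route_hints);
    hints
}

fn main() {
    // Count how often the simulated LND query actually runs.
    let calls = Cell::new(0u32);
    let fetch = || {
        calls.set(calls.get() + 1);
        vec!["hint-a".to_string(), "hint-b".to_string()]
    };
    // With num_route_hints == 0 this returns without calling `fetch`.
    let hints = route_hints_for_registration(0, fetch);
    println!("hints={hints:?} lnd_calls={}", calls.get());
}
```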

dpc commented 5 months ago

> Then any RPC call, like gateway-cli info will take at least 30s to complete.

Are you sure it's "any call", not just info? fetch_lightning_route_hints, which seems to be what's blocking, is called only in handle_get_info and handle_connect_federation, which makes this somewhat less severe.

I don't understand the requirements here, but wouldn't some combination of returning the old value, returning just empty hints, or returning an error help?
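A sketch of combining those options, assuming a fixed time budget per call: try briefly for fresh hints, fall back to the last known value, and only report them as unavailable if nothing is cached. Hints, hints_with_fallback, and the closure standing in for the LND query are all hypothetical. The spawned fetch keeps running after a timeout; this sketch simply discards its late result.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Where the returned hints came from.
#[derive(Debug, PartialEq, Eq)]
enum Hints {
    Fresh(Vec<String>),
    Cached(Vec<String>),
    Unavailable,
}

/// Wait at most `budget` for fresh hints; otherwise fall back to `cached`,
/// and report `Unavailable` only if there is no cached value either.
/// `fetch` stands in for the slow LND query and runs on its own thread.
fn hints_with_fallback<F>(budget: Duration, cached: Option<Vec<String>>, fetch: F) -> Hints
where
    F: FnOnce() -> Vec<String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the send fails; that is fine.
        let _ = tx.send(fetch());
    });
    match rx.recv_timeout(budget) {
        Ok(hints) => Hints::Fresh(hints),
        // Timed out (or the fetch thread panicked): use what we already know.
        Err(_) => match cached {
            Some(hints) => Hints::Cached(hints),
            None => Hints::Unavailable,
        },
    }
}

fn main() {
    // Simulate a slow LND: the fetch takes longer than the 100 ms budget.
    let slow = || {
        thread::sleep(Duration::from_millis(500));
        vec!["fresh".to_string()]
    };
    let result = hints_with_fallback(Duration::from_millis(100), Some(vec!["old".to_string()]), slow);
    println!("{result:?}");
}
```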

elsirion commented 5 months ago

Empty hints are very bad for unknown LN nodes or nodes with only private channels: the GW won't be able to receive payments. That's why we retry for 30s or so before giving up.
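A minimal sketch of a bounded retry loop with a fixed delay, to make the cost explicit: any caller that waits on it synchronously can block for roughly (max_attempts - 1) × delay when LND never returns hints. The function name and parameters are hypothetical and not the gateway's actual retry code.

```rust
use std::thread;
use std::time::Duration;

/// Call `attempt` up to `max_attempts` times, sleeping `delay` between tries,
/// and return the first non-empty result, or an empty Vec after the last try.
fn fetch_with_retries<F>(max_attempts: u32, delay: Duration, mut attempt: F) -> Vec<String>
where
    F: FnMut() -> Vec<String>,
{
    for n in 1..=max_attempts {
        let hints = attempt();
        if !hints.is_empty() {
            return hints;
        }
        // Don't sleep after the final attempt.
        if n < max_attempts {
            thread::sleep(delay);
        }
    }
    Vec::new()
}

fn main() {
    let mut calls = 0;
    // Simulate LND returning no hints for the first two attempts.
    let hints = fetch_with_retries(5, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 { Vec::new() } else { vec!["hint".to_string()] }
    });
    println!("got {hints:?} after {calls} attempts");
}
```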

dpc commented 5 months ago

If it's so important, then by the time someone calls fedimint-cli info, the GW should already have that info in a cached value or something, I guess.
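A sketch of the cached-value approach, assuming a background thread owns the slow fetch and RPC handlers such as info only read the cache. RouteHintCache and spawn_refresher are hypothetical names; to keep it short, this version stops refreshing after the first non-empty result.

```rust
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

/// Last known route hints, shared by a background refresher and RPC handlers.
#[derive(Clone, Default)]
struct RouteHintCache {
    inner: Arc<RwLock<Option<Vec<String>>>>,
}

impl RouteHintCache {
    fn update(&self, hints: Vec<String>) {
        *self.inner.write().unwrap() = Some(hints);
    }

    /// Never waits on LND: returns whatever is cached (None if nothing yet).
    fn snapshot(&self) -> Option<Vec<String>> {
        self.inner.read().unwrap().clone()
    }
}

/// Spawn a thread that polls `fetch` every `interval` and stores the first
/// non-empty result. `fetch` stands in for the slow LND query.
fn spawn_refresher<F>(cache: RouteHintCache, interval: Duration, mut fetch: F) -> thread::JoinHandle<()>
where
    F: FnMut() -> Vec<String> + Send + 'static,
{
    thread::spawn(move || loop {
        let hints = fetch();
        if !hints.is_empty() {
            cache.update(hints);
            return;
        }
        thread::sleep(interval);
    })
}

fn main() {
    let cache = RouteHintCache::default();
    let mut polls = 0;
    let handle = spawn_refresher(cache.clone(), Duration::from_millis(20), move || {
        polls += 1;
        if polls < 3 { Vec::new() } else { vec!["hint".to_string()] }
    });
    // An `info` call made right away does not block; it just may see no hints yet.
    println!("info before refresh: {:?}", cache.snapshot());
    handle.join().unwrap();
    println!("info after refresh: {:?}", cache.snapshot());
}
```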

elsirion commented 5 months ago

> Are you sure it's "any call", not just info? Because fetch_lightning_route_hints that seems to be what's blocking is called only in handle_get_info and handle_connect_federation making this somewhat less severe.

Oh, I didn't see your reply when posting mine (race conditions …). I don't think info needs to fetch this; ideally it would only return what the GW already knows, i.e. what it used for the last registration (idk how hard that would be to implement). handle_connect_federation does need it for the first registration, though, imo (unless we want to make that async).
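A sketch of making the first registration async, under the assumption that the caller only needs to know that registration has started: the handler returns at once, and a background thread fetches hints, registers, and then reports the hints it used (which is also what info could return later). connect_federation, fetch_hints, and register are hypothetical stand-ins, not the real handle_connect_federation.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Returns immediately; a background thread fetches hints, registers with
/// them, and then sends the hints it used on the returned channel.
fn connect_federation<F, R>(fetch_hints: F, register: R) -> mpsc::Receiver<Vec<String>>
where
    F: FnOnce() -> Vec<String> + Send + 'static,
    R: FnOnce(&[String]) + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        let hints = fetch_hints();
        register(hints.as_slice());
        // The caller may have dropped the receiver; ignore that case.
        let _ = done_tx.send(hints);
    });
    done_rx
}

fn main() {
    let registered = connect_federation(
        || {
            thread::sleep(Duration::from_millis(50)); // simulate a slow LND
            vec!["hint".to_string()]
        },
        |hints: &[String]| println!("registering with {} hint(s)", hints.len()),
    );
    println!("connect_federation returned without waiting for LND");
    let used = registered.recv().unwrap();
    println!("hints used for registration: {used:?}");
}
```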

dpc commented 4 months ago

ARGGHGHGHGHGH!!! :D

This is absolutely terrible and the source of so many bugs...

This behavior is bound to happen in certain circumstances, but no one notices because it eventually times out.

The gateway should respond to info regardless of whether it has anything.

AAArrggghhhh..