holmesworcester closed this issue 1 year ago.
Notes:
Found potential causes of the slow initial connection:
We had set a 2-minute timeout in the libp2p peerDiscovery bootstrap config, which made libp2p do nothing for the first 2 minutes; in other words, it only started discovering peers from the given bootstrap list after 2 minutes. Since no one remembers why we set it, it will be removed (the default timeout is 1 second, so libp2p starts dialing peers right away).
The community owner always bootstraps with 1 peer (itself). Because of that, the owner's peer does not try to dial anyone and just waits to be dialed by some other peer. This is a bug; I will create a separate issue for it: https://github.com/TryQuiet/quiet/issues/1189
~I'm wondering if we could use the libp2p persistent datastore to keep information about discovered peers between app launches. One concern is that it would ignore our peer prioritization mechanism.~ The easier way is to keep our list of peers in leveldb.
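Both fixes above can be sketched together. This assumes the `list` and `timeout` options accepted by `bootstrap()` from `@libp2p/bootstrap` (verify against the installed version); `BootstrapOptionsLike`, `KeyValueStore`, `MemoryStore`, `savePeers`, `loadPeers` and `makeBootstrapOptions` are hypothetical names, and the in-memory store only stands in for leveldb so the snippet runs on its own. The options leave `timeout` unset so the library default applies, and they never include our own address, so the owner is not left bootstrapping with only itself.

```typescript
// Local stand-in for the options of bootstrap() from @libp2p/bootstrap.
// Assumption: `list` holds multiaddr strings; `timeout` (ms) delays discovery.
interface BootstrapOptionsLike {
  list: string[]
  timeout?: number // left unset so the library default applies
}

// Hypothetical key-value interface; a leveldb-backed store could implement it.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>
  put(key: string, value: string): Promise<void>
}

// In-memory stand-in for leveldb so the sketch is self-contained.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>()
  async get(key: string): Promise<string | undefined> {
    return this.data.get(key)
  }
  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value)
  }
}

const PEERS_KEY = 'knownPeers' // hypothetical key name

// Persist the peers we know about so the next launch can start from them.
async function savePeers(store: KeyValueStore, peers: string[]): Promise<void> {
  await store.put(PEERS_KEY, JSON.stringify(peers))
}

async function loadPeers(store: KeyValueStore): Promise<string[]> {
  const raw = await store.get(PEERS_KEY)
  return raw === undefined ? [] : (JSON.parse(raw) as string[])
}

// Merge peer sources in priority order (earlier sources win), dropping
// duplicates and our own address so we never bootstrap with only ourselves.
function makeBootstrapOptions(ownAddress: string, ...sources: string[][]): BootstrapOptionsLike {
  const seen = new Set<string>([ownAddress])
  const list: string[] = []
  for (const source of sources) {
    for (const addr of source) {
      if (!seen.has(addr)) {
        seen.add(addr)
        list.push(addr)
      }
    }
  }
  return { list } // no custom 2-minute timeout
}
```

On startup, something like `makeBootstrapOptions(ownAddress, await loadPeers(store), invitationPeers)` would put peers remembered from the last session first; whether remembered or invitation peers should take priority is a design choice this sketch leaves open.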
Notes:
libp2p:dialer 1 tokens request, returning 1, 9 remaining +1ms
backend:libp2p:websockets connect 2cwiluyne5p7qaazmx4kmjhmtty5giz3l7cpnkuzeela4d6khkdn5xid.onion:443 +11s
backend:tor Jan 30 18:28:26.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:443. Giving up. (waiting for rendezvous desc)
backend:tor +2m
backend:libp2p:websockets:err connection error: Unexpected server response: 504 +2m
connect 2cwiluyne5p7qaazmx4kmjhmtty5giz3l7cpnkuzeela4d6khkdn5xid.onion:443: 2:00.136 (m:ss.mmm)
backend:libp2p:websockets:err error connecting to /dns4/2cwiluyne5p7qaazmx4kmjhmtty5giz3l7cpnkuzeela4d6khkdn5xid.onion/tcp/443/wss/p2p/QmRhSxzVM5rYcDqhdW8T3UJWF8r1LciLMZmG6X7Li8HXKj. Details: Unexpected server response: 504 +0ms
-> 2:00.136 (m:ss.mmm) of waiting for one dial
libp2p:dialer 1 tokens request, returning 1, 9 remaining +1ms
backend:libp2p:websockets connect 2cwiluyne5p7qaazmx4kmjhmtty5giz3l7cpnkuzeela4d6khkdn5xid.onion:443 +2s
backend:tor Jan 30 18:31:08.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
backend:tor +30s
backend:libp2p:websockets:err connection error: Unexpected server response: 404 +19s
connect 2cwiluyne5p7qaazmx4kmjhmtty5giz3l7cpnkuzeela4d6khkdn5xid.onion:443: 18.507s
backend:libp2p:websockets:err error connecting to /dns4/2cwiluyne5p7qaazmx4kmjhmtty5giz3l7cpnkuzeela4d6khkdn5xid.onion/tcp/443/wss/p2p/QmRhSxzVM5rYcDqhdW8T3UJWF8r1LciLMZmG6X7Li8HXKj. Details: Unexpected server response: 404 +0ms
-> 18.507s of waiting for one dial
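To see why one-at-a-time dialing hurts, the timeline can be modelled with a small calculation. This is a simplified model, not libp2p's dialer: `DialOutcome` and `timeToFirstConnection` are hypothetical, the two failed durations are taken from the logs above, and the third, successful 15 s dial is an invented example.

```typescript
interface DialOutcome {
  durationMs: number
  ok: boolean
}

// Time until the first successful connection when at most `limit` dials run
// at once and each queued dial starts as soon as a slot frees up (FIFO order).
function timeToFirstConnection(dials: DialOutcome[], limit: number): number | undefined {
  const slotFreeAt: number[] = new Array(Math.max(1, limit)).fill(0)
  let best: number | undefined
  for (const dial of dials) {
    // Pick the slot that becomes free earliest.
    let slot = 0
    for (let i = 1; i < slotFreeAt.length; i++) {
      if (slotFreeAt[i] < slotFreeAt[slot]) slot = i
    }
    const end = slotFreeAt[slot] + dial.durationMs
    slotFreeAt[slot] = end
    if (dial.ok && (best === undefined || end < best)) best = end
  }
  return best
}

const observed: DialOutcome[] = [
  { durationMs: 120136, ok: false }, // 504 after Tor gave up (log above)
  { durationMs: 18507, ok: false }, // 404, no HSDir available (log above)
  { durationMs: 15000, ok: true }, // hypothetical reachable peer
]
console.log('one at a time:', timeToFirstConnection(observed, 1))
console.log('up to 10 at once:', timeToFirstConnection(observed, 10))
```

With these numbers, a limit of 1 gives 153,643 ms (about 2.5 minutes) before the first usable connection, while a limit of 10 gives 15,000 ms, because the two failing dials no longer block the good one.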
We need to find a way to make libp2p more aggressive when connecting to peers it knows about, dialing them in parallel rather than one at a time; otherwise it will take a very long time to connect to any peer. Let's ask the libp2p team how!
The release notes for 0.40.0 specifically say what to do to achieve the old behavior!
to replicate the old behaviour, listen for 'peer:discovery' events and dial peers manually
See: https://github.com/libp2p/js-libp2p/releases/tag/v0.40.0
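A minimal sketch of that advice. It assumes the node exposes `addEventListener('peer:discovery', …)` with the discovered peer in `evt.detail` and a `dial()` method (check against the installed js-libp2p version); to stay runnable without libp2p it uses hypothetical structural stand-ins (`PeerInfoLike`, `DiscoveryEventLike`, `DialingNode`) and treats peer IDs as plain strings rather than `PeerId` objects.

```typescript
// Structural stand-ins; in js-libp2p the peer id is a PeerId object.
interface PeerInfoLike {
  id: string
  multiaddrs: string[]
}
interface DiscoveryEventLike {
  detail: PeerInfoLike
}
interface DialingNode {
  addEventListener(type: 'peer:discovery', listener: (evt: DiscoveryEventLike) => void): void
  dial(peerId: string): Promise<unknown>
}

// Dial every newly discovered peer immediately instead of waiting for autodial.
function dialOnDiscovery(
  node: DialingNode,
  onError: (peerId: string, err: unknown) => void = () => {},
): void {
  const inFlight = new Set<string>() // peers with a dial that has not settled yet
  node.addEventListener('peer:discovery', (evt) => {
    const peerId = evt.detail.id
    if (inFlight.has(peerId)) return // already dialing this peer
    inFlight.add(peerId)
    node
      .dial(peerId)
      .catch((err) => onError(peerId, err))
      .finally(() => inFlight.delete(peerId))
  })
}
```

Wired to a real node, `dialOnDiscovery` would be called once after the node is created; the `inFlight` set only prevents a second dial to a peer whose first dial is still pending.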
Also, should we upgrade to 0.42? There have been a number of bug fixes since 0.40.0, and the bugs they address could be causing issues for us. See my comment here: https://github.com/TryQuiet/quiet/issues/1168#issuecomment-1410629341
https://github.com/TryQuiet/quiet/pull/1193 This MR includes:
Bumping libp2p will be done in a separate MR
https://github.com/TryQuiet/quiet/pull/1198 - upgrade libp2p and its dependencies
[Edit] It looks better; however, it only dials many peers at once after some time. The first dials (one at a time) are handled by autodial; the later ones (as in the logs below) are triggered by kad-dht:
libp2p:connection-manager dial to PeerId(Qmf2arsLTdbNxJPwRdFULAMXQiTUPfcfA6rceHHes6uvnC) +2m
libp2p:connection-manager dial to PeerId(Qmb46978en8M7ymGM2UEmY7mQU7BWSAV3W6rhvpd7jwk1h) +0ms
libp2p:connection-manager dial to PeerId(QmaC4adEC8CmXqfuu2K2Es9FRTR7txxKrZLvWrnj7FuYSx) +0ms
libp2p:connection-manager dial to PeerId(QmbKhdhUQvNUFQa1dAviTeNZQm3ynBBDn9VAgnNtHJe3Ar) +0ms
libp2p:connection-manager dial to PeerId(QmUUd2aEA8rjvV4Bz1UYcJtN757JFUYzhs2Je4cp6jpYbN) +0ms
libp2p:connection-manager had an existing connection to QmUUd2aEA8rjvV4Bz1UYcJtN757JFUYzhs2Je4cp6jpYbN +0ms
libp2p:connection-manager dial to PeerId(QmPGdGDUV1PXaJky4V53KSvFszdqEcM7KCoDpF2uFPf5w6) +0ms
libp2p:connection-manager dial to PeerId(Qmb5tdNiRFMVXv7NYP5Ms5pSJrNKJrm8DSq3JfJdcHVh5d) +0ms
libp2p:connection-manager dial to PeerId(QmeS8UY7r1Ggwf5fXoQhSSjos5Cug4T2FuwfzqoW5VXGeb) +0ms
libp2p:connection-manager dial to PeerId(QmRsxAUEf9YtsGSbE36hyUpjEpsWPgKXdWNk5TLh69EWcu) +0ms
libp2p:connection-manager dial to PeerId(Qmf93MdYA7nYSLQF9ezJ33tBF4kdPeBFVJi26VECzMhu2B) +0ms
backend:libp2p:websockets connect hvvam35ns4y2ndvofejvxy5oox2kcjriy7ydlbibxs4qlvpfq5fq5bid.onion:443 +2m
backend:libp2p:websockets connect ikmyya4ufygd2symcq4imbnn75aqll24qj3my4pj7t7kmxuyq7c334id.onion:443 +2ms
backend:libp2p:websockets connect rthacjlsxv5gphoblsaviqhutmvof4sr4s5insks2zhudft7htpik4id.onion:443 +1ms
backend:libp2p:websockets connect n3mx5qhnn5yfg734x3bhktjrjmzdmdbjuomgb4mwv3t2qqswe3oqrmid.onion:443 +0ms
backend:libp2p:websockets connect 43527pkkimtgzfgnuiqosn3q6t2zwkecd4rm5pe7dxnojlvl7x5vofyd.onion:443 +1ms
backend:libp2p:websockets connect zl37gnntp64dhnisddftypxbt5cqx6cum65vdv6oeaffrbqmemwc52ad.onion:443 +0ms
backend:libp2p:websockets connect p2wy2nga77g3cmcmjxmjz22ynjsinnkr43xuyg5ean4qs6tbov5h4dyd.onion:443 +1ms
backend:libp2p:websockets connect dl2yuv2g4qmxs3bnewjex4jtadesq4dwz7ob3dh336z4in4ebvclg6yd.onion:443 +0ms
backend:libp2p:websockets connect yihscxre6ezqcgdnadom5kz4xuca5if4xo42blyfrvlq6redaunxy2yd.onion:443 +1ms
However, I still see delays in receiving messages.
Further work here: https://github.com/TryQuiet/quiet/issues/1169
Problem: Since upgrading libp2p/ipfs, connecting to peers on startup and reconnecting after being offline are both slower than they should be, because libp2p is not attempting multiple simultaneous connections. It should attempt up to some maximum of 10 connections at once. Right now it attempts one connection at a time and only tries another when a connection fails.
What we think is going wrong: In our first round of libp2p tuning, we changed some settings so that libp2p would dial multiple addresses simultaneously. These settings no longer work.
What counts as success: We should be able to confirm that: 1) If you join a community, you dial all of the known peers, up to some configured max (currently 10?), simultaneously or as soon as you find out about them. 2) If you are in a community and reconnect, you dial many peers simultaneously, up to the max. Because an onion connection can take some time and some peers will be offline, this will ensure that we find at least one good peer quickly.
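The success criteria can be checked against a sketch like the following: a hypothetical `dialWithLimit` helper that keeps up to `limit` dials in flight and starts the next one as soon as any attempt settles, so a single slow onion dial occupies one slot instead of blocking the rest. It is a plain worker pool, not libp2p's own dialer; in a real node `dial` would wrap something like `libp2p.dial`, and libp2p's connection-manager settings might make custom code unnecessary.

```typescript
interface DialResult {
  peer: string
  ok: boolean
}

// Dial `peers` with at most `limit` attempts in flight at any moment.
// Results are recorded in completion order, so the first `ok` entry is the
// first good peer found.
async function dialWithLimit(
  peers: string[],
  dial: (peer: string) => Promise<unknown>,
  limit = 10, // "some max of 10 connections at once", per the issue
): Promise<DialResult[]> {
  const results: DialResult[] = []
  let next = 0 // shared cursor; safe because JS runs workers on one thread

  // Each worker keeps pulling the next undialed peer until none are left.
  async function worker(): Promise<void> {
    while (next < peers.length) {
      const peer = peers[next++]
      try {
        await dial(peer)
        results.push({ peer, ok: true })
      } catch {
        results.push({ peer, ok: false }) // offline peer or timeout
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, peers.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

Because every worker starts a new dial the moment its previous one settles, reconnecting after being offline keeps the pool full instead of waiting out each failed attempt in turn.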