Chia-Network / chia-blockchain

Chia blockchain python implementation (full node, farmer, harvester, timelord, and wallet)
Apache License 2.0

[Bug] Unable to sync with slow internet (6 MBit/s) (weight proof timeout) #18211

Closed madMAx43v3r closed 2 months ago

madMAx43v3r commented 3 months ago

What happened?

When trying to sync the node over a slow mobile network, it's unable to receive any weight proof in time.

It always fails with RuntimeError: Weight proof did not arrive in time from peer after around 90 seconds, even though the timeout is set to 360 seconds (the default value), as shown in the debug output.

Speedtest without the node running: [screenshot]

However, when downloading from github.com I only get around 200 KB/s (~1.6 Mbit/s).

Speedtest to Germany (from Thailand) on TCP 8444:

$ iperf -c 5.9.57.230 -p 8444
------------------------------------------------------------
Client connecting to 5.9.57.230, TCP port 8444
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 10.42.0.110 port 39936 connected with 5.9.57.230 port 8444
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  7.50 MBytes  6.26 Mbits/sec
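As a rough cross-check of these rates, the sketch below estimates how much data can arrive in the ~95 s before the request was abandoned (95.07 s, taken from the log timestamps further down), at the iperf rate and at the 200 KB/s GitHub rate. It is unit arithmetic only: the size of a weight proof is not stated in this thread, so it does not say whether one would fit. `bytes_in_window` is a hypothetical helper name.

```python
def bytes_in_window(mbit_per_s: float, seconds: float) -> float:
    """Bytes that can arrive in `seconds` at a sustained `mbit_per_s` (10**6 bits/s)."""
    return mbit_per_s * 1_000_000 / 8 * seconds


WINDOW_S = 95.07  # time from the weight proof request to the ban, per the log

rates = [
    ("iperf to 5.9.57.230:8444", 6.26),      # Mbit/s, from the iperf output
    ("github.com download", 200 * 8 / 1000),  # 200 KB/s expressed in Mbit/s
]

for label, rate in rates:
    megabytes = bytes_in_window(rate, WINDOW_S) / 1_000_000
    print(f"{label}: {rate:.2f} Mbit/s -> about {megabytes:.1f} MB in {WINDOW_S} s")
```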

Also I'm not too far behind:

Current Blockchain Status: Syncing 5530181/5531801 (1620 behind).

Version

2.3.0 / 2.3.1 (same issue on both)

What platform are you using?

Linux

What ui mode are you using?

CLI

Relevant log output

2024-06-20T14:56:31.427 full_node chia.full_node.full_node: INFO     Requesting weight proof from peer 37.235.174.43 up to height 5531555
2024-06-20T14:56:31.428 full_node chia.full_node.full_node: DEBUG    weight proof timeout is 360 sec
2024-06-20T14:58:06.497 full_node full_node_server        : DEBUG    Time for request request_proof_of_weight: PeerInfo(_ip=IPv4Address('37.235.174.43'), _port=8444) = 95.06866121292114, None? True
2024-06-20T14:58:06.498 full_node full_node_server        : WARNING  Banning 37.235.174.43 for 600 seconds
2024-06-20T14:58:06.499 full_node chia.full_node.full_node: ERROR    Error with syncing: <class 'RuntimeError'>Traceback (most recent call last):
  File "/opt/chia/chia/full_node/full_node.py", line 970, in _sync
  File "/opt/chia/chia/full_node/full_node.py", line 1011, in request_validate_wp
RuntimeError: Weight proof did not arrive in time from peer: 37.235.174.43
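The two timestamps in the log make the mismatch concrete: the gap between the weight proof request and the ban matches the ~95 s the server reports for request_proof_of_weight, roughly a quarter of the configured 360 s. A minimal check using only values copied from the log:

```python
from datetime import datetime

CONFIGURED_TIMEOUT_S = 360  # "weight proof timeout is 360 sec" in the log

requested = datetime.fromisoformat("2024-06-20T14:56:31.427")  # "Requesting weight proof ..."
banned = datetime.fromisoformat("2024-06-20T14:58:06.497")     # "Banning 37.235.174.43 ..."

elapsed = (banned - requested).total_seconds()
print(
    f"elapsed {elapsed:.2f} s of a {CONFIGURED_TIMEOUT_S} s timeout "
    f"({elapsed / CONFIGURED_TIMEOUT_S:.0%})"
)
```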
wjblanke commented 3 months ago

Does this happen with every peer? A log like this can appear if the peer disconnects. Early on, when weight proofs took longer to validate, we used to recommend increasing this timeout, and that worked. Can you debug further to see whether the 360-second timeout is actually being honored in the code? It may take some time for an engineer here to get a "dial-up" connection to test.

Maybe the lack of traffic on the websocket is causing it to disconnect. Do you have a proxy running that would drop a silent connection?

emlowe commented 3 months ago

Max, in server.py there is a function start_client; maybe experiment with different heartbeat settings there. The heartbeat is not configurable in config.yaml, only in the code:

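ws = await session.ws_connect(
    url,
    autoclose=True,
    autoping=True,
    # aiohttp sends a ping every `heartbeat` seconds and closes the
    # connection if the matching pong is not received in time
    heartbeat=60,
    ssl=self.ssl_client_context,
    max_msg_size=max_message_size,
)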
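The effect of the heartbeat setting can be reasoned about with a simplified timeline. The sketch below is a model, not aiohttp's code, and it rests on assumptions worth checking against the installed aiohttp version: the heartbeat timer restarts only when a complete message has been received; after sending a ping, the client waits heartbeat / 2 seconds for the pong before closing; and the peer's pong is queued behind the large message it is still sending. `close_time_during_transfer` is a hypothetical name.

```python
from typing import Optional


def close_time_during_transfer(heartbeat: Optional[float], transfer_s: float) -> Optional[float]:
    """Seconds after the last complete message at which the client would drop the
    connection while one large message is still arriving, or None if it survives.

    Simplified model (assumption): a ping goes out `heartbeat` seconds after the last
    complete message, and its pong must arrive within `heartbeat / 2` seconds. The pong
    is assumed to sit behind the large message, so it arrives only once the transfer,
    taking `transfer_s` seconds, has finished.
    """
    if heartbeat is None:
        return None  # heartbeat disabled: no ping is sent, so there is no pong deadline
    deadline = heartbeat + heartbeat / 2
    return deadline if transfer_s > deadline else None


for hb in (60, 300, None):
    print(f"heartbeat={hb}: closes at {close_time_during_transfer(hb, transfer_s=120.0)}")
```

If the model holds, heartbeat=60 drops any single message that takes longer than about 90 s to arrive, while a much larger value or heartbeat=None would not.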
madMAx43v3r commented 3 months ago

Yes, it happened for many hours with every peer it tried: always a ~90 s timeout and a None response. I checked the code but could not find any path where it could possibly return None.

I don't think the connection was idle; it was probably downloading the weight proof, just too slowly.

You can try it by using your phone via USB tethering, with the phone acting as a router. I had no issue downloading Chia blockchain releases (200 MB) from GitHub, so I don't think my phone was somehow killing connections.

The log also shows that my node banned the peer; the peer didn't disconnect.

madMAx43v3r commented 3 months ago


@emlowe If that's the issue, then it's probably on the server side, since the ping message from the server would not be received in time for the client to send back the pong.

While my node is busy receiving the weight proof, the server-side ping would be stuck in the send queue.

emlowe commented 3 months ago

The server code doesn't set any of those parameters, though, so tweaking them on the client should change the behaviour of the connection in general; the server code mostly uses aiohttp defaults.

I might try setting heartbeat=None, or setting it very high.

emlowe commented 3 months ago

You are right that I would expect some other log entries though if the connection was being closed abnormally - so I'm very unsure it will change anything for you

madMAx43v3r commented 3 months ago

Oh, I see now: send_request() does return None if asyncio.wait_for() returns before the response has been received.
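The pattern described here can be illustrated with a small asyncio sketch. It is hypothetical and not Chia's actual code: `Connection`, `pending`, `results`, `on_response()` and `close()` are invented for illustration. Each request waits on an asyncio.Event under asyncio.wait_for(); in this sketch, closing the connection sets every pending event, so wait_for() returns early without a result and the caller gets None long before the 360 s timeout.

```python
import asyncio
import contextlib
from typing import Dict, Optional, Tuple


class Connection:
    """Hypothetical request/response bookkeeping, for illustration only."""

    def __init__(self) -> None:
        self.pending: Dict[int, asyncio.Event] = {}
        self.results: Dict[int, str] = {}
        self.next_id = 0

    async def send_request(self, timeout: float) -> Optional[str]:
        request_id = self.next_id
        self.next_id += 1
        event = asyncio.Event()
        self.pending[request_id] = event
        # (actually writing the request to the socket is omitted)
        with contextlib.suppress(asyncio.TimeoutError):
            # Returns as soon as the event is set, or after `timeout` seconds.
            await asyncio.wait_for(event.wait(), timeout=timeout)
        self.pending.pop(request_id)
        # None if the event was set without a response, or if the wait timed out.
        return self.results.pop(request_id, None)

    def on_response(self, request_id: int, payload: str) -> None:
        event = self.pending.get(request_id)
        if event is not None:
            self.results[request_id] = payload
            event.set()

    def close(self) -> None:
        # Wake every waiter; none has a stored result, so each returns None.
        for event in self.pending.values():
            event.set()


async def demo() -> Tuple[Optional[str], float]:
    conn = Connection()
    loop = asyncio.get_running_loop()
    start = loop.time()
    task = asyncio.create_task(conn.send_request(timeout=360))
    await asyncio.sleep(0.05)  # stands in for the time before the connection drops
    conn.close()
    result = await task
    return result, loop.time() - start


result, elapsed = asyncio.run(demo())
print(result, f"{elapsed:.2f} s")
```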

github-actions[bot] commented 2 months ago

This issue has not been updated in 14 days and is now flagged as stale. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it from auto closing in 7 days.

github-actions[bot] commented 2 months ago

This issue was automatically closed because it has been flagged as stale, and subsequently passed 7 days with no further activity from the submitter or watchers.