bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

How to avoid this server failure? It seems to happen randomly after about 1 hour of running a script. #466

Closed · ryanshrott closed this issue 1 year ago

ryanshrott commented 1 year ago
Aug 16 16:22:30.104 [INFO] Route found: 0:40 via …vYA3Rn => 40:56 via …cz22Uj => 56:74 via …bfAGnD => 74:80 via …wYhAod
Aug 16 16:23:21.358 [WARN] [petals.client.lm_head.chunked_forward:72] Running the client with dtype bfloat16 on CPU may be slow, since your CPU doesn't support AVX512. Consider loading the model with torch_dtype='float32'
Aug 16 16:29:18.049 [WARN] [petals.client.inference_session.step:327] Caught exception when running inference via RemoteSpanInfo(peer_id=<libp2p.peer.id.ID (12D3KooWF8qNsYxPP3PVhzX3QzUHyUKV6sSz1qQDr85R5Ycz22Uj)>, start=40, end=56, server_info=ServerInfo(state=<ServerState.ONLINE: 2>, throughput=1040.823002928876, public_name='🦆FYY🌞', version='2.0.1.post2', network_rps=1185.4980980484086, forward_rps=9887.81852782432, inference_rps=343.100557603763, adapters=(), torch_dtype='bfloat16', quant_type='nf4', using_relay=False, cache_tokens_left=1179648, next_pings={'12D3KooWP9iwoB2QabHGEdPg2wtryEvxWCM5wUv3JVQhfWbfAGnD': 0.0017198077154218527, '12D3KooWGYnZxwR4ssuW8LYqSkGNDNH18sdmWgsqXPcCsZcjWtUS': 0.10628407145291568, '12D3KooWKbqGwThSgLiCa35aBU96HwHiG3WJmPET9nrPNVG5pJHP': 0.18390265026129785, '12D3KooWBMSbZNAjp7fxaAvwAyxsdVJa65WzDN5BjuBsBtvYA3Rn': 0.02952825440093875, '12D3KooWLzF1dYtP3BfCjTc88bfeSDo1yrVLyd2Gi8gt2Knk9Q2B': 0.14714445692487063})) (retry in 0 sec): TimeoutError()
Aug 16 16:29:18.050 [INFO] Due to a server failure, remote attention caches from block 40 to 56 will be regenerated
Aug 16 16:29:18.050 [INFO] Route found: 40:56 via …cz22Uj
Aug 16 16:29:18.051 [WARN] [petals.client.inference_session.step:327] Caught exception when running inference via RemoteSpanInfo(peer_id=<libp2p.peer.id.ID (12D3KooWF8qNsYxPP3PVhzX3QzUHyUKV6sSz1qQDr85R5Ycz22Uj)>, start=40, end=56, server_info=ServerInfo(state=<ServerState.ONLINE: 2>, throughput=1040.823002928876, public_name='🦆FYY🌞', version='2.0.1.post2', network_rps=1185.4980980484086, forward_rps=9887.81852782432, inference_rps=343.100557603763, adapters=(), torch_dtype='bfloat16', quant_type='nf4', using_relay=False, cache_tokens_left=1148512, next_pings={'12D3KooWP9iwoB2QabHGEdPg2wtryEvxWCM5wUv3JVQhfWbfAGnD': 0.0014327772169584435, '12D3KooWGYnZxwR4ssuW8LYqSkGNDNH18sdmWgsqXPcCsZcjWtUS': inf, '12D3KooWKbqGwThSgLiCa35aBU96HwHiG3WJmPET9nrPNVG5pJHP': inf, '12D3KooWBMSbZNAjp7fxaAvwAyxsdVJa65WzDN5BjuBsBtvYA3Rn': 0.666586954762116, '12D3KooWLzF1dYtP3BfCjTc88bfeSDo1yrVLyd2Gi8gt2Knk9Q2B': inf})) (retry in 1 sec): AssertionError("Broken input cache: span=RemoteSpanInfo(peer_id=<libp2p.peer.id.ID (12D3KooWF8qNsYxPP3PVhzX3QzUHyUKV6sSz1qQDr85R5Ycz22Uj)>, start=40, end=56, server_info=ServerInfo(state=<ServerState.ONLINE: 2>, throughput=1040.823002928876, public_name='🦆FYY🌞', version='2.0.1.post2', network_rps=1185.4980980484086, forward_rps=9887.81852782432, inference_rps=343.100557603763, adapters=(), torch_dtype='bfloat16', quant_type='nf4', using_relay=False, cache_tokens_left=1148512, next_pings={'12D3KooWP9iwoB2QabHGEdPg2wtryEvxWCM5wUv3JVQhfWbfAGnD': 0.0014327772169584435, '12D3KooWGYnZxwR4ssuW8LYqSkGNDNH18sdmWgsqXPcCsZcjWtUS': inf, '12D3KooWKbqGwThSgLiCa35aBU96HwHiG3WJmPET9nrPNVG5pJHP': inf, '12D3KooWBMSbZNAjp7fxaAvwAyxsdVJa65WzDN5BjuBsBtvYA3Rn': 0.666586954762116, '12D3KooWLzF1dYtP3BfCjTc88bfeSDo1yrVLyd2Gi8gt2Knk9Q2B': inf})) shape=torch.Size([1, 951, 8192]) position=0 n_input_tokens=1")
Aug 16 16:29:19.053 [INFO] Due to a server failure, remote attention caches from block 40 to 56 will be regenerated
borzunov commented 1 year ago

Hi @ryanshrott,

This means that some server disconnected (= a server failure), so the client has to retry using another server. This happens automatically, and the script should proceed without issues; the messages are just informational.
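The behavior in the log above (catch the exception, regenerate the affected attention caches, retry with a growing delay) can be pictured as a retry loop. The sketch below is purely illustrative, not Petals' actual implementation; the function and parameter names are hypothetical:

```python
import time


def run_with_retries(step, max_retries=5):
    """Illustrative sketch of retry-on-server-failure (not the real Petals
    client): call `step()`, and on a transient failure wait with a growing
    delay, mirroring the 'retry in 0 sec' / 'retry in 1 sec' log lines."""
    for attempt in range(max_retries):
        try:
            return step()
        except (TimeoutError, AssertionError) as exc:
            # Delays grow as 0, 1, 3, 7, ... seconds, capped at 30.
            delay = min(2 ** attempt - 1, 30)
            print(f"Caught exception {exc!r} (retry in {delay} sec)")
            time.sleep(delay)
    raise RuntimeError("no server could complete the inference step")
```

In the real client the retry also re-routes through a different span of servers and regenerates the remote attention caches for the failed blocks, which is what the "remote attention caches from block 40 to 56 will be regenerated" lines report.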

Let us know if you have other issues.