Open lesong36 opened 3 months ago
Tl;dr: We need more robust connection management. One annoying issue right now after we introduced sticky node ids is that if a node restarts and changes its ephemeral port, other nodes may still try to talk to it on the previous port assigned to that node id.
The good news is that this is all pretty easy to fix; it just requires a small refactor of networking.
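To make the stale-port problem concrete, here is a minimal sketch of one way to track peers so that a restarted node's new port replaces the old one. It is not exo's actual networking code: `PeerRegistry`, `PeerEntry`, `on_discovery`, `address_of`, and the 30-second timeout are all hypothetical names and values chosen for illustration. It assumes each node periodically advertises its sticky node id together with its current host and port.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (host, port)


@dataclass
class PeerEntry:
    address: Address
    last_seen: float  # timestamp of the most recent discovery message


class PeerRegistry:
    """Hypothetical registry mapping a sticky node id to its latest advertised address."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout  # assumed value; seconds before a silent peer is dropped
        self._peers: Dict[str, PeerEntry] = {}

    def on_discovery(self, node_id: str, address: Address, now: float) -> bool:
        """Record a discovery message.

        Returns True when the peer is new or its address changed, which is the
        signal for the caller to close any old connection and reconnect.
        """
        entry = self._peers.get(node_id)
        changed = entry is None or entry.address != address
        # Always overwrite: the newest advertisement wins over any cached port.
        self._peers[node_id] = PeerEntry(address, now)
        return changed

    def address_of(self, node_id: str, now: float) -> Optional[Address]:
        """Return the current address, or None if the peer is unknown or timed out."""
        entry = self._peers.get(node_id)
        if entry is None or now - entry.last_seen > self.timeout:
            self._peers.pop(node_id, None)  # evict stale entries
            return None
        return entry.address


registry = PeerRegistry(timeout=30.0)
registry.on_discovery("node-a", ("192.168.1.10", 50355), now=0.0)
# The node restarts and comes back on a different ephemeral port.
needs_reconnect = registry.on_discovery("node-a", ("192.168.1.10", 60304), now=5.0)
current = registry.address_of("node-a", now=6.0)
```

The point of the design is that the node id stays the lookup key while the address is treated as replaceable state, so a port cached from before the restart is never dialed again once a newer advertisement arrives.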
Thanks for your answer. Anything I can do for this issue?
(.venv) (base) coty@P16:~/OneDrive/LLM/repo/exo$ ^C
(.venv) (base) coty@P16:~/OneDrive/LLM/repo/exo$ ^C
(.venv) (base) coty@P16:~/OneDrive/LLM/repo/exo$ DEBUG=9 python3 main.py
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
Trying to find available port port=50355 [60304, 55379, 57624, 60258, 57340, 58850, 53290, 55123, 57105, 59823, 50717]
Using available port: 50355
Retrieved existing node ID: d639030c-62f3-47c5-bc1f-0ee22be53e67
Chat interface started: