berry-13 closed this 2 weeks ago
@AlexCheema thanks for the fix! Unfortunately, I can't connect them physically, but they are both on a 10GBIT LAN connection. Are there any other ways to troubleshoot this?
Can they reach each other, e.g. with ping?
Does it support CPU only? I am trying to set up 2 VM instances on GCP to try exo out. I used python3 main.py to start exo, but the 2 nodes cannot find each other. The two nodes can ping each other, and I can use nc to send a UDP test packet to port 5678. I also used DEBUG=9 as OP said, but the log keeps saying it can find 0 peers. Anything I should look into? Thanks!
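A quick way to rule out the GCP firewall is a minimal UDP send/receive check on the discovery port (5678, per the nc test above). This is a hypothetical standalone script, not part of exo; the function names are mine:

```python
import socket

PORT = 5678  # the discovery port used in the nc test above

def udp_listen(timeout=10.0):
    """Bind the discovery port and wait for a single datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", PORT))
    sock.settimeout(timeout)
    try:
        data, addr = sock.recvfrom(1024)
        print(f"received {data!r} from {addr}")
        return data
    except socket.timeout:
        print("no datagram received - check firewall / VPC rules")
        return None
    finally:
        sock.close()

def udp_send(host):
    """Fire one test datagram at the peer's discovery port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"exo-discovery-test", (host, PORT))
    sock.close()

# Run udp_listen() on one VM and udp_send("<other VM's IP>") on the
# other; if nothing arrives, the VPC firewall is likely dropping UDP
# on 5678 even though ICMP (ping) goes through.
```

Note that GCP VPCs allow ping by default but may still block custom UDP ports, so a successful ping does not prove discovery traffic can flow.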
Can they reach each other e.g. with
ping
?
I think that this connection issue is likely related to WSL. Is there a way to get this working on Windows without using WSL?
Here are the logs from Windows:
Traceback (most recent call last):
  File "D:\exo\main.py", line 184, in <module>
    loop.run_until_complete(main())
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "D:\exo\main.py", line 170, in main
    loop.add_signal_handler(s, handle_exit)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 553, in add_signal_handler
    raise NotImplementedError
NotImplementedError
It crashes immediately after I start it.
@AlexCheema closing this as the main request is more clearly explained in #184
the main issue was due to WSL blocking local IP access. I made some code modifications to enable it to start on Windows, and it successfully began connecting the two nodes. I'll wait for Windows support with llama.cpp
Do you mind pushing the code changes somewhere for the network fixes on windows?
Not many changes. To make sure it won't kill itself, I removed these two lines:

for s in [signal.SIGINT, signal.SIGTERM]:
    loop.add_signal_handler(s, handle_exit)

and added this one:

signal.signal(signal.SIGINT, lambda s, f: handle_exit())
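For reference, the two approaches can be combined so the same script runs on both Unix and Windows without dropping SIGTERM handling on Unix. A sketch, assuming the surrounding code from main.py; the handle_exit body here is a placeholder, not exo's actual shutdown logic:

```python
import asyncio
import signal

def handle_exit():
    # placeholder for exo's real shutdown logic
    print("shutting down")

def install_signal_handlers(loop):
    """Prefer loop.add_signal_handler (Unix); fall back to signal.signal
    on platforms such as Windows, where the event loop raises
    NotImplementedError for this method."""
    for s in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(s, handle_exit)
        except NotImplementedError:
            # Windows event loops do not implement add_signal_handler,
            # so register a plain signal handler instead.
            signal.signal(s, lambda signum, frame: handle_exit())
```

The fallback keeps Ctrl+C working on Windows, though SIGTERM is never actually delivered there; on Unix the loop-based handler remains the safer choice because it runs in the event loop rather than in an arbitrary interrupted frame.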
Issue Summary:
I've installed Exo on two systems:
Steps Taken:
python3 main.py
and also with debugging enabled: DEBUG_DISCOVERY=9 DEBUG=9 python3 main.py --inference-engine tinygrad
Issues Encountered:
I tried running the llama3.1 7B model on the PC, but it saturated both the RAM and VRAM. This seems unusual for a 7B model, which might indicate it's not quantized, or possibly a WSL2 limitation.
My Questions:
llama3.1 7B model on WSL2