Open small-cactus opened 1 month ago
Claude Sonnet 3.5 suggestions:
The issues you're encountering are likely related to device capability detection and network discovery. Let's address these problems step by step:
a) Check Windows Firewall:
b) Network Discovery:
c) Broadcast Messages:
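One way to check whether discovery broadcasts reach a machine at all is to listen for a single UDP datagram and decode it. This is a minimal sketch, not exo's own code: `open_listener` and `receive_discovery` are hypothetical helper names, and the JSON payload shape is assumed to match the discovery message shown later in this thread.

```python
import json
import socket


def open_listener(port=0):
    """Bind a UDP socket on all interfaces; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    return sock


def receive_discovery(sock, timeout=5.0):
    """Wait for one datagram. Returns (sender_ip, payload), or None on timeout.

    payload is None when the datagram is not valid UTF-8 JSON.
    """
    sock.settimeout(timeout)
    try:
        data, addr = sock.recvfrom(65535)
    except socket.timeout:
        return None
    try:
        return addr[0], json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return addr[0], None
```

If this returns None on the Windows machine while other nodes are broadcasting on the same port, the firewall or the network profile is a likely place to look next.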
device_capabilities() function for Windows:

import platform
import psutil

# DeviceCapabilities and DeviceFlops are assumed to be the existing exo classes.

def device_capabilities():
    if platform.system() == "Windows":
        # Import here so the module still loads on platforms without wmi
        import wmi

        c = wmi.WMI()
        cpu_info = c.Win32_Processor()[0]
        gpu_info = c.Win32_VideoController()[0]

        model = f"Windows PC ({platform.processor()})"
        chip = f"{cpu_info.Name}, GPU: {gpu_info.Name}"
        memory = psutil.virtual_memory().total // (1024**2)  # Convert bytes to MB

        # Estimate FLOPS (this is a very rough, CPU-only estimate)
        cpu_ghz = float(cpu_info.MaxClockSpeed) / 1000  # Convert MHz to GHz
        cpu_cores = int(cpu_info.NumberOfCores)
        # Assume 8 floating-point operations per cycle per core.
        # The result is in GFLOPS; convert it if DeviceFlops expects another unit.
        estimated_gflops = cpu_ghz * cpu_cores * 8

        return DeviceCapabilities(
            model=model,
            chip=chip,
            memory=memory,
            flops=DeviceFlops(fp32=estimated_gflops, fp16=estimated_gflops*2, int8=estimated_gflops*4)
        )
    else:
        # Existing implementation for other platforms
        ...
This implementation provides a rough estimate of FLOPS based on CPU information. For more accurate GPU FLOPS, you might need to use a GPU-specific library or maintain a database of known GPU performances.
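The "database of known GPU performances" idea could be as simple as a name lookup against the string WMI reports. The sketch below is only an illustration: `lookup_gpu_tflops` is a hypothetical helper, and the table entries are placeholders, not real benchmark figures.

```python
def lookup_gpu_tflops(gpu_name, known_gpus):
    """Return the table value for the longest key found in gpu_name, or None.

    Matching is case-insensitive substring matching, so a reported name like
    "Vendor Example GPU 100 Ti" still matches the key "Example GPU 100 Ti".
    """
    name = gpu_name.lower()
    matches = [key for key in known_gpus if key.lower() in name]
    if not matches:
        return None
    # Prefer the longest key so a more specific model wins over a shorter one.
    return known_gpus[max(matches, key=len)]


# Placeholder entries for illustration only; replace with measured or published values.
EXAMPLE_TABLE = {
    "Example GPU 100": 10.0,
    "Example GPU 100 Ti": 12.5,
}
```

A None result would be the signal to fall back to the CPU-based estimate.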
a) Use os.path.join() for all file paths.
b) Use platform.system() to check the operating system when making system-specific calls.
c) For network operations, consider using socket.getaddrinfo() to get IP addresses, as it works across platforms.
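The three tips above can be sketched with the standard library alone. The helper names (`cache_path`, `is_windows`, `ipv4_addresses`) are made up for this example and are not part of exo.

```python
import os
import platform
import socket


def cache_path(*parts):
    """Build a path under the user's home directory using the OS path separator."""
    return os.path.join(os.path.expanduser("~"), *parts)


def is_windows():
    """True when running on Windows, for guarding system-specific calls."""
    return platform.system() == "Windows"


def ipv4_addresses(host):
    """Resolve host to a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Passing `socket.gethostname()` to `ipv4_addresses` gives the local addresses the resolver knows about, which may not cover every network interface.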
Here's an example of how you might modify the discovery mechanism to work better on Windows:
import asyncio
import json
import platform
import socket
import struct

# Discovery is assumed to be the existing exo base class.

class GRPCDiscovery(Discovery):
    async def task_broadcast_presence(self):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

        message = json.dumps({
            "type": "discovery",
            "node_id": self.node_id,
            "grpc_port": self.node_port,
            "device_capabilities": self.device_capabilities.to_dict()
        }).encode('utf-8')

        if platform.system() == "Windows":
            # On Windows, we might need to broadcast to all network interfaces
            for interface in socket.getaddrinfo(socket.gethostname(), None):
                # Skip IPv6 entries; the '.255' trick below only works for IPv4
                if interface[0] == socket.AF_INET and interface[1] == socket.SOCK_DGRAM:
                    # Assumes a /24 subnet (netmask 255.255.255.0)
                    broadcast_address = interface[4][0].rsplit('.', 1)[0] + '.255'
                    sock.sendto(message, (broadcast_address, self.broadcast_port))
        else:
            # Existing broadcast logic for other platforms
            sock.sendto(message, ('<broadcast>', self.broadcast_port))

        await asyncio.sleep(self.broadcast_interval)
This modification attempts to broadcast on all available network interfaces on Windows, which might help with discovery issues.
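Replacing the last octet with 255 only gives the right broadcast address on a /24 subnet. If the netmask is known, the standard `ipaddress` module can compute the address for any prefix length. This is a sketch: `broadcast_address` is a hypothetical helper, and how the netmask is obtained on Windows is left open.

```python
import ipaddress


def broadcast_address(ip, netmask):
    """Return the IPv4 broadcast address for an interface IP and its netmask.

    netmask may be dotted ("255.255.240.0") or a prefix length ("20").
    """
    interface = ipaddress.IPv4Interface(f"{ip}/{netmask}")
    return str(interface.network.broadcast_address)


print(broadcast_address("192.168.1.10", "255.255.255.0"))  # 192.168.1.255
print(broadcast_address("192.168.17.5", "255.255.240.0"))  # 192.168.31.255
```

The second call shows where the '.255' shortcut would go wrong: it would produce 192.168.17.255 instead of 192.168.31.255.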
Lastly, ensure that all necessary libraries (like wmi for Windows) are installed on your Windows machine.
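A quick way to confirm the required libraries are importable before starting is to ask the import system for each one. A minimal sketch, with `missing_modules` as a hypothetical helper name:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# On the Windows machine, an empty list means both packages are installed.
print(missing_modules(["psutil", "wmi"]))
```

Anything listed can then be installed with pip before running exo.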
These changes should help improve the Windows compatibility of your exo project. Remember to test thoroughly on both Windows and other platforms to ensure these modifications don't introduce new issues. If you're still encountering problems, more detailed logs of the discovery process and network communications would be helpful for further debugging.
I got it to start on Windows and detect other devices. However, the Windows PC itself does not show up on other devices running exo; according to the debug output it gets detected as nothing ([]). On the Windows PC it shows itself as an unknown device at 0 TFLOPS.
If anyone has an idea on how to get it to run, that'd be pretty cool.
Updated main.py to allow cross platform with everything:
I didn't really look at any other files to see where the issue might be happening, so it might be an easy fix.
Other remarks:
Models will not download on Windows.
Other systems show 1 node while Windows shows 2 nodes.
Inference from the Windows PC does not work, with or without connected nodes. It works fine when inference is run from another connected non-Windows system (because that system only registers 1 node).
Even though Windows states that it has 2 nodes and is connected, none of the tokens get sent to the Windows node when inference is run from the Mac.
TLDR: Windows recognizes other systems, but nothing else works on it.