magnusviri closed this 1 month ago
Let's clarify this aspect of the current (inefficient) implementation using Claude 3.5 Sonnet:
Let's analyze this behavior and consider some potential improvements:
Current Behavior:
Potential Issues:
Possible Improvements:
1. Parallel Downloads:
2. Local Caching and Sharing:
3. Partial Downloads:
4. Centralized Download:
5. Pre-download Strategy:
6. Peer-to-Peer Distribution:
7. Check for Existing Downloads:
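Here is a minimal sketch of one reading of items 1 (Parallel Downloads) and 7 (Check for Existing Downloads) on a single node. The Python below skips files already on disk and fetches the rest concurrently. `download_model` and the `fetch(filename, dest_path)` callable are hypothetical placeholders, not exo or huggingface_hub APIs. A real `fetch` would talk to the Hugging Face Hub.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical: fetch(filename, dest_path) downloads one repo file to dest_path.
FetchFn = Callable[[str, Path], None]


def download_model(files: Iterable[str], dest_dir: Path, fetch: FetchFn,
                   max_workers: int = 4) -> list[str]:
    """Fetch the given files concurrently, skipping any already on disk.

    Returns the names of the files that were actually fetched.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Item 7: check for existing downloads before doing any network work.
    missing = [name for name in files if not (dest_dir / name).exists()]

    def fetch_one(name: str) -> str:
        final = dest_dir / name
        final.parent.mkdir(parents=True, exist_ok=True)
        tmp = final.with_name(final.name + ".part")
        fetch(name, tmp)         # write under a temporary name first...
        os.replace(tmp, final)   # ...then rename, so a half-written file never looks complete
        return name

    # Item 1: download the missing files in parallel rather than one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, missing))
```

Writing to a `.part` name and renaming with `os.replace` keeps a half-finished file from passing the existence check on the next run. `huggingface_hub.snapshot_download` already accepts a `max_workers` argument for per-node parallelism. This sketch does not address the cross-node serialization reported at the end of this thread.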
Implementation Considerations:
MLXDynamicShardInferenceEngine and potentially other parts of the system.

To implement any of these improvements, you would need to modify the model loading logic, potentially add inter-node communication for file sharing, and possibly create new modules for download management and coordination. The specific approach would depend on your system's architecture, network topology, and operational requirements.
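One small building block for such a download-management module is resuming a partial file with an HTTP `Range` request instead of starting over (item 3, Partial Downloads). The sketch below assumes the server supports range requests and uses only the Python standard library. `build_resume_request` and `resume_download` are hypothetical helper names.

```python
import os
import urllib.request


def build_resume_request(url: str, partial_path: str) -> tuple[urllib.request.Request, int]:
    """Build a GET request that only asks for the bytes missing from partial_path."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    req = urllib.request.Request(url)
    if offset > 0:
        # "bytes=N-" asks for everything from byte N to the end of the file.
        req.add_header("Range", f"bytes={offset}-")
    return req, offset


def resume_download(url: str, partial_path: str, final_path: str,
                    chunk_size: int = 1 << 20) -> None:
    """Continue (or start) a download, then move the finished file into place."""
    req, offset = build_resume_request(url, partial_path)
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server honoured the Range header, so append.
        # Any other success status (normally 200) means the full body follows, so overwrite.
        mode = "ab" if offset > 0 and resp.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
    os.replace(partial_path, final_path)
```

This sketch does not verify checksums. If the partial file is already complete, the server will typically answer 416 (Range Not Satisfiable), which `urlopen` raises as an `HTTPError`. A real implementation would need to handle both cases.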
Claude is pretty good. Items 7 and 2 should already exist, at least I think they do.
My priorities are 1 (Parallel Downloads) and 3 (Partial Downloads), then 4, 5, and 6.
I opened an issue that I think would be good for this: https://github.com/exo-explore/exo/issues/99
We are working on it internally.
This is fixed now. Please reopen if there are still any issues.
I started up 11 nodes and watched ~/.cache/huggingface/hub/ on all of them. The model, models--mlx-community--Meta-Llama-3-8B-Instruct-4bit, was downloaded by one node at a time: when one node finished, the next node would start downloading.