exo-explore / exo

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
GNU General Public License v3.0
10.45k stars, 593 forks

Model only downloads on 1 node at a time. #70

Closed magnusviri closed 1 month ago

magnusviri commented 2 months ago

I started up 11 nodes. I watched ~/.cache/huggingface/hub/ on all of them. The model, models--mlx-community--Meta-Llama-3-8B-Instruct-4bit, was downloaded one at a time. When one node finished, the next node would start downloading.
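For anyone who wants to repeat this check on their own nodes, here is a minimal sketch of how the cache folder name above relates to the model's repo id, assuming the usual Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/...`). The helper names `hub_cache_dir` and `has_snapshot_files` are hypothetical, not part of exo or `huggingface_hub`, and the check only tells you that some file has landed, not that the download is complete or intact.

```python
from pathlib import Path
from typing import Optional


def hub_cache_dir(repo_id: str, cache_root: Optional[Path] = None) -> Path:
    """Return the cache folder for a model repo (hypothetical helper).

    Assumes the default hub cache location unless cache_root is given.
    """
    root = cache_root or Path.home() / ".cache" / "huggingface" / "hub"
    # "mlx-community/Foo" is stored as "models--mlx-community--Foo".
    return root / ("models--" + repo_id.replace("/", "--"))


def has_snapshot_files(repo_id: str, cache_root: Optional[Path] = None) -> bool:
    """True if at least one file (or symlink) exists under snapshots/.

    This does NOT verify that the download is complete or uncorrupted.
    """
    snapshots = hub_cache_dir(repo_id, cache_root) / "snapshots"
    if not snapshots.is_dir():
        return False
    return any(p.is_file() or p.is_symlink() for p in snapshots.rglob("*"))
```

Running `has_snapshot_files("mlx-community/Meta-Llama-3-8B-Instruct-4bit")` on each node while the cluster starts should show the same one-node-at-a-time pattern described above.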

stephanj commented 2 months ago

Let's clarify the current (inefficient) implementation using Claude Sonnet 3.5:

Let's analyze this behavior and consider some potential improvements:

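Current Behavior:

  1. Each node downloads its own full copy of the model into its local cache.
  2. Downloads run sequentially: a node starts downloading only after the previous node has finished.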

Potential Issues:

  1. Network Inefficiency: Multiple full downloads of the same large model file consume unnecessary bandwidth.
  2. Time Inefficiency: Sequential downloads mean the total setup time increases linearly with the number of nodes.
  3. Storage Redundancy: Each node storing a full copy of the model may be wasteful, especially if local storage is limited.

Possible Improvements:

  1. Parallel Downloads:

    • Allow multiple nodes to download simultaneously to reduce total setup time.
    • Implement a randomized delay before starting the download to reduce the chance of all nodes starting at exactly the same time.
  2. Local Caching and Sharing:

    • Implement a local caching mechanism where nodes can share downloaded model files.
    • The first node to complete a download could notify others, allowing them to copy from the local network instead of downloading from Hugging Face.
  3. Partial Downloads:

    • If the model can be split into chunks, each node could download only the portion it needs based on its assigned shard.
    • Nodes could then exchange these partial downloads to construct the full model collectively.
  4. Centralized Download:

    • Designate one node as a "download manager" that fetches the model and distributes it to other nodes.
    • This could be combined with a local file server to efficiently share the download.
  5. Pre-download Strategy:

    • In a controlled environment, consider pre-downloading the model to a shared location before starting the nodes.
  6. Peer-to-Peer Distribution:

    • Implement a peer-to-peer file sharing protocol (like BitTorrent) for efficient distribution of the model across nodes.
  7. Check for Existing Downloads:

    • Before initiating a download, check if the model already exists in the cache directory.
    • If it exists, verify its integrity and use the cached version instead of re-downloading.
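To make item 3 (Partial Downloads) more concrete, here is a rough sketch of how a node might work out which weight files it needs for its shard. It assumes the model ships a safetensors index whose `weight_map` maps tensor names to file names, and that layer tensors contain `.layers.<n>.` in their names. The function name `files_for_shard` and the inclusive `start_layer`/`end_layer` convention are illustrative assumptions, not exo's actual API. Tensors that do not belong to a numbered layer (embeddings, final norm, output head) are conservatively kept for every shard.

```python
import re
from typing import Dict, Set

# Matches the layer index in names like "model.layers.12.mlp.down_proj.weight".
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def files_for_shard(
    weight_map: Dict[str, str], start_layer: int, end_layer: int
) -> Set[str]:
    """Return the weight files needed for layers start_layer..end_layer (inclusive).

    weight_map maps tensor name -> file name, as in a safetensors index.
    Non-layer tensors are always included, which may over-download.
    """
    needed: Set[str] = set()
    for tensor_name, filename in weight_map.items():
        match = LAYER_RE.search(tensor_name)
        if match is None:
            # Embeddings, final norm, lm_head, etc.: keep for every shard.
            needed.add(filename)
        elif start_layer <= int(match.group(1)) <= end_layer:
            needed.add(filename)
    return needed
```

This only saves bandwidth when the weights are split across several files. If the whole model is stored in a single weights file, every shard selects that same file and nothing is gained.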

Implementation Considerations:

To implement any of these improvements, you would need to modify the model loading logic, potentially add inter-node communication for file sharing, and possibly create new modules for download management and coordination. The specific approach would depend on your system's architecture, network topology, and operational requirements.
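As one possible starting point for item 1 (parallel downloads with a randomized delay), the sketch below simulates the nodes as coroutines in a single process. In a real cluster each node would apply the delay locally before starting its own download. The names `download_all` and `fetch`, and the `max_delay_s` and `max_concurrent` parameters, are hypothetical and not taken from exo.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, Iterable


async def download_all(
    node_ids: Iterable[int],
    fetch: Callable[[int], Awaitable[str]],
    max_delay_s: float = 2.0,
    max_concurrent: int = 4,
) -> Dict[int, str]:
    """Run fetch(node_id) for all nodes concurrently, with random start jitter.

    fetch is a placeholder for whatever actually downloads the model on a node.
    """
    # Cap how many downloads run at once so the uplink is not overwhelmed.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one_node(node_id: int):
        async with semaphore:
            # Random delay so nodes do not all hit the server at the same instant.
            await asyncio.sleep(random.uniform(0, max_delay_s))
            return node_id, await fetch(node_id)

    results = await asyncio.gather(*(one_node(n) for n in node_ids))
    return dict(results)
```

Note that if all nodes share one internet uplink, running full downloads in parallel mostly splits the same bandwidth, so combining this with partial downloads or local sharing is likely to matter more for total setup time.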

magnusviri commented 2 months ago

Claude is pretty good. 7 and 2 should already exist. At least I think they do.

My priorities are:

1 (Parallel Downloads), 3 (Partial Downloads), then 4, 5, and 6.

austinbv commented 2 months ago

I opened an issue that I think would be good for this: https://github.com/exo-explore/exo/issues/99

We are working on it internally.

AlexCheema commented 1 month ago

This is fixed now. Please reopen if there are still any issues.