fiatrete / OpenDAN-Personal-AI-OS

OpenDAN is an open source Personal AI OS, which consolidates various AI modules in one place for your personal use.
https://opendan.ai
MIT License

Computing resource scheduling. #36

Open streetycat opened 10 months ago

streetycat commented 10 months ago

I have put together a rough definition and design of the computing resource module, and I hope you will join the discussion so we can reach a consensus on it.

Compute Node

A computing resource node in the system. It should provide the following functions (see the sketch after this list):

  1. Install and start the services that support computing

  2. Accept computation tasks submitted by users and execute them

  3. Schedule these tasks (tasks may run in parallel or be queued)

  4. Support some preset standard task types, plus task types customized by developers

  5. Expose some computing resources publicly, while others may require authorization
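
To make these responsibilities concrete, here is a minimal Python sketch of such a node. Every name in it (ComputeNode, install_service, post_task, ...) is an assumption for illustration, not the actual OpenDAN API.

```python
import queue
import threading
from typing import Any, Callable, Dict


class ComputeNode:
    """Illustrative compute node: installs services, then accepts and schedules tasks."""

    def __init__(self, node_id: str, node_entry: str, public: bool = True):
        self.node_id = node_id
        self.node_entry = node_entry
        self.public = public                                   # 5. public vs. authorization-required
        self.services: Dict[str, Callable[[dict], Any]] = {}   # 1./4. task type -> installed service entry
        self._tasks: queue.Queue = queue.Queue()               # 3. queued tasks (FIFO)
        self._busy = False

    def install_service(self, task_type: str, service_entry: Callable[[dict], Any]) -> None:
        # 1. Install/start a service; 4. the type may be preset or developer-defined.
        self.services[task_type] = service_entry

    def is_busy(self) -> bool:
        # Used by the scheduler (see the flowchart below) to skip overloaded nodes.
        return self._busy

    def post_task(self, task_type: str, params: dict, on_result: Callable[[Any], None]) -> None:
        # 2./3. Accept a task, queue it, and let a worker thread execute it.
        self._tasks.put((task_type, params, on_result))
        threading.Thread(target=self._run_next, daemon=True).start()

    def _run_next(self) -> None:
        task_type, params, on_result = self._tasks.get()
        self._busy = True
        try:
            on_result(self.services[task_type](params))
        finally:
            self._busy = False
```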

Compute Task Manager

The singleton component responsible for managing computing resources in the system. It should provide the following functions (see the sketch after this list):

  1. Accept registration of Compute Nodes

  2. Accept computation tasks submitted by users and select appropriate nodes to execute them

  3. Maintain load balancing among the computing nodes
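
A matching sketch of the manager, building on the ComputeNode sketch above. Again, the names are assumptions, and the round-robin selection is only a stand-in for real load balancing.

```python
import itertools
import threading
from typing import Dict, List, Optional


class ComputeTaskManager:
    """Illustrative singleton manager: registers nodes and dispatches tasks to them."""

    def __init__(self):
        self.nodes: Dict[str, ComputeNode] = {}        # node_id -> node
        self.services: Dict[str, List[str]] = {}       # task type -> node_ids supporting it
        self._cursors = {}                             # task type -> round-robin cursor over node_ids

    def register_node(self, node: ComputeNode) -> None:
        # 1. Accept registration of a Compute Node and index its services by type.
        self.nodes[node.node_id] = node
        for task_type in node.services:
            self.services.setdefault(task_type, []).append(node.node_id)
            self._cursors[task_type] = itertools.cycle(self.services[task_type])

    def _select_node(self, task_type: str) -> Optional[ComputeNode]:
        # 3. Naive load balancing: rotate through capable nodes, preferring idle ones.
        candidates = self.services.get(task_type, [])
        for _ in range(len(candidates)):
            node = self.nodes[next(self._cursors[task_type])]
            if not node.is_busy():
                return node
        return self.nodes[candidates[0]] if candidates else None

    def run(self, task_type: str, params: dict, node_id: Optional[str] = None):
        # 2. Accept a task; use the caller-specified node if given, otherwise pick one.
        node = self.nodes[node_id] if node_id else self._select_node(task_type)
        if node is None:
            raise RuntimeError(f"no compute node supports task type '{task_type}'")
        done, holder = threading.Event(), {}
        node.post_task(task_type, params, lambda result: (holder.update(result=result), done.set()))
        done.wait()                                    # WaitResult(): block until PostResult arrives
        return holder["result"]
```

In a real implementation the wait would be asynchronous and results would carry task ids, but the control flow matches the diagrams below.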

Flowchart

  1. Start up

    graph TB
    subgraph ComputeNode["ComputeNode(node_id, node_entry)"]
        InstallService["InstallService(type, service_entry)"]-->StartService["StartService(type, service_entry)"]-->ServiceList["Services{type, service_entry}"]
    end
    
    ServiceList-.->RegisterNode
    
    subgraph ComputeTaskManager
        RegisterNode["RegisterNode(node_id, node_entry)"]-->Nodes["Nodes{node_id, node_entry}, Services{type, node_id[]}"]
    end
  2. Execute task

    graph TB
    subgraph ComputeTaskManager
        RunTask["Run(type, params, [node])"]-->SpecifyNode{"if (node)"}
        PostTask["PostTask(type, params, node)"]
        SpecifyNode--yes-->PostTask
        SpecifyNode--No-->FilterNode["nodes=Services(type)"]-->NextNode["node = nodes.next()"]
        WaitResult["result=WaitResult()"]
    end
    
    NextNode-.->IsBusy-.yes.->NextNode
    IsBusy-.no.->PostTask
    PostTask-.->ExecuteTask
    
    subgraph "ComputeNode(Any)"
        IsBusy{"is busy"}
    end
    
    subgraph "ComputeNode(Selected)"
        ExecuteTask["result=Execute(type, params)"]-->PostResult["PostResult(result)"]
    end
    
    PostResult-.->WaitResult
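
Read together, the two diagrams correspond roughly to the following call sequence, expressed with the illustrative classes sketched above (the node id, address, and task type here are made up):

```python
# Start up: a node installs/starts its services, then registers with the manager.
manager = ComputeTaskManager()

node = ComputeNode(node_id="workstation-1", node_entry="tcp://192.168.1.10:9000")
node.install_service("text_generation", lambda params: f"echo: {params['prompt']}")
manager.register_node(node)

# Execute task: Run(type, params, [node]) either posts to the specified node, or
# filters nodes by service type, skips busy ones, posts, and waits for the result.
print(manager.run("text_generation", {"prompt": "hello"}))                        # node selected by manager
print(manager.run("text_generation", {"prompt": "hi"}, node_id="workstation-1"))  # node specified by caller
```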

I think we can first design a universal task scheduling framework, and then support various execution environments (e.g. Docker) and preset different task types within this framework.
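
One possible shape for that (purely a sketch; the TaskExecutor interface and the docker invocation below are assumptions, not existing OpenDAN code) is an abstract executor per execution environment, which a node installs per task type:

```python
import subprocess
from abc import ABC, abstractmethod


class TaskExecutor(ABC):
    """One implementation per execution environment (local process, Docker, cloud, ...)."""

    @abstractmethod
    def execute(self, params: dict) -> str: ...


class LocalExecutor(TaskExecutor):
    def execute(self, params: dict) -> str:
        # Run the task directly on the host; suitable for trusted, preset task types.
        return subprocess.run(params["cmd"], shell=True, capture_output=True, text=True).stdout


class DockerExecutor(TaskExecutor):
    def __init__(self, image: str):
        self.image = image  # an image bundling the runtime/model for this task type

    def execute(self, params: dict) -> str:
        # Isolate developer-customized task types inside a container.
        cmd = ["docker", "run", "--rm", self.image] + params.get("args", [])
        return subprocess.run(cmd, capture_output=True, text=True).stdout


# A node would then install one executor per task type, e.g.:
# node.install_service("stable_diffusion", DockerExecutor("sd-runner:latest").execute)
```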

waterflier commented 10 months ago

I am delighted to read your design and suggestions. They are insightful and show some understanding of the system. I am also very excited about the potential that your participation could bring to OpenDAN.

You can read https://github.com/fiatrete/OpenDAN-Personal-AI-OS/blob/MVP/doc/mvp/compute_task.drawio for more details on the compute kernel. I am writing an article about workflow now. The purpose of designing the compute_kernel subsystem is to enable our users to use their computational resources more efficiently. These computational resources can come from devices they own (such as their workstations and gaming laptops), as well as from cloud computing and decentralized computing networks.
