buckyos / bucky_backup_suite

MIT License

I Redesigned the Framework Structure for Better Performance #5

Open streetycat opened 1 month ago

streetycat commented 1 month ago

Issues with the Old Structure:

  1. Excessive Scanning of Source:

    • The prepare phase fully scans the Source to obtain complete metadata (including file HASH).
    • The upload process requires re-reading the file data.
  2. Dependency on Metadata from prepare:

    • Transmission relies on metadata computed during the prepare phase, preventing parallel execution of transmission and prepare.
  3. Inefficient Incremental Information Generation:

    • Source cannot efficiently generate incremental information for its files based on its own environment and product requirements.

Key Improvements in the New Structure:

  1. Two Threads in Operation:

    • Source: Scans data directory information:
      a. Retrieves file directory structure to get a list of all files (can generate incremental information based on Source needs).
      b. Compiles a list of data blocks.
    • Target: Requests Engine to fill each data unit, performing final source data reads at this stage:
      a. Data compression may occur at this stage.
      b. Generates incremental information using default methods if Source hasn't provided it.
      c. Packages source data list into data blocks if Target stores in block form.
      d. Uploads source data in file form if Target stores in directory form.
      e. Computes HASH during data reading for final verification.
      f. Target can initiate additional upload threads for performance.
  2. Parallel Execution of Source and Target Threads:

    • The product can choose to complete the Source thread first to obtain more task information.
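
To make the two-thread idea concrete, here is a minimal Python sketch. It is not the project's actual code or the linked pseudocode. The names `source_scan`, `target_fill`, `run_backup`, `CHUNK_SIZE`, and the `upload` callback are all hypothetical, and SHA-256 is only an assumed choice for the HASH. The Source thread lists files without reading their data, while the Target thread reads each file exactly once and computes the hash during that read.

```python
import hashlib
import queue
import threading
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # hypothetical read size, not taken from the project
_DONE = object()        # sentinel marking the end of the Source scan


def source_scan(root: Path, items: queue.Queue) -> None:
    """Source thread: walk the directory and list files without reading data."""
    try:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                items.put(path)
    finally:
        items.put(_DONE)  # always tell the Target that the scan has finished


def target_fill(items: queue.Queue, upload, results: dict) -> None:
    """Target thread: read each file once, hashing while the data is uploaded."""
    while True:
        path = items.get()
        if path is _DONE:
            break
        digest = hashlib.sha256()  # assumed hash algorithm
        with path.open("rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                digest.update(chunk)  # hash computed during the only read
                upload(path, chunk)   # hypothetical upload callback
        results[path] = digest.hexdigest()


def run_backup(root: Path, upload) -> dict:
    """Run Source and Target in parallel and return a path -> hash mapping."""
    items: queue.Queue = queue.Queue()
    results: dict = {}
    src = threading.Thread(target=source_scan, args=(root, items))
    dst = threading.Thread(target=target_fill, args=(items, upload, results))
    src.start()
    dst.start()
    src.join()
    dst.join()
    return results
```

In this sketch, a product that wants full task information up front could join the Source thread before starting the Target thread; the queue would then simply hold the complete file list.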

The Pseudo-Code is here

waterflier commented 1 month ago

I think we have reached a consensus on the overall logic and the key issues. Now let's write the real code and focus on the following core designs:

  1. Chunk CheckPoint & Dir-CheckPoint

  2. ChunkSource implementation (this is very simple)

  3. A more general implementation of Dir Source based on the local directory

  4. A general Backup server protocol, which involves some protocol designs similar to WebDAV.

  5. Implement a Chunk Backup target based on 4

Our Python pseudocode is already very complete, and we can try to use AI to build the Rust code. However, in my experience, Rust code built by AI is not very reliable (even getting it to compile is difficult). We can ask AI to generate as much of the framework flow as possible (with correct data structure design, type system, and function signatures), and then complete the implementation of each component one by one with the help of AI. AI performs better when working this way.