Open streetycat opened 1 month ago
I think we have reached consensus on the overall logic and the key issues. Now let's write the real code, focusing on the following core designs:
1. Chunk CheckPoint & Dir-CheckPoint
2. ChunkSource implementation (this is very simple)
3. A more general implementation of Dir Source based on the local directory
4. A general Backup server protocol, which involves some protocol designs similar to WebDAV
5. Implement a Chunk Backup target based on item 4
Our Python pseudocode is already quite complete, so we can try using AI to generate the Rust code. In my experience, however, AI-generated Rust code is not very reliable (even getting it to compile is difficult). We can instead ask AI to generate as much of the framework flow as possible (with correct data structure design, type system, and function signatures), and then complete the implementation of each component one by one with AI's help. AI performs better when used this way.
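To illustrate what "framework first, implementations later" could look like, here is a minimal sketch in Python (the language of the existing pseudocode). Every name in it (`ChunkId`, `ChunkSource`, `ChunkTarget`, the in-memory classes, and `backup_all`) is hypothetical and not taken from the project; it only shows how fixing the types and signatures up front leaves each component to be filled in independently.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Protocol


@dataclass(frozen=True)
class ChunkId:
    """Hypothetical content address: hex SHA-256 of the chunk bytes."""
    hex_digest: str

    @staticmethod
    def of(data: bytes) -> "ChunkId":
        return ChunkId(hashlib.sha256(data).hexdigest())


class ChunkSource(Protocol):
    """Assumed interface: enumerate chunks and read them by id."""
    def list_chunks(self) -> Iterator[ChunkId]: ...
    def read_chunk(self, chunk_id: ChunkId) -> bytes: ...


class ChunkTarget(Protocol):
    """Assumed interface: query and store chunks by id."""
    def has_chunk(self, chunk_id: ChunkId) -> bool: ...
    def write_chunk(self, chunk_id: ChunkId, data: bytes) -> None: ...


class InMemoryChunkSource:
    """Toy source used only to exercise the signatures."""
    def __init__(self, blobs: Iterable[bytes]) -> None:
        # Identical blobs collapse to one chunk because ids are content hashes.
        self._chunks: Dict[ChunkId, bytes] = {ChunkId.of(b): b for b in blobs}

    def list_chunks(self) -> Iterator[ChunkId]:
        return iter(self._chunks)

    def read_chunk(self, chunk_id: ChunkId) -> bytes:
        return self._chunks[chunk_id]


class InMemoryChunkTarget:
    """Toy target that verifies the hash before accepting a chunk."""
    def __init__(self) -> None:
        self.stored: Dict[ChunkId, bytes] = {}

    def has_chunk(self, chunk_id: ChunkId) -> bool:
        return chunk_id in self.stored

    def write_chunk(self, chunk_id: ChunkId, data: bytes) -> None:
        if ChunkId.of(data) != chunk_id:
            raise ValueError("chunk data does not match its id")
        self.stored[chunk_id] = data


def backup_all(source: ChunkSource, target: ChunkTarget) -> int:
    """Copy every chunk the target lacks; return how many were written."""
    written = 0
    for chunk_id in source.list_chunks():
        if target.has_chunk(chunk_id):
            continue
        target.write_chunk(chunk_id, source.read_chunk(chunk_id))
        written += 1
    return written
```

Because `backup_all` depends only on the two interfaces, a local-directory source or a network-backed target could be swapped in later without changing the flow.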
Issues with the Old Structure:
- Excessive Scanning of `Source`: The `prepare` phase fully scans the `Source` to obtain complete metadata (including file HASH).
- Dependency on Metadata from `prepare`: Transmission depends on metadata from the `prepare` phase, preventing parallel execution of transmission and `prepare`.
- Inefficient Incremental Information Generation: `Source` cannot efficiently generate its file incremental information based on its environment and product requirements.

Key Improvements in the New Structure:
- Two Threads in Operation:
  - `Source`: Scans data directory information:
    a. Retrieves the file directory structure to get a list of all files (can generate incremental information based on `Source` needs).
    b. Compiles a list of data blocks.
  - `Target`: Requests `Engine` to fill each data unit, performing the final source data reads at this stage:
    a. Data compression may occur at this stage.
    b. Generates incremental information using default methods if `Source` hasn't provided it.
    c. Packages the source data list into data blocks if `Target` stores in block form.
    d. Uploads source data in file form if `Target` stores in directory form.
    e. Computes HASH during data reading for final verification.
    f. `Target` can initiate additional upload threads for performance.
- Parallel Execution of `Source` and `Target` Threads: Starts the `Source` thread first to obtain more task information.

The Pseudo-Code is here
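Separately from the linked pseudo-code, the two-thread flow described above can be illustrated with a minimal Python sketch. It is not the project's design: `source_thread`, `target_thread`, `run_backup`, and the in-memory `uploaded` dict (standing in for a real upload) are all assumptions made for illustration. The `Source` side only walks the directory structure, while the `Target` side does the actual reads and computes the hash as the data streams through.

```python
import hashlib
import queue
import threading
from pathlib import Path
from typing import Dict

_DONE = object()  # sentinel telling the Target thread that scanning is finished


def source_thread(root: Path, tasks: "queue.Queue") -> None:
    """Scan directory structure only; file contents are not read here."""
    try:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                tasks.put(path)
    finally:
        tasks.put(_DONE)  # always signal completion, even if scanning fails


def target_thread(root: Path, tasks: "queue.Queue", uploaded: Dict[str, str]) -> None:
    """Read each file once, hashing while reading (stand-in for upload)."""
    while True:
        path = tasks.get()
        if path is _DONE:
            break
        digest = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 64 KiB blocks so the hash is computed during the read.
            for block in iter(lambda: f.read(64 * 1024), b""):
                digest.update(block)
        uploaded[path.relative_to(root).as_posix()] = digest.hexdigest()


def run_backup(root: Path) -> Dict[str, str]:
    """Run both threads in parallel, starting Source first."""
    tasks: "queue.Queue" = queue.Queue()
    uploaded: Dict[str, str] = {}
    src = threading.Thread(target=source_thread, args=(root, tasks))
    dst = threading.Thread(target=target_thread, args=(root, tasks, uploaded))
    src.start()  # Source starts first so tasks are available early
    dst.start()
    src.join()
    dst.join()
    return uploaded
```

The queue is what decouples the two sides: transmission can begin as soon as the first path is enqueued, instead of waiting for a full scan. Additional upload threads could consume from the same queue, provided one sentinel is enqueued per consumer.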