Rust-Wellcome / FasMan

A re-write (+ extras) of Python scripts, used in Tree of Life, into a single Rust script.

24 implement file reader to read files chunk by chunk #25

Open dasunpubudumal opened 6 months ago

dasunpubudumal commented 6 months ago

Closes #24

Changes proposed in this pull request:

  1. File structure update

The file structure has been updated to match Rust's modularity guidelines: 1) business logic functions are split into a library crate, and 2) the entry point of the binary crate (main.rs) only calls the run() function in the library crate.
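The split described above can be sketched in a single file, with two modules standing in for the two crate roots. This is only an illustration: the module names, the signature of `run`, and its behaviour are assumptions, not the code in this pull request.

```rust
// Single-file sketch; in a real project these would be two separate crate roots.

/// Stands in for `src/lib.rs` (the library crate).
mod library_crate {
    /// Hypothetical entry point: takes the CLI arguments (without the program
    /// name) and returns a message to print, or an error description.
    pub fn run(args: &[String]) -> Result<String, String> {
        match args.first() {
            Some(command) => Ok(format!("running `{}`", command)),
            None => Err(String::from("no command given")),
        }
    }
}

/// Stands in for `src/main.rs` (the binary crate).
mod binary_crate {
    use std::process::ExitCode;

    /// The binary only collects arguments, calls `run`, and reports the outcome.
    pub fn main() -> ExitCode {
        let args: Vec<String> = std::env::args().skip(1).collect();
        match super::library_crate::run(&args) {
            Ok(message) => {
                println!("{}", message);
                ExitCode::SUCCESS
            }
            Err(error) => {
                eprintln!("error: {}", error);
                ExitCode::FAILURE
            }
        }
    }
}
```

Keeping `main` this thin means the logic in `run` can be exercised from tests without spawning the binary.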

  1. Using clap's enum-based command parsing

This is a popular pattern for parsing command-line arguments, used in open-source projects such as Zed.

  1. Handing the responsibilities of each command to a separate module

The responsibility for handling each type of command was delegated to the processors in the src/processors directory. The main reference for modularising the code was the official guide.
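The delegation might look roughly like the following sketch, which uses only the standard library. The `processors` submodules, their `execute` functions, and the `Command` variants are hypothetical stand-ins, not the processors added in this pull request.

```rust
/// Stands in for the `src/processors` directory (names are hypothetical).
mod processors {
    pub mod count {
        /// Counts the non-empty lines in the given text.
        pub fn execute(input: &str) -> String {
            let n = input.lines().filter(|line| !line.trim().is_empty()).count();
            format!("{} lines", n)
        }
    }

    pub mod upper {
        /// Upper-cases the given text.
        pub fn execute(input: &str) -> String {
            input.to_uppercase()
        }
    }
}

/// Hypothetical command enum; a CLI parser would normally produce this value.
pub enum Command {
    Count,
    Upper,
}

/// The dispatcher holds no business logic: each arm hands off to one module.
pub fn dispatch(command: &Command, input: &str) -> String {
    match command {
        Command::Count => processors::count::execute(input),
        Command::Upper => processors::upper::execute(input),
    }
}
```

Because the `match` is exhaustive, the compiler flags any command variant that has not been wired to a processor.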

  1. Integrating itertools library's chunk function to create a file reader

    pub fn read_file_by_batch(
        &mut self,
        file_path: &str,
        batch_size: usize,
        f: &dyn Fn(Records<String>),
    ) -> Result<(), FileError> {
        // ...
    }

A file reader has been created to read a file chunk by chunk. A chunk is a fixed number of lines; for example, if batch_size is 10, read_file_by_batch of BatchFileReader applies the function f to each chunk of 10 lines.
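To make the batching behaviour concrete, here is a minimal standard-library sketch of the same idea. It is not the `BatchFileReader` from this pull request: it swaps itertools for a plain loop over `BufRead::lines`, and the function name, signature, and the choice to pass a short final batch to the callback are all assumptions.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads `reader` line by line and calls `f` once per
/// batch of up to `batch_size` lines. The final batch may be shorter.
pub fn read_lines_by_batch<R: BufRead>(
    reader: R,
    batch_size: usize,
    mut f: impl FnMut(&[String]),
) -> io::Result<()> {
    if batch_size == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "batch_size must be non-zero",
        ));
    }

    let mut batch: Vec<String> = Vec::new();
    for line in reader.lines() {
        // Propagate I/O and UTF-8 errors to the caller.
        batch.push(line?);
        if batch.len() == batch_size {
            f(&batch);
            batch.clear();
        }
    }

    // Hand over whatever is left when the line count is not a multiple of batch_size.
    if !batch.is_empty() {
        f(&batch);
    }
    Ok(())
}
```

To read from disk, the reader argument could be `std::io::BufReader::new(std::fs::File::open(path)?)`; only one batch of lines is held in memory at a time.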

  1. Unit tests where the code is written

Following the official Rust guidelines on test organization, unit tests for the file reader are written in the same file as the logic they test.
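The convention referred to here places a `#[cfg(test)]` module next to the code under test. The sketch below shows the layout with a hypothetical helper, `batch_count`, which is not part of this pull request.

```rust
/// Hypothetical helper under test: the number of batches needed to cover
/// `line_count` lines, or `None` if `batch_size` is zero.
pub fn batch_count(line_count: usize, batch_size: usize) -> Option<usize> {
    if batch_size == 0 {
        return None;
    }
    let full = line_count / batch_size;
    let partial = if line_count % batch_size == 0 { 0 } else { 1 };
    Some(full + partial)
}

// Compiled only for `cargo test`; lives in the same file as the logic.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rounds_up_for_a_partial_final_batch() {
        assert_eq!(batch_count(25, 10), Some(3));
    }

    #[test]
    fn rejects_zero_batch_size() {
        assert_eq!(batch_count(25, 0), None);
    }
}
```

Because the test module is a child of the module it tests, `use super::*` also gives it access to private items, which is the main reason unit tests are kept alongside the code.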