gadomski / las-rs

Read and write ASPRS las files, Rust edition.
MIT License

Ways to improve speed of reading and writing? #65

Closed mkondratyev85 closed 7 months ago

mkondratyev85 commented 7 months ago

Hello @gadomski and thanks for the wonderful library!

I'm trying to use it for retiling, and it seems to process files more slowly than I expected. I'm not sure what I'm doing wrong.

use las::{Read, Reader, Write, Writer};
use pyo3::prelude::*;

#[pyfunction]
fn merge_laz_files(
    rectangle: (f64, f64, f64, f64),
    input_paths: Vec<String>,
    output_path: String,
) -> PyResult<()> {
    let (x_min, y_min, x_max, y_max) = rectangle;

    // Reuse the first input file's header for the output file.
    let reader = Reader::from_path(&input_paths[0]).unwrap();
    let header = reader.header().clone();

    let mut writer = Writer::from_path(&output_path, header).unwrap();

    for path in input_paths.iter() {
        let mut reader = Reader::from_path(path).unwrap();

        // Keep only the points inside the rectangle
        // (min edges inclusive, max edges exclusive).
        reader
            .points()
            .filter_map(Result::ok)
            .filter(|p| p.x >= x_min && p.x < x_max && p.y >= y_min && p.y < y_max)
            .for_each(|point| {
                // Note: this silently drops write errors.
                let _ = writer.write(point);
            });
    }

    Ok(())
}

/// A Python module implemented in Rust.
#[pymodule]
fn lazrs_merge(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(merge_laz_files, m)?)?;
    Ok(())
}

All it does is read a sequence of files, iterate over the points, and write a point to the output file if it lies within a given rectangle. The function works fine, but it runs ~10 times slower than the equivalent function from the lidR package for R, and I don't understand why the difference is so large or what I'm doing wrong.
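As an aside, the interval test in the filter above is half-open (min edges inclusive, max edges exclusive), which is the right choice for retiling: a point on the shared edge of two adjacent tiles is claimed by exactly one of them. Factored out as a small helper (hypothetical name, for illustration):

```rust
/// Half-open rectangle test: min edges inclusive, max edges exclusive,
/// so adjacent tiles never claim the same point twice.
fn in_rectangle(x: f64, y: f64, rect: (f64, f64, f64, f64)) -> bool {
    let (x_min, y_min, x_max, y_max) = rect;
    x >= x_min && x < x_max && y >= y_min && y < y_max
}
```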

I tried reading only the raw points (without rescaling them), but even that alone is slower than my initial implementation.

use las::{raw, Read, Reader, Write, Writer};
use pyo3::prelude::*;
use std::fs::File;
use std::io::{Seek, SeekFrom};

#[pyfunction]
fn merge_laz_files(
    rectangle: (f64, f64, f64, f64),
    input_paths: Vec<String>,
    output_path: String,
) -> PyResult<()> {
    let (x_min, y_min, x_max, y_max) = rectangle;

    // Reuse the first input file's header for the output file.
    let reader = Reader::from_path(&input_paths[0]).unwrap();
    let header = reader.header().clone();

    // Kept for parity with the first version; unused in this read-only test.
    let _writer = Writer::from_path(&output_path, header.clone()).unwrap();

    for path in input_paths.iter() {
        let mut file = File::open(path).unwrap();

        let raw_header = raw::Header::read_from(&mut file).unwrap();

        // Skip straight to the point records.
        file.seek(SeekFrom::Start(raw_header.offset_to_point_data.into()))
            .unwrap();

        // Note: `large_file` is only present for LAS 1.4 files,
        // so this unwrap panics on older versions.
        let n_records = raw_header.large_file.unwrap().number_of_point_records;

        for _ in 0..n_records {
            let _point = raw::Point::read_from(&mut file, header.point_format()).unwrap();
        }
    }

    Ok(())
}

/// A Python module implemented in Rust.
#[pymodule]
fn lazrs_merge(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(merge_laz_files, m)?)?;
    Ok(())
}

I'm clearly doing something wrong, but I don't see what exactly. Could you please clarify this for me?

gadomski commented 7 months ago

Two things to check initially:

If those don't help, lmk and I can try to dig in further.

mkondratyev85 commented 7 months ago

Thanks for the quick reply!

I checked whether I was running in release mode, and I wasn't. Thanks for pointing that out; I feel stupid. It runs faster now, but still not as fast as lidR.
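For reference, a `--release` build can sometimes be pushed a little further with profile settings in `Cargo.toml`; these are standard Cargo options, not anything specific to las-rs:

```toml
# Cargo.toml — optional release-profile tuning (standard Cargo settings)
[profile.release]
opt-level = 3      # the release default, listed for clarity
lto = true         # whole-program link-time optimization
codegen-units = 1  # slower compile, sometimes slightly faster code
```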

My function, called from Python after building the pyo3 extension in release mode, finishes processing 4 tiles in 0.38 seconds.

The lidR function, which I call from R via rpy2 from Python, finishes the same tiles in 0.25 seconds, and that includes the overhead of calling R from inside Python.

Do you have any ideas why it's still a bit slower?

gadomski commented 7 months ago

I don't have any other specific suggestions, sorry; optimization can be tricky. For your awareness, this library wasn't developed with a specific production use case in mind, so (to my knowledge) it hasn't been vetted in any high-performance scenario.

mkondratyev85 commented 7 months ago

I understand that, and the las-rs library is very useful. I just don't understand where the bottleneck could be. Does it make sense to try reading raw points instead of iterating over points?

tmontaigu commented 7 months ago

If you are reading LAZ files, this could be made faster if las-rs allowed chunked/batched reads and writes, which would let laz-rs use multiple threads, a bit like https://github.com/gadomski/las-rs/issues/64 mentions.

If it's a LAS file, chunked reads/writes could help, but not by much, I believe.

Also, since it seems you are trying to expose these functions to Python, you might as well try laspy, which supports batched reads/writes.
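The chunked-read idea can be sketched independently of las-rs: pull items from any iterator in fixed-size batches, which is roughly the shape a batched read/write API would take (an illustration only — las-rs does not currently expose such an API):

```rust
/// Drain an iterator in fixed-size batches, calling `f` once per batch.
/// This is the access pattern a chunked reader would expose, letting a
/// LAZ backend decompress (or hand off) a whole chunk at a time.
fn for_each_batch<T, I, F>(iter: I, batch_size: usize, mut f: F)
where
    I: IntoIterator<Item = T>,
    F: FnMut(&[T]),
{
    let mut buf = Vec::with_capacity(batch_size);
    for item in iter {
        buf.push(item);
        if buf.len() == batch_size {
            f(&buf);
            buf.clear();
        }
    }
    if !buf.is_empty() {
        f(&buf); // final, possibly short, batch
    }
}
```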

mkondratyev85 commented 7 months ago

Hi @tmontaigu, and thank you for your response. I understand that performing these operations in parallel could speed them up. However, my current focus is on comparing single-threaded performance between lidR and las-rs. Because lidR is ~1.5 times faster than las-rs, I was wondering whether there are some clever tricks that could be used for my task.

tmontaigu commented 7 months ago

Are these LAS or LAZ files?

mkondratyev85 commented 7 months ago

I'm using LAS files for the comparison. I tried LAZ files as well; for LAZ the difference in speed is less pronounced, but the lidR function is still faster.

(Just in case, I'll mention that I used the laz feature of the las crate.)

tmontaigu commented 7 months ago

OK, so that means it's not related to LAZ decompression. I'll try to do some profiling to see if anything obviously slow emerges.

mkondratyev85 commented 7 months ago

I think I know what's going on. If I process only a single file, the Rust version is marginally faster than the lidR one. But as I increase the number of input files, the time for the Rust function grows linearly, while the time for the lidR function grows logarithmically. Apparently, the lidR function first builds an index. I was measuring only the time needed for the actual clipping, to make the comparison fair, but before clipping with lidR I had to initialize the catalog of LAS files. If I measure the time for both initialization and clipping, lidR is always slower than the pure Rust implementation.
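The indexing observation suggests a cheap pre-filter for the Rust version too: every LAS header stores the file's bounding box (exposed as `Header::bounds()` in las-rs, if I read the API correctly), so a file whose box misses the query rectangle can be skipped without reading a single point. A minimal overlap test, assuming the same half-open query semantics as the filter in the first snippet:

```rust
/// Does a file's bounding box (min/max from the LAS header, both edges
/// inclusive) overlap a half-open query rectangle? Files that fail this
/// test can be skipped entirely.
fn boxes_overlap(
    file_box: (f64, f64, f64, f64),  // (min_x, min_y, max_x, max_y)
    query: (f64, f64, f64, f64),     // (x_min, y_min, x_max, y_max), max exclusive
) -> bool {
    let (min_x, min_y, max_x, max_y) = file_box;
    let (qx_min, qy_min, qx_max, qy_max) = query;
    min_x < qx_max && max_x >= qx_min && min_y < qy_max && max_y >= qy_min
}
```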

Sorry for taking up your time, and thank you both for helping me with this.