MathNya / umya-spreadsheet

A pure Rust library for reading and writing spreadsheet files
MIT License

Streaming forward-only reading API ~lazy_read for rows / cells? #116

Closed natalie-o-perret closed 10 months ago

natalie-o-perret commented 1 year ago

Hi 🙋‍♀️

I've noticed that when using lazy_read, I first need to call read_sheet in order to serialize the whole worksheet (deserialize might be a tad more accurate, because by all technicalities xml => rust data structure, unless I'm not getting the rationale / intent right behind the naming) before being able to actually read its content.

The issue is that read_sheet cannot yield cell values in a forward-only fashion, so on big files with, say, one huge worksheet, loading thousands of rows up front can take a (very) long while:

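use umya_spreadsheet::reader;

fn main() {
    // reader
    let start = tic(); // timing helper defined elsewhere (not shown)
    let path = std::path::Path::new("C:/Users/natalie-perret/Desktop/file.xlsx");
    let mut book = reader::xlsx::lazy_read(path).unwrap();
    let sheet_count = book.get_sheet_count();
    println!("Sheet Count: {:?}", sheet_count);

    // The whole worksheet has to be loaded here before any cell can be read.
    let sheet1 = book.read_sheet();
    // ...
}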

Are there plans to support a forward-only streaming API? Something akin to calamine:

// `Reader` must be in scope for `worksheet_range`.
use calamine::{open_workbook, Reader, Xlsx};

fn test_calamine_lib(path: &str) {
    let mut excel: Xlsx<_> = open_workbook(path).unwrap();
    if let Some(Ok(r)) = excel.worksheet_range("Sheet1") {
        for row in r.rows() {
            println!("row={:?}, row[0]={:?}", row, row[0]);
        }
    }
}
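
To make the request concrete, here is a minimal sketch of what forward-only reading means, using only the Rust standard library. `RowStream` and the comma-separated stand-in data are hypothetical, chosen for illustration; this is not umya-spreadsheet's or calamine's API, and a real implementation would parse the worksheet XML instead of splitting lines.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical forward-only row reader (not part of any library):
/// it yields one row per call and never stores earlier rows.
struct RowStream<R: BufRead> {
    lines: std::io::Lines<R>,
}

impl<R: BufRead> RowStream<R> {
    fn new(reader: R) -> Self {
        RowStream { lines: reader.lines() }
    }
}

impl<R: BufRead> Iterator for RowStream<R> {
    type Item = std::io::Result<Vec<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull exactly one line from the underlying reader per call.
        let line = self.lines.next()?;
        // Split it into cells; I/O errors are passed through to the caller.
        Some(line.map(|l| l.split(',').map(|cell| cell.to_string()).collect()))
    }
}

fn main() -> std::io::Result<()> {
    // Assumption: in-memory stand-in for a large worksheet.
    let data = Cursor::new("a,1\nb,2\nc,3\n");
    for row in RowStream::new(data) {
        let row = row?;
        println!("row={:?}, row[0]={:?}", row, row[0]);
    }
    Ok(())
}
```

Because each call to `next` pulls a single row, memory use stays proportional to one row rather than to the whole worksheet, and the caller can stop early without paying for rows it never reads.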

MathNya commented 1 year ago

@natalie-o-perret Thank you for your suggestion. We had not planned to support reading cell values this way for files opened with lazy_read, but it is probably technically possible. We will try to implement it in the next version.

(deserialize might be a tad more accurate, because by all technicalities xml => rust data structure, unless I'm not getting the rationale / intent right behind the naming)

You are right, this is deserialization. I hadn't noticed it until now. I will quietly fix it.