ToucanToco / fastexcel

A Python wrapper around calamine
http://fastexcel.toucantoco.dev/
MIT License

feat!: `header_row` is now the exact index of the row when loading a sheet #297

Closed PrettyWood closed 1 month ago

PrettyWood commented 1 month ago

Previously, Calamine would automatically skip initial empty lines when retrieving data. If a file started with an empty line and you wanted to set the header to the third row, you had to use `header_row=2` to account for the skipped line at the start.

As of Calamine 0.26, thanks to this pull request, we have a more precise way (though still not perfect) to set the header row index to the exact row index of the sheet.

To prevent Calamine from skipping any initial empty rows, you can now use `header_row=0`.
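As a rough illustration of the index shift described above, here is a toy model in plain Python. It does not call fastexcel or Calamine; the sheet is just a list of rows, and `leading_empty_rows`, `resolve_header_old` and `resolve_header_new` are hypothetical helper names used only for this sketch.

```python
# Toy model only: a sheet is a list of rows, and an empty row is an empty list.
SHEET = [
    [],                # sheet row 0: empty
    ["report title"],  # sheet row 1
    ["some note"],     # sheet row 2
    ["name", "age"],   # sheet row 3: the header we want
    ["alice", 30],     # sheet row 4
]


def leading_empty_rows(sheet):
    """Count the empty rows at the very start of the sheet."""
    count = 0
    for row in sheet:
        if row:
            break
        count += 1
    return count


def resolve_header_old(sheet, header_row):
    """Modelled old behaviour: leading empty rows are dropped first,
    so header_row indexes into what is left."""
    trimmed = sheet[leading_empty_rows(sheet):]
    return trimmed[header_row]


def resolve_header_new(sheet, header_row):
    """Modelled new behaviour: header_row is the exact sheet index."""
    return sheet[header_row]


# With one leading empty row, the same header needs different indices.
print(resolve_header_old(SHEET, 2))  # ['name', 'age']
print(resolve_header_new(SHEET, 3))  # ['name', 'age']
```

In this model the old index depends on how many empty rows happen to precede the data, while the new index can be read straight off the sheet.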

Previously, `header_row=None` meant "I don't want any header labels, but still skip the initial empty rows." To preserve this behavior, `skip_rows` now defaults to `None`. To omit the header while keeping all rows, including the leading empty ones, use `header_row=None` and `skip_rows=0`.
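The two header-less combinations can be sketched with another plain-Python toy model. `data_rows_without_header` is a hypothetical helper, not part of fastexcel, and treating a positive `skip_rows` as "drop exactly that many rows from the start" is an assumption of this sketch.

```python
def data_rows_without_header(sheet, skip_rows=None):
    """Toy model of header_row=None.

    skip_rows=None: drop only the leading empty rows (the previous behaviour).
    skip_rows=n:    drop exactly the first n rows (assumed); n=0 keeps everything.
    """
    if skip_rows is None:
        start = 0
        while start < len(sheet) and not sheet[start]:
            start += 1
    else:
        start = skip_rows
    return sheet[start:]


# A sheet with two leading empty rows followed by two data rows.
sheet = [[], [], ["alice", 30], ["bob", 25]]

print(len(data_rows_without_header(sheet)))               # 2
print(len(data_rows_without_header(sheet, skip_rows=0)))  # 4
```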

I also intend to enhance Calamine by adding direct support for `None` in the `HeaderRow` enum:

```rust
pub enum HeaderRow {
    /// No header
    None,
    /// First non-empty row
    FirstNonEmptyRow,
    /// Exact index of the header row
    Row(u32),
}
```

Additionally, I plan to introduce a `skip_rows` option like this:

```rust
pub enum SkipRows {
    /// Skip all empty rows at the start
    FirstEmptyRows,
    /// Skip a specified number of rows
    Number(u32),
}
```
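One possible way the Python arguments could line up with the proposed variants is sketched below. This is a guess rather than an implemented API: `to_proposed_variants` is a hypothetical helper, the variants are represented as plain strings, and `FirstNonEmptyRow` is left out because the description above gives no Python-side equivalent for it.

```python
def to_proposed_variants(header_row, skip_rows=None):
    """Map Python-side arguments to the names of the proposed Rust variants.

    Assumed mapping, for illustration only:
      header_row=None -> HeaderRow::None
      header_row=n    -> HeaderRow::Row(n)
      skip_rows=None  -> SkipRows::FirstEmptyRows
      skip_rows=n     -> SkipRows::Number(n)
    """
    if header_row is None:
        header = "HeaderRow::None"
    else:
        header = f"HeaderRow::Row({header_row})"

    if skip_rows is None:
        skip = "SkipRows::FirstEmptyRows"
    else:
        skip = f"SkipRows::Number({skip_rows})"

    return header, skip


# No header, keep every row including the leading empty ones.
print(to_proposed_variants(None, 0))
# Header at exact sheet index 3, default skipping.
print(to_proposed_variants(3))
```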

fix #209 fix #233