dilshod / xlsx2csv

Convert xlsx to csv; it is fast and works for huge xlsx files
MIT License

Unable to handle hidden rows when attribute value of 'hidden' is 'true' in some xlsx files #266

Open rockc2020 opened 1 year ago

rockc2020 commented 1 year ago

When I run xlsx2csv against some xlsx files, hidden rows are exported to the csv files even though the parameter skip_hidden_rows is true by default.

After digging into it a bit, I noticed that those xlsx files set the hidden attribute to true rather than 1, as other files do. As a result, the condition below does not filter out the hidden rows:

        elif self.in_sheet and (name == 'row' or (has_namespace and name.endswith(':row'))) and ('r' in attrs) and not (self.skip_hidden_rows and 'hidden' in attrs and attrs['hidden'] == '1'):

Attached is an example xlsx file exported from Lark docs (https://www.larksuite.com/en_us/product/creation?from=hero_section): feishu_test_hidden_rows.xlsx

Here are the hidden row attributes from xlsx files exported from MS Excel and Google Docs:

row {'r': '3', 'hidden': '1'}

Here are the hidden row attributes from xlsx files exported from larksuite:

row {'customHeight': 'true', 'hidden': 'true', 'ht': '19', 'r': '2'}
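Both spellings are valid lexical forms of an XML Schema boolean (`true`/`1` and `false`/`0`), so one possible fix is to accept either value. The following is only a sketch of that idea: `is_hidden` and `should_emit_row` are hypothetical helper names used for illustration and do not exist in xlsx2csv, and the two attribute dicts are the ones shown above.

```python
# Hypothetical helpers for illustration; not part of xlsx2csv's current code.
# xsd:boolean allows the lexical values "true"/"1" and "false"/"0".
XSD_TRUE = {"1", "true"}


def is_hidden(attrs):
    """Return True if a <row> attribute dict marks the row as hidden."""
    return attrs.get("hidden", "0").strip() in XSD_TRUE


def should_emit_row(attrs, skip_hidden_rows=True):
    """Mirror the row condition quoted in this issue.

    A row is emitted only if it has an 'r' attribute and is not a hidden
    row that the caller asked to skip.
    """
    return "r" in attrs and not (skip_hidden_rows and is_hidden(attrs))


# Attribute dicts as reported above.
excel_row = {"r": "3", "hidden": "1"}
lark_row = {"customHeight": "true", "hidden": "true", "ht": "19", "r": "2"}
visible_row = {"r": "1"}

for row in (excel_row, lark_row, visible_row):
    print(row["r"], "emit" if should_emit_row(row) else "skip")
```

With a check like this, the rows from both exporters would be skipped when skip_hidden_rows is enabled, while rows without a hidden attribute (or with hidden set to 0 or false) would still be exported.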

Let me know if you'd like xlsx2csv to handle the true value as well. If so, I'd be happy to raise a PR for it.