T0ha / ezodf

ezodf is a Python package for creating new OpenDocument (ODF) files or opening existing ones to extract, add, modify or delete document data. Forked from the dead project https://bitbucket.org/mozman/ezodf

ODS sheets: Wrong results with sparse tables #12

Open · hvbtup opened this issue 8 years ago

hvbtup commented 8 years ago

At least the following functions return wrong results when I create a sheet that contains data only in (for example) the cells G1, Z1, CD1 and CE1: sheet.ncols(), sheet["CD1"].value, and many more. It seems as if the table:number-columns-repeated attribute is simply ignored.
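For illustration, a minimal reproduction might look like the sketch below ('sparse.ods' is a placeholder for a spreadsheet whose first sheet only has data in G1, Z1, CD1 and CE1):

import ezodf

# Open a spreadsheet whose first sheet only has values in G1, Z1, CD1, CE1.
doc = ezodf.opendoc('sparse.ods')
sheet = doc.sheets[0]

# Expected: ncols() is at least 83 (column CE) and sheet['CD1'].value holds the
# stored value. Observed: far fewer columns are reported and positional access
# is off, because table:number-columns-repeated is not honoured.
print(sheet.ncols())
print(sheet['CD1'].value)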

hvbtup commented 8 years ago

I think the reason is the logic in tablenormalizer.py, in the expand_cell method of the _ExpandAllLessMaxCount class. The elif condition seems wrong. OTOH the class name says exactly what is happening, which makes me think that using this class at all is wrong here. In my case, maxcols is less than the table:number-columns-repeated attribute.
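For readers who have not looked at the normalizer, the behaviour described above boils down to something like the following simplified, hypothetical sketch (this is not the actual code in tablenormalizer.py):

# Hypothetical sketch of the expansion behaviour, NOT the real ezodf source.
# 'cells' is a list of (value, repeat) pairs, where 'repeat' mirrors the
# table:number-columns-repeated attribute.
def expand_row(cells, maxcount):
    expanded = []
    for value, repeat in cells:
        if repeat < maxcount:
            expanded.extend([value] * repeat)
        else:
            # the problematic branch: the repeat count is silently dropped and
            # the cell is treated as if it repeated only once
            expanded.append(value)
    return expanded

# With maxcount=10 a cell repeated 25 times collapses into one cell, so the
# expanded row has 3 cells instead of 27, hence the wrong ncols() and the
# shifted positional access.
print(len(expand_row([('x', 1), (None, 25), ('y', 1)], maxcount=10)))  # 3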

I could work around this problem by calling

ezodf.conf.config.set_table_expand_strategy('all_less_maxcount', (100, 100))
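In context, the workaround amounts to raising the expansion limit before opening the document. A sketch (the call is taken verbatim from above; 'sparse.ods' is a placeholder, and (100, 100) only helps if the sheet fits within 100 rows and 100 columns):

import ezodf

# Raise the expansion limits before the document is opened.
ezodf.conf.config.set_table_expand_strategy('all_less_maxcount', (100, 100))

doc = ezodf.opendoc('sparse.ods')
sheet = doc.sheets[0]
print(sheet.ncols(), sheet['CD1'].value)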

Anyway, if the elif clause in expand_cell is taken, the XML attribute is removed (and information is lost) while the cell is treated as if it repeated only once. This in turn causes the wrong results for ncols(), column access and so on. It is then not even possible to work around this by looking at the XML node directly (as I tried), because the table:number-columns-repeated attribute has been removed silently.

I propose raising an exception in this case, or marking the sheet and the row as corrupted (and raising an exception when cells are accessed by position). The XML attribute should not be removed in this case either, so that developers can examine it themselves.
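Roughly, the proposed behaviour could look like this (a sketch only; the exception class and function names are purely illustrative and do not exist in ezodf):

# Illustrative sketch of the proposal; none of these names exist in ezodf.
class TableExpansionError(Exception):
    """Raised when table:number-columns-repeated exceeds the expand limit."""

def expand_row_strict(cells, maxcount):
    expanded = []
    for value, repeat in cells:
        if repeat >= maxcount:
            # fail loudly instead of silently dropping the repeat count; the
            # underlying XML attribute is left untouched for inspection
            raise TableExpansionError(
                'repeat count %d exceeds expand limit %d' % (repeat, maxcount))
        expanded.extend([value] * repeat)
    return expanded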

matkoniecz commented 8 years ago

@T0ha Are you planning on doing anything with that? I just encountered this bug and it is especially bad as it silently gives completely wrong data.

I almost failed to notice it.

At least pyexcel-ods and odfpy clearly crashed; silently returning bad data is much worse.

sirex commented 7 years ago

Not sure, but maybe this is related: https://github.com/frictionlessdata/tabulator-py/pull/114/files#r85104563

I got a lot of extra data filled with None values.

sirex commented 7 years ago

Sorry, the issue I posted a few hours ago is not related. It turned out that if cells have formatting applied, they are considered non-empty even if they don't contain any data. When I created a completely new document and pasted only the part containing the data into it, the issue was gone.
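In case others run into the same thing, one way around it might be to ignore cells that carry only formatting when reading values, e.g. (a sketch, assuming cell.value is None for formatted-but-empty cells; 'data.ods' is a placeholder file name):

import ezodf

doc = ezodf.opendoc('data.ods')
sheet = doc.sheets[0]

rows = []
for row in sheet.rows():
    values = [cell.value for cell in row]
    # drop trailing None values left over from formatted-but-empty cells
    while values and values[-1] is None:
        values.pop()
    if values:
        rows.append(values)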