RvanVeenendaal / Spreadsheet-Complexity-Analyser

This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment based on threshold values.

Question on the counter of cells used? #18

Closed fhkjaerskov closed 2 years ago

fhkjaerskov commented 4 years ago

At the Danish National Archives we are using the Spreadsheet Complexity Analyser (SCA) to investigate what may be lost when converting OOXML to ODS. We batch-convert a number of Excel files to ODS and then back to OOXML, and use the SCA to compare the “before conversion” and “after conversion” OOXML files.

If the features that the SCA extracts from the two OOXML files don’t match, something was lost in the conversion.

This seems to be the case. In particular, we detected some big differences in the number of cells used. By visual inspection alone we cannot tell the two spreadsheets (before and after) apart, but the SCA can.

Our main question is: how does the counter of used cells decide which cells have content? Is there hidden formatting that cannot be detected by visual inspection alone?

RvanVeenendaal commented 2 years ago

SCA used the Apache POI method getPhysicalNumberOfCells, which counts only the cells that are actually defined in a row. A new version will differentiate between 'cellsUsed' and 'physicalCellsUsed'. In a row with values in cells 1, 3 and 5, there are 5 cellsUsed and 3 physicalCellsUsed.
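
To make the difference concrete, here is a minimal sketch in Python. It does not use Apache POI or the SCA code. It only models a row as a sparse mapping from a 1-based column number to a value, and the function names `cells_used` and `physical_cells_used` are hypothetical stand-ins for the two counters. The last part rests on an assumption rather than a confirmed cause: that a conversion round trip might write out a cell that was previously undefined, which would change the physical count without any visible change.

```python
def physical_cells_used(row):
    """Count only the cells that are actually defined in the row."""
    return len(row)


def cells_used(row):
    """Count every position up to and including the last defined cell."""
    return max(row) if row else 0


# Row with values in cells 1, 3 and 5 (cells 2 and 4 are not defined).
row = {1: "a", 3: "b", 5: "c"}
print(cells_used(row), physical_cells_used(row))  # prints "5 3"

# Assumption for illustration: after a round trip, cell 2 exists but is empty.
# The row looks the same, yet one more cell is physically defined.
after = dict(row)
after[2] = None
print(cells_used(after), physical_cells_used(after))  # prints "5 4"
```

Comparing both counters for the "before" and "after" files should help show whether a difference comes from extra empty cells or from a change in the extent of the used range.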