Closed by fhkjaerskov 2 years ago.
The SCA used Apache POI's `Row.getPhysicalNumberOfCells()`. A new version will differentiate between `cellsUsed` and `physicalCellsUsed`: in a row with values in cells 1, 3 and 5, there are 5 cellsUsed and 3 physicalCellsUsed.
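To make the distinction concrete, here is a small Python model of a single row. It assumes that `cellsUsed` counts every column from the first column up to the last cell record (similar to POI's `Row.getLastCellNum()`, which returns the last cell index plus one) and that `physicalCellsUsed` counts only the cell records that actually exist, as `getPhysicalNumberOfCells()` does. The `Row` type and both functions are hypothetical and not taken from the SCA code.

```python
from typing import Optional

# Hypothetical model of one spreadsheet row, not the SCA's data structure:
# a sparse mapping from 1-based column number to the cell's value. Columns
# missing from the mapping have no cell record at all; a value of None
# models a cell record that exists but holds no value (a formatted blank).
Row = dict[int, Optional[object]]


def cells_used(row: Row) -> int:
    """Columns spanned from column 1 up to the last cell record."""
    return max(row) if row else 0


def physical_cells_used(row: Row) -> int:
    """Cell records actually present in the row, including formatted blanks."""
    return len(row)


# The example from the text: values in cells 1, 3 and 5.
row = {1: "a", 3: "b", 5: "c"}
print(cells_used(row), physical_cells_used(row))  # 5 3

# The same row plus a formatted but empty cell in column 7.
row_with_blank = {**row, 7: None}
print(cells_used(row_with_blank), physical_cells_used(row_with_blank))  # 7 4
```

The second row shows why the two counts can diverge between files that look identical: a cell that carries only formatting adds to both counts even though nothing is visible in it.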
At the Danish National Archives we are using the Spreadsheet Complexity Analyser (SCA) to investigate possible loss when converting OOXML to ODS. We batch convert a number of Excel files to ODS, convert them back to OOXML, and then use the SCA to compare the “before conversion” and “after conversion” OOXML files.
If the features that the SCA extracts from the two OOXML files do not match, something was lost in the conversion.
This seems to be the case: we detected large differences in the number of cells used. Yet by visual inspection alone we cannot tell the two spreadsheets (before and after) apart, while the SCA can.
Our main question is: how does the used-cells counter decide whether a cell counts as used? Is there hidden formatting that cannot be detected by visual inspection alone?
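One way to look for such hidden formatting is to inspect the sheet XML inside the .xlsx package directly. The Python sketch below is an illustration, not part of the SCA. It assumes worksheets use the conventional part names `xl/worksheets/sheetN.xml` (a robust tool would resolve them through the workbook relationships) and treats a `<c>` element as having content if it contains `<v>`, `<is>` or `<f>`. A `<c>` element without content, typically one carrying only a style attribute `s`, is a formatted blank: invisible on screen but, as far as we understand POI, still a cell record that `getPhysicalNumberOfCells()` counts. The names `cell_stats` and `compare` are hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import Union

# SpreadsheetML namespace used by the worksheet parts of an .xlsx file.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
# Assumption: worksheets use the conventional part names.
SHEET_PART = re.compile(r"xl/worksheets/sheet\d+\.xml")


def cell_stats(xlsx: Union[str, bytes]) -> dict[str, dict[str, int]]:
    """Count <c> records with and without content in each worksheet part.

    `xlsx` is a file path or the raw bytes of an .xlsx package.
    """
    source = io.BytesIO(xlsx) if isinstance(xlsx, bytes) else xlsx
    stats: dict[str, dict[str, int]] = {}
    with zipfile.ZipFile(source) as zf:
        for name in sorted(zf.namelist()):
            if not SHEET_PART.fullmatch(name):
                continue
            root = ET.fromstring(zf.read(name))
            counts = {"with_content": 0, "without_content": 0}
            for cell in root.iterfind("m:sheetData/m:row/m:c", NS):
                # Content is a value (<v>), an inline string (<is>) or a
                # formula (<f>); anything else is an empty cell record,
                # usually one that only carries a style attribute "s".
                has_content = any(
                    cell.find(tag, NS) is not None for tag in ("m:v", "m:is", "m:f")
                )
                counts["with_content" if has_content else "without_content"] += 1
            stats[name] = counts
    return stats


def compare(
    before: Union[str, bytes], after: Union[str, bytes]
) -> dict[str, tuple[dict[str, int], dict[str, int]]]:
    """Return (before, after) counts for each worksheet part whose counts differ."""
    a, b = cell_stats(before), cell_stats(after)
    return {
        name: (a.get(name, {}), b.get(name, {}))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

For example, `compare("before.xlsx", "after.xlsx")` (hypothetical file names) lists the sheets whose counts differ. A change in `without_content` alone would point to formatted but empty cells being added or dropped in the round trip rather than to lost data.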