dhatim / fastexcel

Generate and read big Excel files quickly
Other
677 stars 122 forks source link

Problem with data type column #324

Open CamilYed opened 1 year ago

CamilYed commented 1 year ago

Hi, when I open the given file with Exit Date column, I get in processing not the date, but a value containing a number.

image image

Input excel file can be downloaded from: https://thespreadsheetguru.com/wp-content/uploads/2022/12/EmployeeSampleData.zip (https://www.thespreadsheetguru.com/sample-data/)

We can also see that there is a similar problem with the Bonus column, where we have percentages.

My code.

 @Override
    @SneakyThrows
    public void createCopy(@NonNull File input, @NonNull File output) {
        try (var inputWorkbook = new ReadableWorkbook(input)) {
            try (var fos = new FileOutputStream(output)) {
                Workbook outputWorkbook = new Workbook(fos, "Calamari", "1.0");
                Stream<Sheet> sheets = inputWorkbook.getSheets();

                sheets.forEach(sheet -> {
                    process(sheet, outputWorkbook);

                });

                inputWorkbook.close();
                outputWorkbook.close();
            }
        }
    }

    @SneakyThrows
    private void process(Sheet originalSheet, Workbook outputWorkbook) {

        var newWorksheet = outputWorkbook.newWorksheet(originalSheet.getName());
        Stream<Row> rows = originalSheet.openStream();

        rows.forEach(row -> {

            Stream<Cell> cells = row.stream();
            cells.forEach(cell -> {
                CellAddress address = cell.getAddress();
                if (address.getRow() % 2 ==0 ){
                    newWorksheet.style(address.getRow(), address.getColumn()).fillColor("FF0000").set();
                }

                switch (cell.getType()) {
                    case NUMBER -> newWorksheet.value(address.getRow(), address.getColumn(), cell.asNumber());
                    case BOOLEAN -> newWorksheet.value(address.getRow(), address.getColumn(), cell.asBoolean());
                    case STRING -> newWorksheet.value(address.getRow(), address.getColumn(), cell.asString());
                    case FORMULA -> newWorksheet.value(address.getRow(), address.getColumn(), cell.getFormula());
                    case ERROR -> newWorksheet.value(address.getRow(), address.getColumn(), cell.getFormula());
                    case EMPTY -> newWorksheet.value(address.getRow(), address.getColumn(), EMPTY);
                }
            });
        });
    }
jze commented 7 months ago

I think the problem is here: https://github.com/dhatim/fastexcel/blob/master/fastexcel-reader/src/main/java/org/dhatim/fastexcel/reader/RowSpliterator.java#L114

Cell I2 has type="n" and styleString="1". However, the list of formats for the workbook is empty. Therefore, no formatString is set and the number will not be interpreted.

This is the first content row in XML:

<row r="2" spans="1:14" x14ac:dyDescent="0.25">
  <c r="A2" t="s"><v>410</v></c>
  <c r="B2" t="s"><v>411</v></c>
  <c r="C2" t="s"><v>61</v></c>
  <c r="D2" t="s"><v>27</v></c>
  <c r="E2" t="s"><v>16</v></c>
  <c r="F2" t="s"><v>17</v></c>
  <c r="G2" t="s"><v>47</v></c>
  <c r="H2"><v>55</v></c>
  <c r="I2" s="1"><v>42468</v></c>
  <c r="J2" s="2"><v>141604</v></c>
  <c r="K2" s="3"><v>0.15</v></c>
  <c r="L2" t="s"><v>19</v></c>
  <c r="M2" t="s"><v>63</v></c>
  <c r="N2" s="1"><v>44485</v></c>
</row>

It looks as if style 1 is the date in m/d/yyyy, style 2 is the dollar amount and style 3 is the percentage value.

You can find these definitions in the styles.xml file in the <dxfs> element:

<dxfs count="5">
  <dxf>…</dxf>
  <dxf>
    <numFmt numFmtId="19" formatCode="m/d/yyyy" />
  </dxf>
  <dxf>
    <numFmt numFmtId="165" formatCode="#,##0%_);\(#,##0%\);0%_)" />
  </dxf>
  <dxf>
    <numFmt numFmtId="164" formatCode="&quot;$&quot;#,##0_);\(&quot;$&quot;#,##0\);&quot;$&quot;0_)" />
  </dxf>
  <dxf>
    <numFmt numFmtId="19" formatCode="m/d/yyyy" />
  </dxf>
</dxfs>

But apparently these style specifications are not imported.

jze commented 7 months ago

I think I have found a solution. First you have to open the file with ReadingOptions that have the parameter withCellFormat = true:

final ReadableWorkbook wb = new ReadableWorkbook(file, new ReadingOptions(true, true));

Then you can retrieve the format of a cell. A simple solutions (that only works if you know you only have dates and no time) might look like this:

if (cell.getType() == CellType.NUMBER && StringUtils.contains(cell.getDataFormatString(), "yy")) {
     final LocalDateTime date = cell.asDate();
     String value =  date.format(DateTimeFormatter.ISO_DATE);
}

Apache POI has comprehensive code in DateUtils.java to do this check properly. It might be useful to use this code for fastexcel as well.