dhatim / fastexcel

Generate and read big Excel files quickly
Other
647 stars 116 forks source link

Corrupt (but repairable) Excel file generated if Cell contents over 32K characters #318

Open urieli opened 11 months ago

urieli commented 11 months ago

I create an Excel file with a single cell, containing over 32K characters, as follows:

package org.example;

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.dhatim.fastexcel.Workbook;
import org.dhatim.fastexcel.Worksheet;

public class Main {
  public static void main(String[] args) throws Exception {

    StringBuilder sb = new StringBuilder();
    for (int i=0; i<(33000/8); i++) {
      sb.append(String.format("%07d ", i));
    }
    String bigString = sb.toString();

    try (OutputStream os = new FileOutputStream("test.xlsx"); Workbook wb = new Workbook(os, "MyApplication", "1.0");) {
      Worksheet ws = wb.newWorksheet("Sheet 1");
      ws.value(0, 0, bigString);
    }
  }
}

If I open the file with Linux LibreOffice, there's no problem.

If I open it with Windows Microsoft Office, I get a message saying there's a problem with a file, and asking whether I would like to attempt to repair it. A summary of the repairs can be exported in XML format, here it is in French:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <logFileName>error078480_02.xml</logFileName>
    <summary>Des erreurs ont été détectées dans le fichier « export-PLUS_2023-08-02_125docs.xlsx »</summary>
    <additionalInfo>
        <info>Excel a terminé la validation et la réparation de niveau fichier. Certaines parties du classeur peuvent avoir été réparées ou abandonnées.</info>
    </additionalInfo>
    <repairedRecords>
        <repairedRecord>Enregistrements réparés: Propriétés de chaîne dans la partie /xl/sharedStrings.xml (Chaînes)</repairedRecord>
    </repairedRecords>
</recoveryLog>

Searching on the Internet, I'm guessing this is the equivalent of the English message "Repaired Records: String properties from /xl/sharedStrings.xml part (Strings)".

Initial file: test.xlsx

File repaired by Excel: test-fixed.xlsx

meiMingle commented 10 months ago

The reason is that the maximum number of characters in a single cell of Microsoft Office is 32767.If you must fill in such a long data in a single cell, then you cannot use Microsoft Office to open it normally. In fact, Microsoft's repair of this file was destructive, and it violently discarded the data larger than 32767 parts.

Nana-asiamah98 commented 6 months ago

I'm also facing this issue after generating the excel file. Any fix yet?

meiMingle commented 6 months ago

I'm also facing this issue after generating the excel file. Any fix yet?

I guess this is unlikely to be solved, you should probably look for other export formats, such as csv, and you will need to use an software other than Microsoft Office to open it, because as far as I know, when displaying CSV, Microsoft Office also only displays each The first 32767 characters in the cell.

Nana-asiamah98 commented 6 months ago

Well noted