dhatim / fastexcel

Generate and read big Excel files quickly
Other
677 stars 122 forks source link

Excessive Memory Usage During the Creation of Large Excel Documents #480

Open gamerover98 opened 1 week ago

gamerover98 commented 1 week ago

Problem

I am experiencing a memory usage issue when creating very large Excel documents using fastexcel. While inserting cells into a worksheet with maximum dimensions (16,384 columns by 1,048,576 rows), I noticed that the memory consumption was extremely high.

Here are some specific observations:

Code Example to Reproduce the Issue

package org.example;

import lombok.extern.slf4j.Slf4j;
import org.dhatim.fastexcel.Workbook;
import org.dhatim.fastexcel.Worksheet;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

@Slf4j
public class Main {

    private static final String EXCEL_RELATIVE_PATH = "./target/data/example.xlsx";
    private static final int COLUMNS = Worksheet.MAX_COLS; // 16_384 columns
    private static final int ROWS = Worksheet.MAX_ROWS;    // 1_048_576 rows

    public static void main(String... args) throws Exception {
        try (var outputStream = new FileOutputStream(getExcelFile());
             var workbook = new Workbook(outputStream, "MyApplication", "1.0");
             var worksheet = workbook.newWorksheet("Sheet 1")) {

            log.debug("Generating data...");

            for (int columnIndex = 0; columnIndex < COLUMNS; columnIndex++) {
                for (int rowIndex = 0; rowIndex < ROWS; rowIndex++) {
                    worksheet.value(rowIndex, columnIndex, "test");
                }
            }

            log.debug("Waiting...");
            // Stop current thread to see memory usage.
            Thread.sleep(Long.MAX_VALUE);
        }
    }

    public static File getExcelFile() throws IOException {
        var file = new File(EXCEL_RELATIVE_PATH);

        if (file.mkdirs()) {
            log.debug("Excel file created");
        }

        if (file.exists()) {
            Files.delete(Path.of(file.toURI()));
            log.debug("Excel file deleted");
        }

        return file;
    }
}

Proposed Solution

I propose implementing a new version of the Worksheet class that writes cells directly to the document without keeping them in memory for potential future modifications. This sequential writing method is particularly useful for those, like me, who need to create large documents "cascadingly" without needing to know the content of previously written cells.

This solution would significantly reduce memory usage for large documents, improving the library’s efficiency in similar scenarios.

Thanks a lot 💯