aVadim483 / fast-excel-writer

Lightweight and very fast XLSX Excel Spreadsheet Writer in PHP
MIT License
142 stars 26 forks source link

Memory usage for 100K Rows is >1.5GB #54

Closed JuniorRaja closed 2 months ago

JuniorRaja commented 3 months ago

Hi,

I have been trying to write an .xlsx file from a really large dataset of 350K rows & 150 Columns. How would you approach writing such a large file. Pl find the below dry run results and the code.

For 10K Records elapsed time: 10.374 sec Memory peak usage: 76 MB

For 50K Records Elapsed time: 56.563 sec Memory peak usage: 364 MB

For 75K Records Elapsed time: 97.748 sec Memory peak usage: 546 MB

<?php

require_once __DIR__ . '/vendor/autoload.php';
require_once __DIR__ . '/src/autoload.php';

use \avadim\FastExcelWriter\Excel;

function generateXlsx($savePath, $filename, $jsonFilePath) {
    if ($jsonFilePath === null) { die("Error: File path is null."); }

    $jsonData = file_get_contents($jsonFilePath);
    $data = json_decode($jsonData, true);
    if ($data === null) { die("Error: Failed to decode JSON from text file."); }

    $header = [
        'c1-text'   => '@', //text
        'c2-text'   => '@text', //text
        'c3-integer' => '@integer',
        'c4-integer' => ['text-color' => '#f0f'], // default format
        'c5-money'  => '@money',
        'c6-price'  => '#\'##0.000', //custom numeric format
        'c7-date'   => ['format' => '@date', 'width' => 'auto'],
        'c8-date'   => ['format' => 'YYYY-MM-DD', 'width' => 11],
        'c9-time'   => 'H:MM',
    ];

    $outFileName = $savePath . '/' . $filename;

    $timer = microtime(true);

    $excel = Excel::create();
    $sheet = $excel->getSheet();

    $rowOptions = ['font-style' => 'bold'];
    $sheet->writeHeader($header, $rowOptions);

    foreach($data as $row) { $sheet->writeRow($row); }

    $excel->save($outFileName);

    echo '<b>', basename(__FILE__, '.php'), "</b><br>\n<br>\n";
    echo 'out filename: ', $outFileName, "<br>\n";
    echo 'elapsed time: ', round(microtime(true) - $timer, 3), ' sec', "<br>\n";
    echo 'Memory peak usage: ', memory_get_peak_usage(true) / (1024 * 1024), ' MB<br>';
}

// Check if the script is invoked with the required parameters
if(isset($argv) && count($argv) >= 4) {
    $savePath = $argv[1];
    $filename = $argv[2];
    $jsonFilePath = $argv[3];

    // Generate the XLSX file
    generateXlsx($savePath, $filename, $jsonFilePath);
} else {
    echo "Usage: php script.php <savePath> <filename> <json_encoded_rows>\n";
}
?>

` image

aVadim483 commented 3 months ago

In this code, the memory_get_peak_usage() function shows the memory usage of more than just the library. You load a large json file, then call json_decode(), and these actions consume a lot of memory, much more than the FastExcelWriter library.

In the code below you can change the $maxRow variable and measure memory consumption. And you will see that it will be approximately 4Mb even with large $maxRow values

<?php

require_once __DIR__ . '/vendor/autoload.php';
require_once __DIR__ . '/src/autoload.php';

$maxCol = 150;
$maxRow = 100000;

$rowData = array_fill(0, $maxCol, 'text');;

$outFileName = __DIR__ . '/' . $maxCol . '-' . (int)($maxRow / 1000) . 'k.xlsx';

$timer = microtime(true);

$excel = \avadim\FastExcelWriter\Excel::create(['Sheet1']);
$sheet = $excel->sheet();

for($rowNum = 1; $rowNum <= $maxRow; $rowNum++) {
    $sheet->writeRow($rowData);
}
$excel->save($outFileName);

$time = round(microtime(true) - $timer, 3);
$memory = round(memory_get_peak_usage(true) / (1024 * 1024), 2);

echo '<b>', basename(__FILE__, '.php'), "</b><br>\n<br>\n";
echo 'out filename: ', $outFileName, "<br>\n";
echo 'elapsed time: ', $time, ' sec', "<br>\n";
echo 'memory peak usage: ', $memory, "Mb<br>\n";
echo 'total rows: ', $maxRow, "<br>\n";
echo 'speed: ', round($maxRow / $time, 3), " rows/sec<br>\n";