Open vaites opened 2 years ago
Maybe your file is created by wpf or other office softwares? Try to use export "ms excel" type from other office softwares.
The exported file can be opened without problems with any Excel version, so I assume this library must open it also without problems.
Anyway, the attached example is a XLS file created with Excel, and it fails.
I have a similar issue, I am able to load between 10K and 20K rows but not more. i tried chunked row filter, it worked out with CSV files, but the Xlsx reader loads the first chunk and then stops.
Can you explain how you are using the filter? I created a filter but didn't worked for me. Whole spreadsheet is readed...
I start reading chunks from row 2, so the first chunk goes as it is, starting from the second chunk we splice the array to remove the rows read in the first chunk, they all will be null (in the chunkfilter class, the rows start from 1 , the first raw is always read, you will find many examples)
// create reader
...whatever you need
$reader->setLoadSheetsOnly($worksheetname);
$chunkFilter = new \PhpOffice\PhpSpreadsheet\Reader\ChunkReadFilter();
$reader->setReadFilter($chunkFilter);
/ Loop to read our worksheet in "chunk size" blocks /
for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize) {
/ Tell the Read Filter which rows we want this iteration /
$chunkFilter->setRows($startRow,$chunkSize);
echo "Loading rows $startRow -
";
$spreadsheet = $reader->load($file);
$sheetData = $spreadsheet->getActiveSheet()->toArray(null, false, false, true);
if($startRow>2){ array_splice( $sheetData ,1,$startRow-3);}
Hi have same issue. I've build 2 functionallity, one for export some data and another for import this data. If I remove one line from the file and try to import, when I call to $worksheet->getActiveSheet()->toArray() I get the same error and not make sense for me because I have only 5 rows, not a big large file. It's a bit weird this error because if I edit the file adding some values, or adding new rows and import, everything work fine, the error olny happen when I remove rows. Any ideas? Example files Working --> medidas_work.xlsx Not Working --> medidas.xlsx
@jkdc
Ok, looking at the medidas.xlsx file that doesn't work: starting with Worksheet #2 (Medidas
), contains references to
<dimension ref="A1:R1048576"/>
<row r="1048576" outlineLevel="0" hidden="false" collapsed="false" customHeight="false" ht="12.8" customFormat="false"/>
So while the worksheet may only contain 5 rows of data, it claims to have 1048576 rows, and has styling information for that final row.This tells PhpSpreadsheet that when you call toArray()
, then it should create an array matching the specified dimensions (A1:R1048576
), so yes, that will exhaust your memory.
Rather than using toArray()
, you could consider processing the data using a row iterator instead, which is a lot more memory efficient than trying to create a giant array to iterate over. If you still prefer to extract the data from the worksheet to an array (and arrays are always memory-heavy in PHP) then work out the maximum row and column that actually contain cell data with calls to getHighestDataColumn()
and getHighestDataRow()
to identify the data range, the worksheet object even has a handy method called calculateWorksheetDataDimension()
to do that for you, and then use the rangeToArray()
method instead.
Using the following code to load the file and copy the data into a PHP array:
$inputFileType = 'Xlsx';
$inputFileName = __DIR__ . '/../medidas.xlsx';
$callStartTime = microtime(true);
$reader = IOFactory::createReader($inputFileType);
$spreadsheet = $reader->load($inputFileName);
$callEndTime = microtime(true);
$loadCallTime = $callEndTime - $callStartTime;
echo PHP_EOL;
echo 'Call time to load spreadsheet file was ' , sprintf('%.4f', $loadCallTime) , ' seconds' , PHP_EOL;
$worksheet = $spreadsheet->getActiveSheet();
echo "Worksheet Name: {$worksheet->getTitle()}", PHP_EOL;
echo "Range Dimensions specified in the xlsx file {$worksheet->calculateWorksheetDimension()} that will be used by the toArray() method", PHP_EOL;
echo "Range of cells that contain actual data {$worksheet->calculateWorksheetDataDimension()} that can be passed to the rangeToArray() method", PHP_EOL;
$data = $worksheet->rangeToArray($worksheet->calculateWorksheetDataDimension(), null, true, true, true);
// Echo memory usage
echo ' Current memory usage: ' , (memory_get_usage(true) / 1024) , ' KB' , PHP_EOL;
echo ' Peak memory usage: ' , (memory_get_peak_usage(true) / 1024) , ' KB' , PHP_EOL;
The result:
Worksheet Name: Medidas
Range Dimensions specified in the xlsx file A1:R1048576 that will be used by the toArray() method
Range of cells that contain actual data A1:R5 that can be passed to the rangeToArray() method
Call time to load spreadsheet file was 0.0855 seconds
Current memory usage: 6144 KB
Peak memory usage: 6144 KB
@MarkBaker thanks a lot!! I've tested and like you say is working better using rangeToArray with calculateWorksheetDataDimension. and thanks again for your fast answer.
In my case i had no issues with data range detection,
range detection takes an extra ,01 sec or so.
see full code for preview of 500 rows of data
$inputFileType=''; // default
$rowsPreview=500;
$chunkSize = $rowsPreview;
$startRow=0;
$tstart=microtime(true);
$worksheetname='';
if(file_exists($file)){
echo "date modified: " . date ("F d Y H:i:s.", filemtime($file)). '</br>';
$inputFileType = \PhpOffice\PhpSpreadsheet\IOFactory::identify($file);
echo '<p>Detected file type is <kbd>'.$inputFileType.'</kbd></p>';
if ($inputFileType=='Xlsx' || $inputFileType=='Xls') {
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType);
//$reader->setReadDataOnly(true);
$reader->getReadEmptyCells(false);
}
else if ($inputFileType=='Csv'){
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType);
}
else {
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType); //$spreadsheet = \PhpOffice\PhpSpreadsheet\IOFactory::load($file);
}
$s=0;
//$reader->setReadDataOnly(true);
$reader->getReadEmptyCells(false);
echo 'File data information<br>';
$worksheetData = $reader->listWorksheetInfo($file);
foreach ($worksheetData as $worksheet) {
echo "Tab $s - <kbd>".$worksheet['worksheetName']. '</kbd> Rows: <kbd>', $worksheet['totalRows'],
'</kbd> Columns: <kbd>', $worksheet['totalColumns'], '</kbd> Cell Range: <kbd>A1:', $worksheet['lastColumnLetter'], $worksheet['totalRows'].'</kbd>'. ($s==$tab?'<- active':'') .'<br>';
if($s==$tab){$worksheetname=$worksheet['worksheetName'];}
$s++;
}
$reader->setLoadSheetsOnly($worksheetname);
/** Instantiate a new Spreadsheet object manually **/
//$spreadsheet = new \PhpOffice\PhpSpreadsheet\Spreadsheet();
$chunkFilter = new \PhpOffice\PhpSpreadsheet\Reader\ChunkReadFilter();
$reader->setReadFilter($chunkFilter);//->setContiguous(true);
$chunkFilter->setRows($startRow,$chunkSize);
//$spreadsheet = $reader->loadIntoExisting($file,$spreadsheet);
$spreadsheet = $reader->load($file);
if(is_object($spreadsheet)){
$sheetData = $spreadsheet->getActiveSheet()->toArray(null, false, true, true);
//$sheetData = $spreadsheet->getActiveSheet()->rangeToArray($spreadsheet->getActiveSheet()->calculateWorksheetDataDimension(),null, true, true, true); // not working, leads to timeout
$count = count($sheetData) - 1;
echo "<p> Preview <kbd>$worksheetname</kbd> sheet. A number of lines: <kbd> $rowsPreview </kbd></p>";
echo '</div>';
$valuesData='';
$recordCount=0;
echo '<table class="table table-striped table-sm table-hover table-responsive table-bordered" >';
echo '<thead class="thead-light">'; // a header
$counter=0;
foreach($sheetData as $k => $v) {
$counter=$counter+1;
if ($counter>$rowsPreview) {break 1;}
if($k == 1) {// table data
echo '<tr>';
echo '<th>row</th>';
foreach($v as $key => $value) {
echo '<th>' . $key . '</th>';
}
echo '</tr>';
}
echo '<tr><td>'.$k.'</td>';
foreach($v as $key => $value) {
if($k > 1) {
echo '<td>' . $value . '</td>';
}
else {
echo '<th>' . $value . '</th>';
}
}
echo '</tr>';
if($k == 1) {// table data
echo '</thead>';
}
} // end for each row
echo '</table>';
}// sheet is object
else {echo 'Failed reading';}
$tend=microtime(true);
echo 'It took '.round($tend - $tstart,2).' seconds <br>';
Whether you're using toArray()
or rangeToArray()
, pulling the spreadsheet data into an array will still take a significant amount of memory; arrays in PHP are big memory consumers by their nature. And as a general warning, calculateWorksheetDataDimension()
still assumes that data starts at cell A1
: if you've been chunk-reading, and (for example) loaded the chunk of data between A500 and Z1000, then calculateWorksheetDataDimension()
will report a range of A1:Z1000
rather than A500:Z1000
, so rangeToArray
will still generate a large array with 499 empty rows at the beginning.
If you want to iterate over the rows and columns of worksheet data, and are hitting memory limits doing so, then I'd recommend using iterators rather than pulling it all into an array; and the row/column iterators allow you to specify start and end row/column.
The conversation has led to a different problem than the one originally reported. I'm reporting an error when calling load()
and memory is exhausted BEFORE calling toArray()
.
@vaites
And I can't figure out any way of fixing that easily: when the files tells one thing, such as the shrinkRangeToFit
returns E9:GU5420
, then should PhpSpreadsheet believe it or not?
@MarkBaker, of course the library should trust shrinkRangeToFit
, but I think the problem I reported is not related with the spreadsheet size because all works perfectly if I replace strlen($this->data)
with mb_strlen($this->data)
in the Xls reader. Why? Is there anything wrong with my file encoding? Is this not a bug?. I solved the problem temporarily copying the Xls reader class to my project and making these changes but this is not a valid long-term solution...
There is anything more I can do to provide more information?
There's no logical reason why using mb_strlen($this->data)
instead of strlen($this->data)
should work, and every reason to believe that it wold create errors where none existed. The BIFF format used for Xls files is a binary stream, so we need to know how many data bytes we're looking at, not characters in a string
It doesn't make sense to me about mb_strlen
either but the fact is that it works with that change.
Let me evaluate the problem further to try to determine the exact point where it fails and if I can suggest any changes.
I know that I could do a refactoring of how we store the data validations, so that it's stored it at the worksheet level rather than for each individual cell; but that would be a major bc break, and would entail changes to every Reader and Writer for formats that supports data validation.
This also caused issues for me, with this simple excel file: Bad File.xlsx The entire file is formatted in Arial/10 and for each row you add it seems to get exponentially worse. Just trying to import 20 rows from this file was 5-10 seconds.
@manstie
$inputFileType = 'Xlsx';
$inputFileName = __DIR__ . '/../Bad.File.xlsx';
$callStartTime = microtime(true);
$reader = IOFactory::createReader($inputFileType);
$spreadsheet = $reader->load($inputFileName);
$callEndTime = microtime(true);
$loadCallTime = $callEndTime - $callStartTime;
echo PHP_EOL;
echo 'Call time to load spreadsheet file was ' , sprintf('%.4f', $loadCallTime) , ' seconds' , PHP_EOL;
echo ' Current memory usage: ' , (memory_get_usage(true) / 1024) , ' KB' , PHP_EOL;
$worksheet = $spreadsheet->getActiveSheet();
echo "Worksheet Name: {$worksheet->getTitle()}", PHP_EOL;
echo "Range Dimensions specified in the xlsx file {$worksheet->calculateWorksheetDimension()} that will be used by the toArray() method", PHP_EOL;
echo "Range of cells that contain actual data {$worksheet->calculateWorksheetDataDimension()} that can be passed to the rangeToArray() method", PHP_EOL;
$data = $worksheet->rangeToArray($worksheet->calculateWorksheetDataDimension(), null, true, true, true);
// Echo memory usage
echo ' Current memory usage: ' , (memory_get_usage(true) / 1024) , ' KB' , PHP_EOL;
echo ' Peak memory usage: ' , (memory_get_peak_usage(true) / 1024) , ' KB' , PHP_EOL;
Result:
Call time to load spreadsheet file was 0.1712 seconds
Current memory usage: 6144 KB
Worksheet Name: Upload
Range Dimensions specified in the xlsx file A1:WVI201 that will be used by the toArray() method
Range of cells that contain actual data A1:B201 that can be passed to the rangeToArray() method
Current memory usage: 6144 KB
Peak memory usage: 6144 KB
I don't know what you're doing to "import" rows from the file because you don't explain; but check the actual data size determined by $worksheet->calculateWorksheetDataDimension()
, not the reported data size returned by $worksheet->calculateWorksheetDimension()
.
I am using the Laravel-Excel extension which employs PhpSpreadsheet, so given your example above I assume they just use the toArray
method - I'll pass this onto them. I am curious as to where A1:WVI201
comes from, as if you search the XML there is only <dimension ref="A1:B201"/>
.
Thanks.
But there are column styles defined up to column WVI
<col customWidth="1" bestFit="1" width="15.88671875" max="16129" min="16129"/>
The code that populates the stored value retrieved by a call to calculateWorksheetDimension()
looks at all column/row and cell references in the file to identify the max column and row, that included column dimensions that are defined in the file
For most spreadsheet files, using this value isn't a problem; the value is loaded with the file load, and stored in the Worksheet object, and is generally accurate enough (it might be inaccurate as a measure of cells containing data by a few rows or columns, but rarely enough to create any problem)... files where the value is extremely incorrect and high are the exception. It doesn't take "chunking" into account, and It isn't changed if you add/delete rows/columns either.
The alternative calculateWorksheetDataDimension()
calculates based on actual cell data at the point when the method is called, so it's a lot slower compared with calculateWorksheetDimension()
(though I've boosted the method speed in the latest code), It ignores row/column dimensions, but checks whether cell objects exist.... of course, there may be a lot of empty cell objects as well, so it doesn't take that into account, and different developers also have different definitions of what empty means, but I've also added methods to identify empty rows/columns for the next release.
That is abnormal, I am importing files with 20K plus rows in 10 sec.
You can Send me your file I ll test on my setup.
On Mon, 20 Jun 2022 at 05:54, Max @.***> wrote:
This also caused issues for me, with this simple excel file: Bad File.xlsx https://github.com/PHPOffice/PhpSpreadsheet/files/8937383/Bad.File.xlsx The entire file is formatted in Arial/10 and for each row you add it seems to get exponentially worse. Just trying to import 20 rows from this file was 5-10 seconds.
— Reply to this email directly, view it on GitHub https://github.com/PHPOffice/PhpSpreadsheet/issues/2781#issuecomment-1159938828, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH7OOBRJ2D72YQGRNHTLIOLVP7TPJANCNFSM5ULV2K4A . You are receiving this because you commented.Message ID: @.***>
-- Met vriendelijke groet, Maxim
+31 6228 015 28 | @.*** | Gele Rijdersplein 17E - 2R | 6811 AP Arnhem | The Netherlands
... The alternative
calculateWorksheetDataDimension()
calculates based on actual cell data at the point when the method is called, so it's a lot slower compared withcalculateWorksheetDimension()
Is dimension ref
not an accurate representation of the dimensions of the data?
That is abnormal, I am importing files with 20K plus rows in 10 sec. You can Send me your file I ll test on my setup.
Don't worry @maximnl, @MarkBaker has explained everything.
Is
dimension ref
not an accurate representation of the dimensions of the data?
It should be, but isn't. The problem file in this thread posted by jkdc had a <dimension ref="A1:R1048576"/>
for a file containing 5 rows of data; or for files generated by other scripts, the dimension ref may not even be set; There may also be additional information to load outside the defined area (print settings, style settings, conditional formatting, data validations, etc) if the file is being used as a template to populate data before saving it; and this will also affect the cached range value.
I have a similar issue. The culprit is the auto filter. I have a spreadsheet with ~10 worksheets each of them ~1000 rows.
With autofilter the memory usage is ~2000MB, without ~40MB. I do not have yet enough internal knowledge of this project to find the reason why the memory is not released after the filter evaluation.
I have a similar issue. The culprit is the auto filter. I have a spreadsheet with ~10 worksheets each of them ~1000 rows.
With autofilter the memory usage is ~2000MB, without ~40MB. I do not have yet enough internal knowledge of this project to find the reason why the memory is not released after the filter evaluation.
If you're doing an autofilter evaluation, and memory isn't being released, then that's a very specific issue. Can you provide an example spreadsheet that demonstrates the problem for testing?
i did investigate a little more.
the issue is that my range is defined as A1:A:100000
, that forces the filter to iterate over all the rows in the range and see if needs to hide them ($this->workSheet->getRowDimension((int) $row)->setVisible($result);
)
that allocates a RowDimension
object for each row for each sheet, and boom, a lot of memory wasted
\PhpOffice\PhpSpreadsheet\Worksheet\AutoFilter
line 1053.
Calculating the highest row solves the issue.
I think that it should be mentioned somewhere (docs?) that AutoFilter
takes a ton of memory, and that it is recommended to keep the filter range the smallest possible.
I've read better the description for this ticket and seems something different, however I have created the fix for the issue I've encountered https://github.com/PHPOffice/PhpSpreadsheet/pull/3527
This is:
I'm working with a file generated by another software that I need to fill: read the file, fill some cells and save it again. I use a reader to get the
Spreadsheet
instance (so I can set the desired cell values) and a writer to save it.What is the expected behavior?
Excel file loaded without issues
What is the current behavior?
PHP runs ou of memory
What are the steps to reproduce?
Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:
After digging into the reader code I found that the range cell is huge (function
shrinkRangeToFit
returns E9:GU5420) so the reader thinks the file big (hundred/thousands of empty cells) but the error can be avoided if:$reader->setReadDataOnly(true)
is used (need the formatting, so no valid for me)strlen($this->data)
is replaced withmb_strlen($this->data)
onsrc/PhpSpreadsheet/Reader/Xls.php
(don't understand why)I tried this, without success:
$reader->setReadEmptyCells(false)
(no effect)I even but the
What features do you think are causing the issue
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
Tested with XLS and XLSX (attached file)
Which versions of PhpSpreadsheet and PHP are affected?
As far as I know, all of v1 versions (tried with 1.21, 1.22 and 1.23). I tried both on Windows 10 and Ubuntu 20.04 LTS using PHP 8.1.