PHPOffice / PhpSpreadsheet

A pure PHP library for reading and writing spreadsheet files
https://phpspreadsheet.readthedocs.io
MIT License

Error in MS Excel after filling XLSX form with PhpOffice\PhpSpreadsheet #4145

Open sarunas-ven opened 2 months ago

sarunas-ven commented 2 months ago

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

Using a prefilled template, fill in some data and write the result to a new file. The new file created from the template opens in MS Excel without errors.

What is the current behavior?

Using a prefilled template, fill in some data and write the result to a new file. The new file created from the template opens in MS Excel with the error: "We found a problem with some content in 'my_file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes."

What are the steps to reproduce?

Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:

<?php

require __DIR__ . '/vendor/autoload.php';

// Reading the file:
$reader = new PhpOffice\PhpSpreadsheet\Reader\Xlsx();
$reader->setReadDataOnly(false);
$spreadsheet = $reader->load('SEPA.xlsx');
$currentTab = $spreadsheet->getSheetByName('CarrierSchedule');

// Saving file:
$writer = new PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
$writer->save('my_file.xlsx');

// Closing file:
$spreadsheet->disconnectWorksheets();
unset($spreadsheet);
unset($currentTab);

The cell-filling step is omitted from the example because simply opening the template and saving it to a new file is enough to trigger the error.

If this is an issue with reading a specific spreadsheet file, then it may be appropriate to provide a sample file that demonstrates the problem; but please keep it as small as possible, and sanitize any confidential information before uploading.

This is the template file that is filled in and saved to a new file. I did not create it; it is the official template from a government institution. SEPA.xlsx

What features do you think are causing the issue

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

It possibly affects only XLSX files.

Which versions of PhpSpreadsheet and PHP are affected?

PHP: 8.2, phpoffice/phpspreadsheet: 1.29.0

oleibman commented 2 months ago

Thank you for the sample file. You have some defined names which refer to other spreadsheets on your hard drive. They are causing this problem. I am not sure how they are used. Are they needed? I am not sure offhand how PhpSpreadsheet tries to handle such names; it seems from your report that it doesn't do it correctly, but I'm not sure what "correctly" ought to be given that the referenced files do not occur on my system.
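One way to see which defined names are involved is to list them after loading and flag any whose value looks like a reference into another workbook. The following is a minimal sketch, not a fix: `looksLikeExternalReference()` and `listSuspectDefinedNames()` are hypothetical helpers, and the pattern assumes that an external link shows up as a bracketed workbook part such as `[1]Sheet1!$A$1`. How PhpSpreadsheet actually represents such names after loading may differ, so treat the check as a heuristic.

```php
<?php

use PhpOffice\PhpSpreadsheet\Spreadsheet;

/**
 * Hypothetical helper: true when a defined-name value appears to point into
 * another workbook, e.g. "[1]Sheet1!$A$1" or "'C:\dir\[Book.xlsx]Sheet1'!$A$1".
 * Assumption: external links contain a "[...]" workbook part before the "!".
 */
function looksLikeExternalReference(string $value): bool
{
    return preg_match('/\[[^\]]+\][^!]*!/', ltrim($value, '=')) === 1;
}

/**
 * Returns name => value for every defined name flagged by the check above.
 */
function listSuspectDefinedNames(Spreadsheet $spreadsheet): array
{
    $suspects = [];
    foreach ($spreadsheet->getDefinedNames() as $definedName) {
        if (looksLikeExternalReference($definedName->getValue())) {
            $suspects[$definedName->getName()] = $definedName->getValue();
        }
    }

    return $suspects;
}
```

After `$reader->load('SEPA.xlsx')`, calling `print_r(listSuspectDefinedNames($spreadsheet));` would print the candidates. Whether such names can safely be dropped (for example with `removeDefinedName()`) depends on whether the template relies on them, which is the open question in this thread.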

sarunas-ven commented 2 months ago

> Thank you for the sample file. You have some defined names which refer to other spreadsheets on your hard drive. They are causing this problem. I am not sure how they are used. Are they needed? I am not sure offhand how PhpSpreadsheet tries to handle such names; it seems from your report that it doesn't do it correctly, but I'm not sure what "correctly" ought to be given that the referenced files do not occur on my system.

Thank you @oleibman for looking into the issue. I have updated the details. The original template file comes from a government institution, so I do not know all the details about it. It is designed to be downloaded from the official website, filled in manually, and sent by email to that institution. We are trying to automate the template-filling process.

agaluf commented 2 weeks ago

I've had to deal with this issue again and decided to look deeper into a possible cause. In our case the original template file comes from a major corporation and shouldn't be tampered with, only filled out. Since the templates seem to be prepared by inexperienced employees, they are... less than optimal.

When the template is loaded into PhpSpreadsheet version 1.29 or above and then immediately saved, the resulting file produces an error in Excel but works fine in LibreOffice. As sticking with an old version of PhpSpreadsheet is not an option, I took the template apart and reduced it to the bare minimum to figure out what causes this. The attached file is the result.

template.xlsx

The file has two tabs - one is hidden and contains only a selection list for our options, the other contains a form and a couple of autofilled fields. Three of those fields contain a #DIV/0! error. In the top right, we have the corporate logo. Nothing special so far.

Here's the code used to load the file and save it:

$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx();
$spreadsheet = $reader->load(__DIR__ . '/template.xlsx');

$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
$writer->save(__DIR__ . '/broken-file.xlsx');

If you load the broken-file.xlsx in Excel, you will get an XML Error in Row 2, Column 0.

Now, if you either delete the logo in the top right or fix the #DIV/0! errors, the file will no longer error out. I can understand the #DIV/0! errors mattering, but the logo? Do note that in our case, we actually shouldn't be doing either, as the file should not be tampered with.
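To narrow down which part of the saved package Excel objects to, one option is to check every XML part inside the XLSX (which is a ZIP archive) for well-formedness. The sketch below assumes the `zip` and `dom` PHP extensions are available; `findMalformedParts()` and `readXmlParts()` are hypothetical helper names. It only detects XML that fails to parse, so a part that parses but contains something Excel rejects would not be flagged.

```php
<?php

/**
 * Hypothetical helper: given part name => XML string, return the parts that
 * are not well-formed XML, mapped to the last libxml error message.
 */
function findMalformedParts(array $parts): array
{
    $previous = libxml_use_internal_errors(true);
    $bad = [];
    foreach ($parts as $name => $xml) {
        libxml_clear_errors();
        $doc = new DOMDocument();
        // An empty string is checked first because loadXML() rejects it outright.
        if ($xml === '' || !$doc->loadXML($xml)) {
            $error = libxml_get_last_error();
            $bad[$name] = $error ? trim($error->message) : 'empty or unreadable part';
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $bad;
}

/**
 * Hypothetical helper: read every .xml and .rels entry of an XLSX package.
 */
function readXmlParts(string $xlsxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($xlsxPath) !== true) {
        throw new RuntimeException("Cannot open $xlsxPath");
    }
    $parts = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = (string) $zip->getNameIndex($i);
        if (preg_match('/\.(xml|rels)$/i', $name) === 1) {
            $parts[$name] = (string) $zip->getFromIndex($i);
        }
    }
    $zip->close();

    return $parts;
}
```

For the file above, `print_r(findMalformedParts(readXmlParts(__DIR__ . '/broken-file.xlsx')));` would list any parts that do not parse. An empty result would suggest the problem is in the content of a part rather than in its XML syntax, in which case comparing the parts of `template.xlsx` and `broken-file.xlsx` side by side may be more informative.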

Something in release 1.29 must have made PhpSpreadsheet substantially more brittle with suboptimal templates.