PHPOffice / PhpSpreadsheet

A pure PHP library for reading and writing spreadsheet files
https://phpspreadsheet.readthedocs.io
MIT License
13.4k stars 3.48k forks source link

Corruption when writing an xlsx doc with conditional formatting #2035

Closed ndench closed 3 years ago

ndench commented 3 years ago

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

Xlsx documents with conditional formatting should be able to be edited using PhpSpreadsheet and not cause corrupt file errors when later opened in Excel.

What is the current behavior?

After creating a spreadsheet in Excel that contains conditional formatting, then reading and writing it with PhpSpreadsheet, an error will display next time it's opened in Excel. Excel prompts to recover the file because it found a problem in the document. After choosing to repair the document we see details that the xl/styles.xml was the culprit and it was repaired by removing formatting. Another prompt also occurs saying that the file is being edited by another user and we have to open it in read only mode.

Screenshot 2021-04-29 164228 Screenshot 2021-04-29 164320 Screenshot 2021-04-29 164335

What are the steps to reproduce?

Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:

<?php

require __DIR__ . '/vendor/autoload.php';

// Create new Spreadsheet object
$spreadsheet     = PhpOffice\PhpSpreadsheet\IOFactory::load('repro.xlsx');

$writer = PhpOffice\PhpSpreadsheet\IOFactory::createWriter($spreadsheet, 'Xlsx');
$writer->save('repro-output.xlsx');

Here is the repro.xlsx file I created by opening a blank spreadsheet in Excel, and adding conditional formatting to a cell.

Here is the repro-output.xlsx file I created by running the above script.

Which versions of PhpSpreadsheet and PHP are affected?

PHP: 8.0.3 PhpSpreadsheet: 1.17.1

MarkBaker commented 3 years ago

Can you please try this with that latest master branch; that includes a number of changes to fix issues with conditional formatting; e.g. Issue #1967... and resolving that identified a couple of problems with saving Conditional Formatting rules in both the Xlsx and Xls Writers, both of which should now work correctly

ndench commented 3 years ago

Thanks for the quick response @MarkBaker. I've tried on the latest master branch, however the issue is still present.

MarkBaker commented 3 years ago

In that case, I'll investigate further using your repro file

MarkBaker commented 3 years ago

The problem isn't actually in the Writer, it's in the Reader.

The conditional style

<dxfs count="1">
    <dxf>
        <font>
            <color rgb="FF9C0006"/>
        </font>
        <fill>
            <patternFill>
                <bgColor rgb="FFFFC7CE"/>
            </patternFill>
        </fill>
    </dxf>
</dxfs>

Has a patternFill defined without a fillStyle, and the reader was setting a default fillStyle of "none". We've encountered files with an empty patternFill before, which is why we do provide a default value. However, in your case, it does also have a colour defined, and when writing the file again, it was writing the colour with a fillStyle of "none", which Excel doesn't like.

I've modified the Reader so that it now defaults to "solid" if there is a colour defined; but to "none" if there is no colour.

ndench commented 3 years ago

Were you able to make this work with my repro file @MarkBaker? I've tried dev-master again (currently 83e55cffcc162fe19fa67b15866328793b52fe7b) and I still get the same issue.

MarkBaker commented 3 years ago

Using the file that you posted, it's working for me with master branch (and I even added a unit test using your file in the patch)

ndench commented 3 years ago

Ah yes, I see the unit test you added. I double checked and verified that fix in #2050 indeed fixes a bug in the Xlsx Reader.

However I think there might be a similar related bug in the Writer? When I run the reproduction script I originally posted, repro-output.xlsx is still corrupted in the same way.

If you're able to run that script and not get corruption then maybe there's a difference in our environments that causes the issue?

I'm running PHP 8.0.3 on Ubuntu 18.04.5 with "phpoffice/phpspreadsheet": "dev-master". Then opening the resulting Excel file on Windows 10 with Excel 2016. Here's my repro-output.xlsx if that helps.

ndench commented 3 years ago

I'm not sure if the fix introduced another bug, but I've noticed that reading an xlsx file with fill colors set and writing the file results in the new file not having fill colors anymore.

I used the same script from above but with a new repro-2.xlsx. Which was created from a blank spreadsheet and putting a fill color on cell A1.

MarkBaker commented 3 years ago

I can see the problem in your repro-output.xlsx, but I can't explain it. Spot the difference in the font size:

<dxf>
    <font>
        <sz val="10"/>
        <color rgb="FF9C0006"/>
        <name val="Calibri"/>
    </font>
</dxf>

and

<dxf>
    <font>
        <sz val="0"/>
        <color rgb="FF9C0006"/>
        <name val="Calibri"/>
    </font>
</dxf>

The input file only has a font colour defined,

<dxf>
    <font>
        <color rgb="FF9C0006"/>
    </font>
</dxf>

but we always provide default values (style: Calibri and size: 10), if no value is defined in the input; so I don't understand why this is setting a size value of 0 for yourself, but a correct size value of 10 when I run it.

MarkBaker commented 3 years ago

The only significant difference between our environments is that I'm testing/debugging on PHP 7.4, while you're on PHP 8, so I'm wondering if this is one of those wonderful PHP8 issues. Our unit tests don't have explicit test cases for missing values in the XML, so I'm going to expand the assertions against the repro file that I added for the colour fill test

MarkBaker commented 3 years ago

This should fix the font size problem... the Reader now checks for font size/name explicitly, and leaves the CF values as null if they're not defined in the input style definition; and the Writer only writes those values if they're not null

ndench commented 3 years ago

That's great @MarkBaker. I can confirm that fixes the conditional formatting issue for me. Thanks for the hard work you put into this one!

I mentioned that I'm loosing fill colors when reading and writing an excel file a couple of comments ago, but I'll report that one with more information in another issue.