PHPOffice / PhpSpreadsheet

A pure PHP library for reading and writing spreadsheet files
https://phpspreadsheet.readthedocs.io
MIT License
13.36k stars 3.47k forks source link

Attribute 'count' missing from /xl/sharedStrings.xml and there should be change of chars only for & and < while writing text to xl/sharedStrings.xml #2884

Open ppodgorskicrido opened 2 years ago

ppodgorskicrido commented 2 years ago

This is:

I report this bug, because Excel (Office 365) shows that is need to repair xlsx file.

What is the expected behavior?

In xlsx file, file xl/sharedString.xml:

What is the current behavior?

In xlsx file, xl/sharedStrings.xml file:

What are the steps to reproduce?

Xlsx file was created from database (UTF-8). Text was extracted from HTML. Text in databes contain " and ', but somehow it is converted to &quot; and &#039;.

<?php

require __DIR__ . '/vendor/autoload.php';

$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet));
$writer->save("path.xlsx");

What features do you think are causing the issue

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

It could affect xls file.

Which versions of PhpSpreadsheet and PHP are affected?

PhpSpreadsheet: 1.23 PHP: 8.1.6 Ubuntu: 20.04

ppodgorskicrido commented 2 years ago

IMO htmlspecialchars should be removed: XMLWriter

According to XML guidelines & should be escaped by using &amp; (also <), but there is no need to escape " and ' in text.

oleibman commented 2 years ago

I do not understand the problem you are reporting. When you open the file in Excel, does the string in the cell not have the correct contents? Also, what harm is caused by the missing "count" attribute?

ppodgorskicrido commented 2 years ago

The problem is that Excel won't open a xlsx file, which contains &quot; or &#039; in xl/sharedStrings.xml.

What happend in my case? My app produce xlsx file (using this library). File won't be opened by Excel, so Excel shows that file can be repaired. After the repair, changes are in xl/sharedStrings.xml as below:

- <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="7372">
+ <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="10925" uniqueCount="7372">

and a lot of changes like:

- <t xml:space="preserve">some text &quot;quotationstext&quot; more text</t>
+ <t xml:space="preserve">some text "quotationstext" more text</t>
- <t xml:space="preserve">some text qout&#039;s more text</t>
+ <t xml:space="preserve">some text qout's more text</t>

I don't know what "count" stands for and I hope you can help with that, but the problem with quotations are IMO pretty obvious.

oleibman commented 2 years ago

My version of Excel does not have any problem opening a file I created as follows (based largely on the test case in your PR), and the contents of A1 are exactly as expected:

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$sheet->setTitle('sheet1');
$text = "Text contains: single quote: ' ampersand: & double quote:" . ' " ' . 'less than: < greater than: >';
$sheet->setCellValueByColumnAndRow(1, 1, $text);

$sheet->setCellValue('A1', $text);
$writer = new Xlsx($spreadsheet);
$filename = 'issue.2884.xlsx';
$writer->save($filename);
echo "saved $filename\n";

If the same code results in an unreadable file for you, please upload it so that I can look into how it differs from my result.

ppodgorskicrido commented 2 years ago

Ok, you're rigth. I check both on Windows and Ubuntu and generated file is correct. I will send a file soon.

ppodgorskicrido commented 2 years ago

I striped out all unnecessary rows from the excel. This one row is causing problems

checked-62b5702e20e72.xlsx

oleibman commented 2 years ago

You have a problem because there is too much data in the cell. There are limits on the maximum number of characters that a cell can contain - see https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3?ui=en-us&rs=en-us&ad=us#ID0EBABAAA=Excel_2016-2013 You have two cells with a huge amount of data, although I think only J2 exceeds the limit. If I go into the xml and delete some lines from that cell, Excel is able to read the file.

ppodgorskicrido commented 2 years ago

Ok, thank you. I will make new PR tomorrow.

oleibman commented 2 years ago

See discussion in PR #2913, which is now closed. The problem that is being reported is real, but it appears to be a problem with Excel, not PhpSpreadsheet.