PHPOffice / PHPExcel


Any CSV file that contains a "<" character in its first 2048 bytes is incorrectly identified as an HTML file #1332

Closed: Rikanishu closed this issue 6 years ago

Rikanishu commented 6 years ago

The issue is caused by the code of this function:

https://github.com/PHPOffice/PHPExcel/blob/185bf412339526c97a98d959c3c0ecf4b7c80016/Classes/PHPExcel/Reader/HTML.php#L131-L141

If a CSV file contains a "<" character anywhere in its first 2048 bytes, it is incorrectly identified as an HTML file when you call PHPExcel_IOFactory::identify() to determine which reader to use. Any subsequent attempt to load the file with the returned reader fails with the following error message (as expected, because it is not an HTML document):

DOMDocument::loadHTML(): htmlParseStartTag: invalid element name in Entity, line: 59 in phpoffice/phpexcel/Classes/PHPExcel/Reader/HTML.php:478
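
To illustrate the misdetection, here is a minimal sketch that approximates the kind of check the linked function performs. It is an assumption based on the behaviour described above, not a copy of the library source, and looksLikeHtml() is a hypothetical name used only for this example.

```php
<?php
// Hypothetical stand-in for the HTML reader's format check.
// Assumption: the reader inspects only the first 2048 bytes and treats the
// data as HTML when it contains "<" and strip_tags() removes something.
function looksLikeHtml(string $data): bool
{
    // Only the beginning of the file is examined.
    $head = substr($data, 0, 2048);

    // Any tag-like token is enough to pass this test.
    return strpos($head, '<') !== false
        && strlen($head) !== strlen(strip_tags($head));
}

// A perfectly ordinary CSV file with one tag-like cell value.
$csv = "id,status\n1,<n/a>\n2,ok\n";

var_dump(looksLikeHtml($csv)); // bool(true)
```

Under this assumption, a single cell such as <n/a> is enough for the file to be routed to the HTML reader.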

I'd recommend using a stricter check to make sure the file really is HTML, or at least allowing users to override PHPExcel_IOFactory::$_autoResolveClasses so that HTML can be removed from the list.
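
As a rough idea of what a stricter check could look like, here is a sketch. It is not the fix that was eventually applied. looksLikeHtmlStrict() is a hypothetical helper, and the list of marker elements is an arbitrary assumption chosen for illustration.

```php
<?php
// Hypothetical stricter heuristic: require the data to start with "<" and
// to contain a recognisable structural HTML element, not just any "<".
function looksLikeHtmlStrict(string $data): bool
{
    // Inspect the same 2048-byte window, ignoring leading whitespace.
    $head = ltrim(substr($data, 0, 2048));

    // An HTML document is expected to begin with a tag or a doctype.
    if ($head === '' || $head[0] !== '<') {
        return false;
    }

    // Assumed marker list: doctype, html, head, body, table (case-insensitive).
    return preg_match('/<(!doctype\s+html|html|head|body|table)\b/i', $head) === 1;
}

var_dump(looksLikeHtmlStrict("id,status\n1,<n/a>\n2,ok\n"));             // bool(false)
var_dump(looksLikeHtmlStrict("<html><body><table></table></body></html>")); // bool(true)
```

If the format is already known, another option is to skip detection entirely and request the reader by name with PHPExcel_IOFactory::createReader('CSV').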

Thanks.

PowerKiKi commented 6 years ago

This was fixed in https://github.com/PHPOffice/PhpSpreadsheet/commit/408da0c17a7c198a2f0de44450902c711b9ef4c6