Open voltel opened 6 years ago
Hey @Progi1984, any luck with this? I am seeing similar behavior. It seems unable to read a pretty standard Word 97 doc with no special formatting. Instead I get broken, fragmented text, and/or other sections are missing entirely.
I was able to get much better results from a simple fread-style function. But that was only useful for plaintext extraction; unfortunately it gives no style or formatting data.
function readWord($filename) {
    if (!file_exists($filename)) {
        return false;
    }
    // Open in binary mode so the bytes are read unmodified on every platform
    if (($fh = fopen($filename, 'rb')) === false) {
        return false;
    }
    $headers = fread($fh, 0xA00);
    // The length bytes sit at 0x21C-0x21F; give up if the header block is too short
    if ($headers === false || strlen($headers) < 0x220) {
        fclose($fh);
        return false;
    }
    // 1 = (ord(n)*1) ; Document has from 0 to 255 characters
    $n1 = ( ord($headers[0x21C]) - 1 );
    // 1 = ((ord(n)-8)*256) ; Document has from 256 to 63743 characters
    $n2 = ( ( ord($headers[0x21D]) - 8 ) * 256 );
    // 1 = ((ord(n)*256)*256) ; Document has from 63744 to 16775423 characters
    $n3 = ( ( ord($headers[0x21E]) * 256 ) * 256 );
    // 1 = (((ord(n)*256)*256)*256) ; Document has from 16775424 to 4294965504 characters
    $n4 = ( ( ( ord($headers[0x21F]) * 256 ) * 256 ) * 256 );
    // Total length of text in the document
    $textLength = ($n1 + $n2 + $n3 + $n4);
    // fread() rejects a non-positive length, so treat that case as a failed read
    if ($textLength < 1) {
        fclose($fh);
        return false;
    }
    $extracted_plaintext = fread($fh, $textLength);
    fclose($fh);
    return $extracted_plaintext;
}
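For what it's worth, the four per-byte terms in readWord() add up to the same thing as reading the bytes at 0x21C-0x21F as one little-endian 32-bit integer and subtracting 0x801. Here is a minimal sketch of that equivalent form. The function name textLengthFromHeader is made up for illustration and is not part of PHPWord, and it carries over the same assumption as the function above: that the text starts right after the first 0xA00 bytes.

```php
<?php
// Hypothetical helper (not PHPWord API): same arithmetic as readWord(),
// written as a single little-endian 32-bit read.
function textLengthFromHeader(string $headers): int
{
    if (strlen($headers) < 0x220) {
        throw new InvalidArgumentException('Header block is too short');
    }
    // 'V' decodes an unsigned 32-bit little-endian integer
    $value = unpack('V', substr($headers, 0x21C, 4))[1];
    // (b0 - 1) + (b1 - 8) * 256 + b2 * 65536 + b3 * 16777216 == value - 0x801
    return $value - 0x801;
}
```

It would be called on the same 0xA00-byte block that readWord() reads, e.g. textLengthFromHeader(fread($fh, 0xA00)). A result below 1 should be treated as "could not determine the length" rather than passed to fread().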
bad
Any updates on this? I am trying to convert a .doc file to PDF. The conversion runs, but in the PDF part of the text is cut off and the italics are gone.
require 'vendor/autoload.php';
use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\Settings;
Settings::setPdfRendererName(Settings::PDF_RENDERER_DOMPDF);
Settings::setPdfRendererPath('.');
$phpWord = IOFactory::load('TEST2.doc', 'MsDoc');
$phpWord->save('word_doc.pdf', 'PDF');
Same problem here. Any solutions?
@BarryBravo Hi, could you give us a sample file that produces this error, please?
@Progi1984 I have a problem reading the Word 97-2003 format.
$pdf_uri = 'pdf.pdf';
$html_uri = 'html.html';
$word = \PhpOffice\PhpWord\IOFactory::load(storage_path($exdoc->filelocation), 'MsDoc');
$writer = \PhpOffice\PhpWord\IOFactory::createWriter($word, 'HTML');
$writer->save($html_uri);
$pdf = new \Dompdf\Dompdf();
$pdf->loadHtml(file_get_contents($html_uri));
$pdf->setPaper('A4', 'portrait');
$pdf->render();
$output = $pdf->output();
file_put_contents($pdf_uri, $output);
After saving this document as a copy with OpenOffice, I get the following result:
This is:
Expected Behavior
1) The MS Word 97-2003 document (*.doc) would be correctly opened and correctly processed by
$phpWord = IOFactory::load($c_file_name, 'MsDoc'); // this line causes error
2) Styles would be set internally in the generatePhpWord() method of MsDoc.php:
Current Behavior
Errors, which differ from run to run:
Notice: Uninitialized string offset: 327680 (or some other wildly large number). Error traced to getInt2d() and/or getInt1d() of vendor\phpoffice\phpword\src\PhpWord\Reader\MsDoc.php (line 2317)
or
Fatal error: Uncaught PhpOffice\PhpWord\Exception\Exception: Could not open resources/resources/n_466.doc for reading! File does not exist, or it is not readable. in D:\xxx\xxx\vendor\phpoffice\phpword\src\PhpWord\Shared\OLERead.php:78
or
Notice: Undefined property: stdClass::$styleSection traced to vendor\phpoffice\phpword\src\PhpWord\Reader\MsDoc.php generatePhpWord()
or, when it manages to convert some test file, the layout is completely wrong: no styles, line breaks in the wrong places, parts of words missing, and the table not reproduced.
The elements recognized by the following snippet are all of type Text; paragraphs are not recognized, and a simple table was not recognized at all.
Failure Information
I tried every variant of MS Word 97-2003 document I could produce (created in MS Word 2007 or in MS Word 365). I tried processing downloaded files (e.g. n_466.doc or d466.doc from here), and I also created new files manually in both versions of MS Word available to me (2007 and 365) and saved them as *.doc. The provided set-up (see further) works OK with the same documents saved as .docx files (different reader class). test_documents.zip
Version, copied from composer.json: "phpoffice/phpword": "^0.14.0",
or from composer.lock: "name": "phpoffice/phpword", "version": "v0.14.0", "source": { "type": "git", "url": "https://github.com/PHPOffice/PHPWord.git", "reference": "b614497ae6dd44280be1c2dda56772198bcd25ae" },
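Regarding the "Uninitialized string offset" notice listed under Current Behavior: PHP raises it when a string is indexed past its end, so the reader is apparently following an offset that points outside the data it has loaded. Below is a minimal sketch of bounds-checked byte readers that would turn the notice into a catchable exception. The names safeInt1d and safeInt2d are hypothetical and are not PHPWord functions; the sketch assumes little-endian 16-bit values, which is how the Word binary format stores integers.

```php
<?php
// Hypothetical bounds-checked readers; the names are illustrative, not PHPWord API.
function safeInt1d(string $data, int $pos): int
{
    if ($pos < 0 || $pos >= strlen($data)) {
        throw new OutOfRangeException(
            "Offset $pos is outside the " . strlen($data) . '-byte buffer'
        );
    }
    return ord($data[$pos]);
}

function safeInt2d(string $data, int $pos): int
{
    // Little-endian 16-bit value: low byte first, then the high byte
    return safeInt1d($data, $pos) | (safeInt1d($data, $pos + 1) << 8);
}
```

With readers like these, an offset such as 327680 on a short buffer fails immediately with a message naming the offset, which should make it easier to see which structure produced the bad pointer.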
How to Reproduce
This is a part of Symfony 4 project.
Service class:
Controller class:
Sample implementation of twig template
Context