PHPOffice / PHPWord

A pure PHP library for reading and writing word processing documents
https://phpoffice.github.io/PHPWord/

MsDoc reader borks up Central European encoding #2565

Open verybigelephants opened 5 months ago

verybigelephants commented 5 months ago

Describe the Bug

Hello, when reading a .doc file that contains Central European characters (example file: srncik.zip), the MsDoc reader messes up all the diacritics.

Steps to Reproduce

    use PhpOffice\PhpWord\IOFactory;

    $type_word_reader = IOFactory::createReader('MsDoc');
    $text = "";

    $word = $type_word_reader->load($working_file_path);
    foreach ($word->getSections() as $section) {
        $els = $section->getElements();
        foreach ($els as $el) {
            // Only some element types expose getText()
            if (method_exists($el, 'getText')) {
                // I have tried everything, nothing works:
                // \PhpOffice\PhpWord\Shared\Text::toUTF8($el->getText());
                // \ForceUTF8\Encoding::fixUTF8($el->getText());
                $text .= $el->getText() . "\n";
            } else {
                $text .= "\n";
            }
        }
    }
    file_put_contents('test.log', $text);

Expected Behavior

The reader should not mess up the characters; the diacritics should come through intact.

Current Behavior

The reader messes up the characters: all diacritics in the extracted text are garbled.

Context

Please fill in your environment information: