invalid utf-8 error

bdolor commented 9 years ago

I was getting an error that hung the export routine, which prevented me from exporting most of the books on my development instance. I was able to get past it by setting $this->mpdf->ignore_invalid_utf8 = true; but that’s not a suggested fix. I only needed to get beyond the error so that the export routine could complete.

[Screenshot, 2015-02-02, 1:45:48 PM]

See line 30916 in mpdf.php.

Using $this->mpdf->ignore_invalid_utf8 = true; as a workaround, and judging by where mPDF puts question marks in the PDF in place of the characters it doesn’t like, a whitespace character is the likely culprit:

[Screenshot, 2015-02-06, 10:47:17 AM]

jgraham909 commented 9 years ago

Ultimately this is a content issue and should be addressed via content. That said, we can add a setting to optionally ignore invalid utf8.

bdolor commented 9 years ago

If it were left as a content issue, we would have to tell our authors to go back into the books they've created and delete many very specific whitespace characters that can't be detected in the WYSIWYG editor. If mPDF were configured to ignore these whitespace characters, it would pepper the output with question marks.

To recreate the issue, create a sentence in WP using the WYSIWYG editor. Leave a space at the end of the sentence. Copy that sentence, including the whitespace character at the end. Paste that sentence multiple times in the text editor. Export your book with $this->mpdf->ignore_invalid_utf8 = true; set, then look for question marks where the whitespace character was copied and pasted.
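Since these characters can't be seen in the WYSIWYG editor, one way to locate them is to check the raw content directly. Below is a minimal sketch, assuming the chapter content is available as a PHP string and the mbstring extension is loaded. The helper name find_invalid_utf8_lines is hypothetical; it is not part of mPDF or Pressbooks.

```php
<?php
/**
 * Hypothetical helper (not part of mPDF or Pressbooks).
 * Returns the 1-based line numbers of $content that are not valid UTF-8,
 * each mapped to a hex dump of that line so invisible bytes can be inspected.
 */
function find_invalid_utf8_lines(string $content): array
{
    $bad = [];
    // Splitting on "\n" is safe: byte 0x0A never occurs inside a
    // multibyte UTF-8 sequence.
    foreach (explode("\n", $content) as $index => $line) {
        if (!mb_check_encoding($line, 'UTF-8')) {
            $bad[$index + 1] = bin2hex($line);
        }
    }
    return $bad;
}

// Made-up sample: line 2 ends with a lone 0xA0 byte, which is a
// non-breaking space in ISO-8859-1 but is not valid UTF-8 on its own.
$sample = "A clean line.\nA sentence with a stray byte.\xA0\nAnother clean line.";
print_r(find_invalid_utf8_lines($sample));
```

In the hex dump, a trailing a0 that is not preceded by c2 would point to a single-byte non-breaking space rather than its two-byte UTF-8 form.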

jgraham909 commented 9 years ago

I missed your point. I thought you were advocating to use ignore_invalid_utf8. From your description this sounds like a bug somewhere in that something is not handled properly as utf8.

Following your instructions, I was not able to reproduce the issue. It sounds like something is not happening correctly. Is your db set to utf8 encoding? My guess is that something is handling this as iso-8859 and garbling the utf8 byte sequences.
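To illustrate that guess (it does not confirm what is actually happening in this case): a non-breaking space is the single byte 0xA0 in ISO-8859-1 but the two bytes 0xC2 0xA0 in UTF-8, so a lone 0xA0 fails UTF-8 validation. A minimal sketch, assuming the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// The same character, a non-breaking space (U+00A0), in two encodings.
$latin1Text = "word\xA0";      // ISO-8859-1: one byte
$utf8Text   = "word\xC2\xA0";  // UTF-8: two bytes

// A lone 0xA0 is a continuation byte with no lead byte, so it is invalid UTF-8.
$latin1IsValidUtf8 = mb_check_encoding($latin1Text, 'UTF-8');
$utf8IsValidUtf8   = mb_check_encoding($utf8Text, 'UTF-8');
var_dump($latin1IsValidUtf8, $utf8IsValidUtf8);

// Converting from the encoding the bytes are really in yields valid UTF-8.
$converted = mb_convert_encoding($latin1Text, 'UTF-8', 'ISO-8859-1');
var_dump(bin2hex($converted));
```

If the content were passing through an ISO-8859-1 step somewhere between the editor, the database, and mPDF, bytes like this would be one plausible source of the error.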

jgraham909 commented 9 years ago

Re-opening issue as I don't think the setting addresses Brad's concerns here.

bdolor commented 9 years ago

I'd like to find a solution to this one. The current workaround wasn't intended as a resolution. With it, the two scenarios we present to the user are a complete freeze of the export process or a pervasive, unwanted character. I understand that you weren't able to recreate the issue by following the steps above. I have over a dozen books on my development instance that I've used many times over the last year and a half to test various output functionality, and none of them exported without hitting one of those two scenarios.

My db is encoded with utf8.

Even if the problem turns out to be something in the setup of my local instance, there's still a chance that others will run into the same problem with existing books. At any rate, I'd like to keep this open and continue to pursue a resolution.

bdolor commented 9 years ago

I haven't had time to look into these to see if they are red herrings, but here are some other instances where non-breaking whitespace characters have been reported as problematic with mPDF:

https://groups.google.com/forum/#!msg/caucho-resin/rtIQy4CjqIM/496IJINvKNcJ
http://www.mpdf1.com/forum/discussion/163/mpdf-error-html-contains-invalid-utf-8-characters/p1
https://github.com/robregonm/yii2-pdf/issues/4
https://github.com/osTicket/osTicket-1.8/issues/1395

jgraham909 commented 9 years ago

Closing this out, although I really feel this is a symptom of an external problem. Something in the handling path is botching the utf8 encoding.

FunnyMonkey / pressbooks

invalid utf-8 error #1