Closed by holtzermann17 11 years ago
Oh, of course, if you do try to reproduce this by creating a Basic Page, you should use the Full HTML text format.
I certainly want to help solve the problem, but it is hard to see it as a LaTeXML issue, as such, unless it is producing somehow invalid UTF8, or the output doesn't have the right declarations of utf-ness in it, or something similar.
Using the above procedure with Basic Pages, I can narrow the Minimal (not) Working Example down to this:
a measure space $(X,\mathfrak{B},\mu)$,..
or indeed, this:
$\mathfrak{B}$
With an old LaTeXML version, I get back something like:
<math alttext="\mathfrak{B}" display="inline"><semantics>
<mi mathvariant="fraktur">B</mi>
<annotation-xml encoding="MathML-Content">
<ci xmlns="http://www.w3.org/1998/Math/MathML">B</ci>
</annotation-xml><annotation encoding="application/x-tex">
\mathfrak{B}</annotation></semantics></math>
This renders, but without the Fraktur font: http://beta.planetmath.org/testencoding
Whereas with a newer LaTeXML, I get back some XHTML that includes the Unicode character "U+1d505".
This suggests that it is actually a certain subset of characters (like "U+1d505") that is causing trouble for PHP/PDO. Presumably it should work; in other words, my database is still set up wrong, but at least I've traced it down to something minimal.
So you're saying that LaTeXML produces valid UTF-8, but that chokes MySQL? That makes sense and won't be the first time we hit that problem.
AH! Now I see what's going on. Yes, LaTeXML changed to produce Plane 1 characters for styled math symbols by default, as opposed to optionally. You can turn that off using the --noplane1 option.
But I'd recommend you only turn it off for testing; you kinda want the new default behaviour, since otherwise browsers without enough of the right fonts will show a plain "B" instead of the Fraktur B. Or test and see which way you prefer.
Plane1 is > 16bit, so more likely to stress some applications.
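To make the "> 16bit" point concrete, here is a small Python sketch (not part of LaTeXML or Drupal; the helper name utf8_length is just for illustration) comparing a plain "B" with the Plane 1 Fraktur B, U+1D505:

```python
# Compare a Basic Multilingual Plane character with the Plane 1
# character from this thread. utf8_length is a hypothetical helper.
plain_b = "B"
fraktur_b = "\U0001D505"  # MATHEMATICAL FRAKTUR CAPITAL B


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in (plain_b, fraktur_b):
    code_point = ord(ch)
    plane = code_point >> 16  # plane 0 is the BMP, plane 1 holds styled math letters
    print(f"U+{code_point:04X}: plane {plane}, {utf8_length(ch)} UTF-8 byte(s)")
```

Any code point above U+FFFF needs four bytes in UTF-8, which is where software that assumes at most three bytes per character starts to fail.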
Yes... it seems like Drupal/PDO/MySQL is having trouble with this new Plane 1 UTF-8.
I posted a question on the Drupal StackExchange site, which takes LaTeXML out of the loop for now, because we're down to a one-character MWE.
http://drupal.stackexchange.com/questions/50868/configuring-drupal-to-use-unicode-characters
Maybe someone there will know what to do next!
But, actually it seems that now that I know the right search terms, the answer might be here:
"MySQL charset utf8 only accepts UTF-8 characters if they can be represented in 3 bytes. If you need to store this in MySQL, you'll need to use MySQL charset utf8mb4."
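For anyone who wants to check LaTeXML output before it reaches a utf8 (3-byte) MySQL column, a rough Python sketch follows. The function name find_supplementary and the sample string are made up for illustration; the only assumption is the one quoted above, namely that characters above U+FFFF are the ones that need 4 bytes.

```python
# Hypothetical pre-flight check: list the characters that would not fit
# in a 3-byte-per-character column, i.e. everything above U+FFFF.
def find_supplementary(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character above U+FFFF."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


# Sample text modelled on the MWE: Fraktur B is Plane 1, the Greek mu is not.
sample = "a measure space (X,\U0001D505,\u03bc)"
print(find_supplementary(sample))
```

An empty list means the text should be storable under the 3-byte charset; anything else points at the characters that call for utf8mb4 (or for --noplane1 while testing).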
... In which case, this issue could potentially end up affecting a lot of LaTeXML users, so I think it's a good thing I've been discussing with you.
Joe, is this ready to be closed?
I think we've established the core issue regarding MySQL and UTF-8, and we'll keep in mind relaying that information to any other derivative applications around LaTeXML. Ideally we could add a footnote or so in the LaTeXML manual?
In any case, closing this issue.
Sure; although it's one of those obvious-once-you-realize-it items. Where in the manual would it be noticeable/findable?
[Originally Ticket 1657]
In short, it seems like any non-ASCII characters in the returned expression are likely to cause problems. It is possible to "work around" this by using utf8_encode(...) on the returned XHTML expression before saving it to the database, but so far the workaround just ends up causing more problems.
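One plausible reason the utf8_encode(...) route makes things worse: PHP's utf8_encode converts ISO-8859-1 to UTF-8, so applying it to text that is already UTF-8 re-encodes every byte as if it were a Latin-1 character. The Python sketch below only simulates that behaviour (simulate_utf8_encode is a made-up name, and this is an assumption about what happens in the Drupal setup, not something confirmed there):

```python
# Simulation of applying an ISO-8859-1 -> UTF-8 conversion to bytes that
# are already UTF-8. simulate_utf8_encode is a hypothetical stand-in for
# PHP's utf8_encode, written only to illustrate the double encoding.
def simulate_utf8_encode(utf8_bytes: bytes) -> bytes:
    # Treat each input byte as one ISO-8859-1 character, then encode as UTF-8.
    return utf8_bytes.decode("latin-1").encode("utf-8")


original = "\U0001D505"  # Fraktur B, 4 bytes in UTF-8
stored = simulate_utf8_encode(original.encode("utf-8"))

print(len(original.encode("utf-8")), "bytes before,", len(stored), "bytes after")
# Every resulting character is at or below U+00FF, so a 3-byte charset accepts it...
print(max(ord(ch) for ch in stored.decode("utf-8")) <= 0xFF)
# ...but the stored text is no longer the original character.
print(stored.decode("utf-8") == original)
```

Under this reading the database stops complaining only because the Plane 1 character has been replaced by several Latin-1 range characters, which then show up as garbage when the page is rendered.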
Here's a detailed example. It might be hard to discern just what the problem is from outside of PHP. For the moment, I'm pasting the full error message here in case it offers some clue: http://pastebin.com/4h1FjZeL.
One way to reliably produce a similar error message on Drupal without actually running LaTeXML is to create a "Basic Page", and paste in the "result" that you see in that pastebin. If I can help with debugging at that level just let me know!
Here's a command line version of exactly what Drupal posts to the LaTeXML daemon (the non-URL-encoded versions of the content follow):
preamble:
document: