brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Win32 unicode paths #2336

Open xworld21 opened 8 months ago

xworld21 commented 8 months ago

Commands such as latexmlc --split --splitnaming=label produce garbled file names on Windows when the labels contain non-ASCII Unicode characters:

\documentclass{article}
\begin{document}
\section{Unicode name}\label{unicöde}
Content.
\end{document}

This is a well-known Perl headache, unfortunately. Possible solutions are to encode the filename in the current code page (limited to whichever character set is active) or to use Win32::LongPath->openL (full Unicode support, even for very long names, but a new dependency). Either way, all the relevant open calls need to be modified.

(This applies to command line parsing too, by the way, but that will be a different issue.)
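
To make the trade-off concrete, here is a minimal sketch of the byte-level mechanism, written in Python only for illustration. It is not LaTeXML code. It assumes the garbling comes from UTF-8 bytes being interpreted in the active code page, and it assumes cp1252 as that code page. The helper names are hypothetical.

```python
# Illustration only: mimics the byte-level behaviour, not LaTeXML's actual code.
# Assumption: the active Windows code page is cp1252 (Western European).
ASSUMED_CODE_PAGE = "cp1252"


def garbled_name(name: str, code_page: str = ASSUMED_CODE_PAGE) -> str:
    """Show how a UTF-8 encoded name reads when its bytes are taken as code_page."""
    return name.encode("utf-8").decode(code_page, errors="replace")


def encode_for_code_page(name: str, code_page: str = ASSUMED_CODE_PAGE):
    """Return the code-page bytes for name, or None if a character is not representable."""
    try:
        return name.encode(code_page)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    label = "unic\u00f6de"  # the label from the example document
    print(garbled_name(label))                        # mojibake form of the name
    print(encode_for_code_page(label))                # representable in cp1252
    print(encode_for_code_page("\u044e\u043d\u0438"))  # Cyrillic: not representable
```

Under these assumptions, the first option works for the example label but fails for any character outside the active code page, which is the limitation noted above. The wide-character route avoids that limit at the cost of a new dependency.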

dginev commented 8 months ago

Sorry for a minor aside, but since I had the thought, maybe this is just as good a place to ask as any:

Does it at some point become easier to use latexml via the new Windows Subsystem for Linux (WSL) instead of fighting an uphill battle for portability? In my experience the Windows options for Perl are noticeably slower than the perl packaged for Ubuntu, so I've been wondering whether WSL could be a good alternative.

Not to dispute the issue description, which is perfectly valid of course. Thank you for yet another useful report!