Oh, most likely mylatex.ltx should (since the 2018-04-01 LaTeX release) restore the \everyjob settings that latex uses to read the command-line filenames safely. I'll look into it...
The \everyjob settings in the LaTeX format (as recently updated by yourself) actually destroy the command-line filename safety:
$ pdflatex ééé
is OK, but
$ pdflatex \\input{ééé}
fails due to the \everyjob execution. (Just clarifying to myself, as of course you know this much better than me.)
Well, without the \everyjob settings the first form wouldn't work either, so it isn't those settings that break \input{ééé} so much as the implicit utf8 handling. The \input form isn't so bad, as you can use \input{\detokenize{ééé}} (and we may see a way to make that work automatically at some point).
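For instance, on a POSIX shell (a sketch; the single quotes keep the backslashes intact for TeX, and the filename is illustrative):

    pdflatex '\input{\detokenize{ééé}}'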
> Well, without the \everyjob settings the first form wouldn't work either
Would it make sense to delay the \everyjob utf8-related activation (I mean by that the activation of the non-\string-ified actions), i.e. remove it from the format and transfer it to the LaTeX document classes? I suppose this puts a burden on third-party classes; on the other hand, users of those classes could still use inputenc in the preamble, and class maintainers could at some point adopt the LaTeX-team-provided code and announce to their users that they too can drop inputenc+utf8 from their preamble.
No, I think that would lead to massive fragmentation, as it would be impossible to know what is happening; there are thousands of university thesis classes lying on hard disks around the globe.

I am not sure what issue you are worried about: there have been almost no reported issues with the UTF-8 default handling, just the edge case of Windows using legacy file-system encodings rather than UTF-8, which was addressed in patch 5.
No, I don't have much of an issue, but a slight uneasiness about active characters already on the command line. For example,
pdflatex \\input \\detokenize{é}#1.tex
works but is somewhat complicated, and the \detokenize couldn't have the # inside.
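(Side note on the # limitation: \detokenize writes tokens out the way \write does, so a parameter character comes back doubled. A minimal check, assuming a plain pdflatex run:)

    \message{\detokenize{#1.tex}} % prints ##1.tex, not #1.tex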
You will ask: why \input? Well, not me, but AUCTeX used it by default (with braces, so here it would raise other difficulties). Recent commits will, as far as I have followed, reduce usage of \input.
In the context of a format created via mylatex.ltx, the catcode regime, and the fact that an active end of line is expected, it got a bit arduous to manage to apply the \detokenize to a non-ASCII filename for a pdflatex run using this format (using command line & format, not first-line parsing). (In the past AUCTeX used no \input precisely in that case; and it is difficult to do so with the \ being active and the \everyjob now making non-ASCII unfriendly... but this "arduous" bit got solved in current commits to the AUCTeX dev repo.) So this new situation creates trouble in certain specialized contexts.
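(For context, a sketch of the usual mylatex.ltx workflow on a web2c command line, with myfile.tex as a placeholder document:)

    pdflatex -ini '&pdflatex' mylatex.ltx myfile.tex   # dumps mylatex.fmt with the preamble preloaded
    pdflatex '&mylatex' myfile.tex                     # later runs reuse the dumped preamble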
Also, the whole way LaTeX activates non-ASCII characters (I am thinking of the LICR) appears like a legacy of the past. If we live in a UTF-8 world, why the LICR? If we need \unexpanded and \detokenize to handle the new situation, why not use \protected too in the definitions and get the non-ASCII characters to expand to themselves when written to files and to behave nicely in \edef with no extra precautions? I have not thought it through really, I just wrote that in jest, and of course it borders on off-topic, but it was simply to try to express my feelings: I feel the LaTeX UTF-8 is perhaps not as "modern" as it could be.
On 2 June 2018 at 17:19, Jean-François B. notifications@github.com wrote:
> [...] In the context of a format created via mylatex.ltx, the catcode regime, and the fact that an active end of line is expected, it got a bit arduous to manage to apply the \detokenize to a non-ASCII filename for a pdflatex run using this format (using command line & format, not first-line parsing).
I think mylatex would be affected either way: the encoding was switched; whether in \everyjob or the document class, it would be switched before the \dump (and probably needs switching back in mylatex.ltx so the custom format acts like a standard one).
> (In the past AUCTeX used no \input precisely in that case; and it is difficult to do so with the \ being active and the \everyjob now making non-ASCII unfriendly... but this "arduous" bit got solved in current commits to the AUCTeX dev repo.) So this new situation creates trouble in certain specialized contexts.
Yes, can't be helped; I did ping them.
> Also, the whole way LaTeX activates non-ASCII characters (I am thinking of the LICR) appears like a legacy of the past. If we live in a UTF-8 world, why the LICR?
Because classic TeX can't use Unicode fonts.
> If we need \unexpanded and \detokenize to handle the new situation, why not use \protected too in the definitions and get the non-ASCII characters to expand to themselves when written to files and to behave nicely in \edef with no extra precautions?
As I say, we may get that to work anyway for \input, but the main definitions can't be like that, as things wouldn't typeset then. Switching between 7-bit and 8-bit encodings may seem archaic in a Unicode world, but it's the world we live in.
> I have not thought it through really, I just wrote that in jest, and of course it borders on off-topic, but it was simply to try to express my feelings: I feel the LaTeX UTF-8 is perhaps not as "modern" as it could be.
Neither is a cmr10 font with 127 characters, but...
It is difficult to get mylatex.ltx to work with filenames containing non-ASCII characters (with pdflatex), since those are active (the \everyjob execution, done when the first macro of mylatex.ltx is encountered when generating the preamble, gives non-ASCII characters their real inputenc+utf8 meanings). (I have edited my initial wording, which was erroneous.)
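For instance (a sketch of the failure mode as reported; the filename is illustrative):

    pdflatex '&mylatex' ééé.tex   # fails: the dumped format has the full UTF-8 meanings already active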
Very recently, commits have been pushed to AUCTeX, and the development version is able to cope successfully (for the pdftex engine, as xetex and luatex are of course not concerned by this) with using mylatex.ltx to generate cached preambles even for filenames with non-ASCII characters (and spaces). Many years ago already, a hack into \dump had been put in place by the AUCTeX maintainer to cope with filenames containing spaces (not contiguous). Thus I am signalling this issue for it perhaps to get documented, but not pushing too much for a fix, as this will probably break the AUCTeX manoeuvres ;-)