Pomax / ucharclasses

A XeLaTeX package that lets you insert arbitrary code between characters from different unicode blocks

Conflicts between To's and From's #32

Open cykerway opened 4 years ago

cykerway commented 4 years ago

I think there are potential conflicts if one uses both \setTransitionTo and \setTransitionFrom, or if one uses \setTransitionsFor<Class> for multiple classes. I haven't worked out a failing example yet, but I believe one appears if you reorder several \setTransitionsFor<Class> commands.

This is easy to understand: if one defines the transition from A to B using B's \setTransitionTo, and then defines the transition from A to B using A's \setTransitionFrom, the latter overwrites the former, because both end up calling \XeTeXinterchartoks for the same pair of classes.
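The overwrite can be reproduced without ucharclasses at all. The following is a minimal sketch using the raw XeTeX primitives; the class names \classA and \classB and the bracketed marker texts are made up for illustration, and the two assignments only stand in for what a To rule and a From rule would each write into the same slot.

```latex
% Compile with xelatex. Class names and marker texts are illustrative only.
\documentclass{article}

% Allocate two character classes and put one letter in each.
\newXeTeXintercharclass\classA
\newXeTeXintercharclass\classB
\XeTeXcharclass `\a = \classA
\XeTeXcharclass `\b = \classB

% First definition for the A-to-B boundary (think: B's To rule).
\XeTeXinterchartoks \classA \classB = {[to B]}
% Second definition for the same pair (think: A's From rule).
% This assignment replaces the first one; it does not append to it.
\XeTeXinterchartoks \classA \classB = {[from A]}

\begin{document}
% Switch the interchar token mechanism on only for the body text.
\XeTeXinterchartokenstate = 1
ab
\end{document}
```

With these definitions the body should typeset with only the second marker between the two letters, since each (left class, right class) pair holds a single token list.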

This means one should stick with either To or From, but not both. The For commands are then pretty much useless, and the example code in the doc is misleading because it uses several of them.

Pomax commented 4 years ago

There is nothing "useless" about the For instructions, which set To and From rules for dozens of classes at the same time. If you want to write those out explicitly, by all means do, but most people want to write documents, not code, and for them the For commands are most certainly useful (myself included).

In fact, the code in the documentation relies on clobbering, which is why you see a setTransitionsForCJK first, followed by setTransitionsForJapanese. The Japanese rules clobber the CJK rules for all blocks that are relevant to Japanese typesetting. The docs can certainly be improved by adding text that points out that this will happen, but it's not a bug; it's very much intentional behaviour.
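As a rough illustration of that ordering, here is a sketch of such a preamble. The font names are placeholders to be replaced with fonts actually installed, and the argument order (code for entering the group first, code for leaving it second) is an assumption that should be checked against the package documentation.

```latex
% Compile with xelatex. Font names below are assumptions, not requirements.
\documentclass{article}
\usepackage{fontspec}
\usepackage{ucharclasses}

% Hypothetical font families for the two script groups.
\newfontfamily\genericcjkfont{Noto Serif CJK SC}
\newfontfamily\japanesefont{Noto Serif CJK JP}

% General rule first: applies to every block in the CJK group.
\setTransitionsForCJK{\genericcjkfont}{}
% More specific rule second: for the blocks the two groups share,
% this later call replaces what the CJK call set up.
\setTransitionsForJapanese{\japanesefont}{}

\begin{document}
Chicken: 鶏 (にわとり)
\end{document}
```

Swapping the two \setTransitionsFor... lines would let the CJK rules win for the shared blocks instead, which is the order dependence under discussion.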

The fact that this also happens for plain To and From is inherent to the mechanism, and while currently entirely intentional, it may be problematic if you mix To and From. As an example:

...
\setTransitionFrom{Thai}{...}
\setTransitionTo{CJKUnifiedIdeographs}{...}
...
\begin{document}
นี่คือคำสำหรับไก่ในญี่ปุ่น:鶏
\end{document}

Here I would expect the From rule, set for Thai, not to kick in at the Thai-CJK boundary for the given text, because the CJK To rule overwrites what should happen there. There might be cases where that's unwanted, but before anyone sits down to radically rewrite ucharclasses, there'd need to be a few real-world use-cases (that can't be solved by reordering or sticking with For and To rules only) to justify that work.

(and honestly, of the three rules, the From one is the odd one out as far as I'm concerned: it got added for completeness, but I can't think of a single use-case in which it's actually necessary)

cykerway commented 4 years ago

Right, From is currently the odd one out. Most of the time what I need is To. But From is necessary in some use cases, so it's not entirely useless. I recommend keeping it in the code.

And to solve the from vs to problem, I think we need:

Then from and to will work nicely with each other.

By the way, I think this package is very useful. Do you have any plans to upgrade it to expl3?

Pomax commented 4 years ago

XeLaTeX has always been natively Unicode, so the only time this should need upgrading is when XeLaTeX changes its interchartoks behaviour. It already did this once before, when it moved from having only 252 user-definable classes to 4092 user-definable classes, so it might at some point switch to full UCHAR, but I doubt that will happen any time soon. So developments in the LaTeX2e/LaTeX3 community don't seem very related?

(not to mention that LaTeX3 has been in the works for over 25 years now; I have no reason to believe it will ever actually get out of the "being worked on" stage)

More variables is what I was thinking too. However, in order to preserve the current (highly desirable) clobbering behaviour, some more work would be necessary: there would need to be macros that allow changing the clobbering behaviour on the fly, with clobbering turned on by default, as well as a [noclobber] option for people who don't want to always be adding a \ucharnoclobber or something similar to their preamble.
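One way the "more variables" idea could look, purely as a sketch with raw XeTeX primitives: keep each rule in its own token register and build the pair's token list from both, rather than letting the second rule replace the first. Every name here (\classA, \classB, \fromA, \toB, \combinedAB) is hypothetical and not part of ucharclasses, and the choice to run the From code before the To code is an assumption.

```latex
% Compile with xelatex. All names here are hypothetical, not ucharclasses API.
\documentclass{article}

\newXeTeXintercharclass\classA
\newXeTeXintercharclass\classB
\XeTeXcharclass `\a = \classA
\XeTeXcharclass `\b = \classB

% One token register per rule, instead of writing straight into the pair slot.
\newtoks\fromA
\newtoks\toB
\fromA = {[from A]}
\toB   = {[to B]}

% Build the A-to-B slot from both stored rules: leave A first, then enter B.
% Inside \edef, \the<toks> yields the register contents without expanding them.
\edef\combinedAB{\the\fromA \the\toB}
\XeTeXinterchartoks \classA \classB = \expandafter{\combinedAB}

\begin{document}
\XeTeXinterchartokenstate = 1
ab
\end{document}
```

A clobbering mode would simply skip the combination step and assign whichever rule was set last, so a switch between the two behaviours only needs to decide which assignment to perform.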

cykerway commented 4 years ago

Alright, thank you for the feedback. I mentioned expl3 because I think it provides more useful macros than LaTeX2e and may make the "variable work" easier. I feel TeX variables are much trickier than variables in a general-purpose language such as Java, and I haven't fully worked out how their expansion works. Things seem to be better defined in LaTeX3. Otherwise, LaTeX2e versus LaTeX3 is unrelated here. By the way, there are 308 Unicode blocks in total, so I guess having fewer than 256 classes doesn't make much sense. Well, you could say users don't always need that many.