lemzwerg closed this 8 years ago
While the XeTeX list mulls over the request to raise the class number type from uint8 to uint16, not loading some of the more exotic classes seems sensible.
Just thinking out loud: we could also load only plane 0 by default, with \plane1, \plane2, \plane14, \plane15, and \plane16 commands to load the additional planes as necessary?
I would rather suggest that some blocks get merged. For example, there is zero advantage in having three blocks for 'Myanmar', 'MyanmarExtendedA', and 'MyanmarExtendedB'; a single block would be better and would also save \XeTeXcharclass registers.
However, this needs some recoding, and I don't have time for that, unfortunately.
Hm, that feels like a thing I can get done with a quick script or even a bit of Sublime Text'ing, although figuring out which blocks can be collapsed that way will indeed take a little bit of time.
Well, here are groups of the simpler cases, which should be sufficient for the time being...
\do{Arabic}{"0600}{"06FF}
\do{ArabicExtendedA}{"08A0}{"08FF}
\do{ArabicPresentationFormsA}{"0FB50}{"0FDFF}
\do{ArabicPresentationFormsB}{"0FE70}{"0FEFF}
\do{ArabicSupplement}{"0750}{"077F}
\do{Bamum}{"0A6A0}{"0A6FF}
\do{BamumSupplement}{"016800}{"016A3F}
\do{BasicLatin}{"0020}{"007F} % 0000..007F in Unicode standard
\do{LatinExtendedA}{"0100}{"017F}
\do{LatinExtendedAdditional}{"01E00}{"01EFF}
\do{LatinExtendedB}{"0180}{"024F}
\do{LatinExtendedC}{"02C60}{"02C7F}
\do{LatinExtendedD}{"0A720}{"0A7FF}
\do{LatinExtendedE}{"0AB30}{"0AB6F}
\do{LatinSupplement}{"0080}{"00FF}
\do{Bopomofo}{"03100}{"0312F}
\do{BopomofoExtended}{"031A0}{"031BF}
\do{Cherokee}{"013A0}{"013FF}
\do{CherokeeSupplement}{"0AB70}{"0ABBF}
\do{Coptic}{"02C80}{"02CFF}
\do{CopticEpactNumbers}{"0102E0}{"0102FF}
\do{Cyrillic}{"0400}{"04FF}
\do{CyrillicExtendedA}{"02DE0}{"02DFF}
\do{CyrillicExtendedB}{"0A640}{"0A69F}
\do{CyrillicSupplement}{"0500}{"052F}
\do{Devanagari}{"0900}{"097F}
\do{DevanagariExtended}{"0A8E0}{"0A8FF}
\do{Ethiopic}{"01200}{"0137F}
\do{EthiopicExtended}{"02D80}{"02DDF}
\do{EthiopicExtendedA}{"0AB00}{"0AB2F}
\do{EthiopicSupplement}{"01380}{"0139F}
\do{Georgian}{"010A0}{"010FF}
\do{GeorgianSupplement}{"02D00}{"02D2F}
\do{GreekAndCoptic}{"0370}{"03FF}
\do{GreekExtended}{"01F00}{"01FFF}
\do{HangulCompatibilityJamo}{"03130}{"0318F}
\do{HangulJamo}{"01100}{"011FF}
\do{HangulJamoExtendedA}{"0A960}{"0A97F}
\do{HangulJamoExtendedB}{"0D7B0}{"0D7FF}
\do{HangulSyllables}{"0AC00}{"0D7AF}
\do{Khmer}{"01780}{"017FF}
\do{KhmerSymbols}{"019E0}{"019FF}
\do{MeeteiMayek}{"0ABC0}{"0ABFF}
\do{MeeteiMayekExtensions}{"0AAE0}{"0AAFF}
\do{Myanmar}{"01000}{"0109F}
\do{MyanmarExtendedA}{"0AA60}{"0AA7F}
\do{MyanmarExtendedB}{"0A9E0}{"0A9FF}
\do{Sinhala}{"0D80}{"0DFF}
\do{SinhalaArchaicNumbers}{"0111E0}{"0111FF}
\do{Sundanese}{"01B80}{"01BBF}
\do{SundaneseSupplement}{"01CC0}{"01CCF}
\do{UnifiedCanadianAboriginalSyllabics}{"01400}{"0167F}
\do{UnifiedCanadianAboriginalSyllabicsExtended}{"018B0}{"018FF}
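To illustrate what merging would mean in practice, here is a minimal sketch of assigning several non-contiguous ranges to one class, so that the three Myanmar blocks consume a single \XeTeXcharclass register. The names \MyanmarClass, \rangecount, and \assignrange are hypothetical and are not taken from ucharclasses; the package's real \do macro may work quite differently.

```latex
% Hypothetical sketch, not the package's actual code: one class, several ranges.
\newXeTeXintercharclass\MyanmarClass % hypothetical class name

\newcount\rangecount % hypothetical scratch counter
% #1 = class, #2 = first code point, #3 = last code point (inclusive)
\def\assignrange#1#2#3{%
  \rangecount=#2\relax
  \loop
    \XeTeXcharclass\rangecount=#1\relax
    \ifnum\rangecount<#3\relax
      \advance\rangecount by 1
  \repeat
}

% All three Myanmar blocks share the same class register.
\assignrange\MyanmarClass{"1000}{"109F}% Myanmar
\assignrange\MyanmarClass{"A9E0}{"A9FF}% Myanmar Extended-B
\assignrange\MyanmarClass{"AA60}{"AA7F}% Myanmar Extended-A
```

With this approach the groups listed above would each need one register instead of one per Unicode block, at the cost of no longer being able to distinguish the sub-blocks in transition rules.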
thanks!
Now that v2.1 of ucharclasses has been released so that it works with XeTeX 0.99996, would you be willing to rebase this PR?
Rebased. There's now also code to cater for the LaTeX override, using code suggested by David Carlisle.
It's not clear to me what you mean by 'duplication'. Please elaborate.
On line 44 we have:
\ifdefined\XeTeXinterwordspaceshaping
\def\newXeTeXintercharclass{%
\e@alloc\XeTeXcharclass\chardef\xe@alloc@intercharclass\m@ne{4095 }}
\fi
but the 0.99994 fix also introduced this on line 806:
\ifdefined\XeTeXinterwordspaceshaping
\chardef\@ucharclass@boundary=4095 %
\else
\chardef\@ucharclass@boundary=\@cclv
\fi
Looking at it more closely, that's not duplication, but should those two things be grouped into a single \ifdefined...\fi block?
Probably yes; it would be a minor follow-up patch.
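For reference, a sketch of how the two fragments quoted above could look once grouped. This only combines the code already shown in this thread; it assumes nothing between the two original locations depends on the order in which \newXeTeXintercharclass and \@ucharclass@boundary are defined, which would need checking against the .sty file.

```latex
\ifdefined\XeTeXinterwordspaceshaping
  % Branch taken when \XeTeXinterwordspaceshaping exists: override the
  % allocator limit and set the boundary class, both to 4095.
  \def\newXeTeXintercharclass{%
    \e@alloc\XeTeXcharclass\chardef\xe@alloc@intercharclass\m@ne{4095 }}
  \chardef\@ucharclass@boundary=4095 %
\else
  % Otherwise keep the boundary at 255 and leave the allocator untouched.
  \chardef\@ucharclass@boundary=\@cclv
\fi
```

Keeping both definitions in one conditional means the 4095 limit appears in a single place, so a later change to the limit cannot leave the two values out of sync.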
wfm, I've filed https://github.com/Pomax/ucharclasses/issues/17 for that purpose and will merge this in. Would you like to be credited in the .sty file and README for the v2.2 update this will lead to?
Please revise; some decisions might be questionable.