CTeX-org / ctex-kit

Macro Packages and Scripts for Chinese TeX users

Handling automatic spaces between CJK and Latin characters in LuaLaTeX #711

Open BenjaminGalliot opened 2 months ago

BenjaminGalliot commented 2 months ago

(Sorry for using English!)

Hello,

I'm currently working on an automatically generated document using LuaLaTeX. I've encountered an issue where there isn't an automatic space between CJK and Latin script characters, which affects the readability of mixed-language texts. For example:

中文
…français

The ellipsis (…) directly follows the Chinese characters without any space. I would like to have an automatic space inserted between CJK and Latin characters whenever a line break occurs between them.

Is there a known method or recommended practice in LuaLaTeX for handling this spacing automatically? Any guidance or workaround that ensures proper spacing between these character sets would be greatly appreciated.

Thank you in advance for your help!

MWE:

\documentclass{article}
\RequirePackage[french]{babel}
\RequirePackage{ctex}
\babelprovide[import=zh-Hans]{cmn}
\setCJKfamilyfont{cmn}{AR PL UKai CN}
\setmainfont{EB Garamond}
\RenewDocumentCommand \CJKrmdefault {} {cmn}
\babelfont[french]{rm}{EB Garamond}
\NewDocumentCommand \scriptcjk {} {\ltjsetparameter{jacharrange={-1, +2, +3, -4, -5, +6, +7, -8, +9}}}
\NewDocumentCommand \scriptlatin {} {\ltjsetparameter{jacharrange={-1, -2, -3, -4, -5, +6, +7, -8, -9}}}
\NewDocumentCommand \tfra { m } {\foreignlanguage{french}{\scriptlatin#1}}
\NewDocumentCommand \tcmn { m } {\foreignlanguage{cmn}{\scriptcjk#1}}
\frenchsetup{og=«, fg=», AutoSpacePunctuation=true} % Can be turned off if necessary.

\begin{document}
\scriptlatin
\selectlanguage{french}

中文 …français  % Reference.

中文…français  % Expected.

中文
…français  % Not wanted.

中文\ 
…français  % Workaround, but I am looking for something better.

---------
% It should also work with commands around.

\tcmn{中文} \tfra{…français}

\tcmn{中文}\tfra{…français}

\tcmn{中文}
\tfra{…français}

\tcmn{中文}\
\tfra{…français}

---------
% Without punctuation.

\tcmn{中文} \tfra{français}

\tcmn{中文}\tfra{français}

\tcmn{中文}
\tfra{français}

\tcmn{中文}\
\tfra{français}

---------
% Various behaviours depending on punctuation?

中文 
\tfra{«français}

中文\ 
\tfra{«français}

中文 
\tfra{"français}

中文\ 
\tfra{"français}

中文 
\tfra{(français}

中文\ 
\tfra{(français}

中文 
\tfra{[français}

中文\ 
\tfra{[français}

中文 
\tfra{-français}

中文\ 
\tfra{-français}

中文 
\tfra{–français}

中文\ 
\tfra{–français}

中文 
\tfra{—français}

中文\ 
\tfra{—français}

\end{document}

Screenshot: Screenshot_20240423_202707

My workaround of manually inserting a control space (\ ) at the end of the line is not ideal. I am looking for a more elegant solution that handles these spaces automatically, particularly after a line break in the source.

In addition to the main issue of spacing after line breaks, I've also noticed that the behaviour changes depending on the punctuation used. Is this the intended behaviour? Is it possible to customize it?

Thank you very much.

muzimuzhi commented 2 months ago

The reported behavior may be inherited from luatexja. I haven't checked.

wangweixuan commented 2 days ago

The behavior of line breaks is explained in §15.2 of the LuaTeX-ja manual:

Considering these situations, handling of an end-of-line in LuaTeX-ja are as follows:

A character whose character code is \ltjlineendcomment is appended to an input line, before LuaTeX actually process it, if and only if the following three conditions are satisfied:

  1. The category code of \endlinechar is 5 (end-of-line).
  2. The category code of \ltjlineendcomment itself is 14 (comment).
  3. The input line matches the following “regular expression”: [...]

To stop line breaks from being commented out (and therefore ignored), you can give \ltjlineendcomment a category code other than 14, so that condition 2 no longer holds:

\catcode\ltjlineendcomment=0
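As a minimal sketch of where that line could go, the document below reduces the MWE to the failing case. It assumes that ctex loads LuaTeX-ja under LuaLaTeX (as the MWE's use of \ltjsetparameter suggests) and that the default ctex fonts are installed; the babel and font setup from the MWE is left out for brevity.

```latex
\documentclass{article}
\usepackage{ctex}

% Taken from the reply above: with a category code other than 14,
% condition 2 of the quoted rule fails, so LuaTeX-ja should no longer
% append the comment character to input lines.
\catcode\ltjlineendcomment=0

\begin{document}

中文
…français  % The end of the previous line should now act as a space.

\end{document}
```

One likely side effect, going by the manual excerpt: a source line break between two lines of Chinese text would presumably also produce a space again. Since \catcode assignments are local, the change could be confined to a group around the mixed-script passages if that turns out to be a problem.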

The different handling of punctuation is explained in §4.3:

It is not desirable that xkanjiskip is inserted into every boundary between JAchars and ALchars. For example, xkanjiskip should not be inserted after opening parenthesis (e.g., compare “(あ” and “( あ”). LuaTeX-ja can control whether xkanjiskip can be inserted before/after a character, by changing jaxspmode for JAchars and alxspmode parameters ALchars respectively.

[...]

The second argument preonly means that the insertion of xkanjiskip is allowed before this character, but not after. The other possible values are postonly, allow, and inhibit.

The default settings include

\ltjsetparameter{jaxspmode={`“,preonly}}
\ltjsetparameter{jaxspmode={`”,postonly}}
\ltjsetparameter{jaxspmode={`—,inhibit}}% U+2014 EM DASH
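
As for customizing this, the same keys can presumably be set again in the preamble to override a default. The lines below are only a sketch: the choice of characters and modes is hypothetical, and whether jaxspmode or alxspmode is consulted for a given character depends on whether the current jacharrange treats it as a JAchar or an ALchar.

```latex
% Hypothetical overrides, using the syntax of the defaults quoted above.
% Allow xkanjiskip on both sides of U+2014 EM DASH when it is a JAchar:
\ltjsetparameter{jaxspmode={`—,allow}}
% Allow xkanjiskip on both sides of an opening parenthesis when it is an ALchar:
\ltjsetparameter{alxspmode={`(,allow}}
```

Note that these parameters only govern the automatic xkanjiskip inserted between adjacent JAchars and ALchars; an explicit space or \ in the source is handled separately.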