cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

[Feature] More friendly support for CJK words (characters) #529

Closed Mikachu2333 closed 4 months ago

Mikachu2333 commented 5 months ago

Here is a table that has been formatted by latexindent, but as you can see, because of the "~" symbol, its columns are not aligned the way they would be in an all-English table.

As we all know, one Chinese character occupies the width of two English characters. All of these problems arise for that reason, especially when I write sentences that mix Chinese and English, as shown in the following figure. I hope you can improve this.

\begin{table}[H]
  \centering
  \begin{tblr}{
      hlines,
      vlines,
      cells = {c,m}}
    5:30        & 起床 & 12:30~14:00 & 午休  \\
    6:00~7:00   & 早操 & 14:00~17:30 & 集训  \\
    7:00~7:50   & 早餐 & 17:30~18:30 & 晚餐  \\
    7:50~11:00  & 集训 & 19:00~21:00 & 晚自习 \\
    11:00~12:30 & 午餐 & 23:00       & 熄灯  \\
  \end{tblr}
\end{table}

[image]

Catastrophic

trarer commented 5 months ago

Use a half-width Latin font and enable unicode-string.

Mikachu2333 commented 5 months ago

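enable unicode-string

That is only a stopgap, after all. A real fix still needs the author to improve the way widths are counted.

Use a half-width Latin font

I am already using Source Han Sans HW (思源黑体 HW), with the standard 1:2 width ratio between Latin and Chinese characters.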

trarer commented 5 months ago

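I don't understand what you mean by "the way widths are counted". The unicode-string feature treats Unicode characters as two characters wide, so tables containing Chinese characters can be aligned. Without this feature, the default is to treat all characters as equally wide. Or do you mean that the program should count according to the actual size of the font? That is the job of a typesetting program. The root of the problem is that Chinese characters are far more complex than Latin letters: making Chinese and Latin glyphs equally wide in mixed text leaves the Latin letters too large or the Chinese characters too small, and with Latin letters half as wide as a Chinese character, the Latin text looks relatively small or the Chinese relatively large. I think there should be a user setting for the width ratio between Chinese and Latin characters, for example one Chinese character equal to 1.5 Latin letters, rather than a fixed 2.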

Mikachu2333 commented 5 months ago

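The unicode-string feature treats Unicode characters as two characters wide, so tables containing Chinese characters can be aligned

Sorry, I don't follow. Going by what you say, how should I change my VSCode settings so that the table looks neatly aligned?

do you mean that the program should count according to the actual size of the font

Yes. My idea is to go by the UTF encoding: if a character falls in certain blocks, count it as two characters and then format. That seems like a bit of a fantasy, though...

[image]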

trarer commented 5 months ago

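You are right, and it is not a fantasy. latexindent's unicode::string feature counts characters in the Unicode CJK blocks as 2 characters. It does not use the actual font size, though, so you must pair it with a monospaced Latin font that is 0.5 times the width of a Chinese character. Add -GCString to the arguments of the latexindent command to turn on the unicode::string feature.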
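To make the counting rule described above concrete, here is a minimal Python sketch. It is an illustration only, not latexindent's Perl code. It assumes that "CJK characters count as 2" can be approximated with the Unicode East Asian Width property, treating the Wide (W) and Fullwidth (F) classes as two columns; the function name `display_columns` is hypothetical.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Count monospace columns, assuming Wide/Fullwidth characters take 2."""
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks attach to the previous character: no extra column.
            continue
        # "W" (Wide) and "F" (Fullwidth) are the East Asian Width classes
        # that a 1:2 monospaced font draws at double width.
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return columns


if __name__ == "__main__":
    # Cells taken from the table in the original report.
    for cell in ["午休", "晚自习", "5:30", "12:30~14:00"]:
        print(f"{cell!r}: characters={len(cell)} columns={display_columns(cell)}")
```

For the cells from the report, 晚自习 is 3 characters but 6 columns, while 午休 is 2 characters and 4 columns, so padding by character count leaves the two rows one column apart in a 1:2 font.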

Mikachu2333 commented 5 months ago

Add -GCString to the arguments of the latexindent command to turn on the unicode::string feature.

There is a solution like this?! Awesome!

Mikachu2333 commented 5 months ago

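to turn on the unicode::string feature

Something is not right. I am using Windows and the standalone latexindent executable. According to the docs, the standalone executable on Windows already has this enabled by default, so I should not see any difference in the output whether or not I use the switch...

So it still doesn't really work (laughing-crying face)

[image]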

trarer commented 5 months ago

I don't know, then; it definitely works. From the command line it definitely works. If you use vscode, you need to add the argument in the settings.

cmhughes commented 5 months ago

I'm sorry, I have no idea what this means

trarer commented 5 months ago

I'm sorry, I have no idea what this means

@Mikachu2333 has trouble using Unicode::GCString on vscode.

cmhughes commented 5 months ago

This repository has nothing to do with vscode, I recommend posting your issue on the vscode repository.

saxyx commented 5 months ago

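to turn on the unicode::string feature

Something is not right. I am using Windows and the standalone latexindent executable. According to the docs, the standalone executable on Windows already has this enabled by default, so I should not see any difference in the output whether or not I use the switch...

So it still doesn't really work (laughing-crying face)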

In the latest version, 3.23.7, the file C:\Users\AAA\AppData\Local\Temp\par-5875\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\inc\lib/LatexIndent/AlignmentAtAmpersand.pm, which latexindent.exe extracts when it runs, contains the following code:

sub get_column_width {

    my $stringToBeMeasured = $_[0];

    # default length measurement
    # credit/reference: https://perldoc.perl.org/perlunicook#%E2%84%9E-33:-String-length-in-graphemes
    unless ( $switches{GCString} ) {
        my $count = 0;
        while ( $stringToBeMeasured =~ /\X/g ) { $count++ }
        return $count;
    }

    # if GCString active, then use Unicode::GCString
    return Unicode::GCString->new($stringToBeMeasured)->columns();
}

This shows that latexindent.exe only loads Unicode::GCString automatically. It does not use Unicode::GCString by default to count CJK characters as 2 characters when measuring length. You still need to pass --GCString to enable this feature.
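The effect of that switch on alignment can be sketched in Python. This is a loose analogue of the Perl routine quoted above, not a port of it: `len()` stands in for the default grapheme count, the East Asian Width property stands in for `Unicode::GCString->columns()`, and the names `column_width` and `pad_cells` are hypothetical.

```python
import unicodedata


def column_width(text: str, gc_string: bool = False) -> int:
    """Measure a cell; gc_string=False mimics the default character count."""
    if not gc_string:
        # Default path: every character counts as one column.
        return len(text)
    # Switch active: Wide/Fullwidth characters count as two columns.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_cells(cells, gc_string: bool = False):
    """Right-pad every cell with spaces up to the widest measured cell."""
    target = max(column_width(cell, gc_string) for cell in cells)
    return [
        cell + " " * (target - column_width(cell, gc_string))
        for cell in cells
    ]


if __name__ == "__main__":
    last_column = ["午休", "集训", "晚自习"]
    for flag in (False, True):
        print(f"gc_string={flag}: {pad_cells(last_column, gc_string=flag)}")
```

With the flag off, 午休 receives one space of padding and 晚自习 none, so in a 1:2 font the cells occupy 5 and 6 columns. With the flag on, 午休 receives two spaces and both cells occupy 6 columns.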

Mikachu2333 commented 5 months ago

You still need to pass --GCString to enable this feature.

Thanks for answering!

I ran the following command in PowerShell, and everything worked without any errors:

D:\texlive\2024\bin\windows\latexindent.exe -c d:/LanguageLearning/Latex/test/ d:\\LanguageLearning\\Latex\\test\\main -y=defaultIndent: ' ' -GCString

So I changed the args to use --screenlog, and finally found the answer to both #528 and this issue.

As the following output and picture show, vscode uses the latexindent in the temp folder instead of the exe file I wrote in the settings, and the two have different SHA-1 values...

#sha1#
C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\latexindent.exe
D252625B4EA7FDE482F1F4EC291625D2F9F1C7CB

#sha1#
D:\texlive\2024\bin\windows\latexindent.exe
2EA50CF2185A183F4982E3391A7A54367A1666E7
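The SHA-1 comparison above can be reproduced with a short script. This is a minimal sketch using only Python's standard `hashlib`; the helper names `sha1_of_file` and `same_file_contents` are hypothetical, and the two paths are supplied by the caller.

```python
import hashlib
import sys


def sha1_of_file(path, chunk_size=65536):
    """Return the uppercase hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large executables are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def same_file_contents(first, second):
    """True when both files hash to the same SHA-1 value."""
    return sha1_of_file(first) == sha1_of_file(second)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_sha1.py <first file> <second file>
    for path in sys.argv[1:]:
        print(path, sha1_of_file(path))
    print("identical" if same_file_contents(sys.argv[1], sys.argv[2]) else "different")
```

Passing the two latexindent.exe paths listed above as arguments prints each digest and whether they match.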
INFO:  ANSI Code Page:  936
INFO:  Current console output code page: 936 
INFO:  Change the current console output code page to 65001
INFO:  Command line:
       C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7/latexindent.exe --screenlog --overwriteIfDifferent --cruft=d:/LanguageLearning/Latex/test/ --modifylinebreaks "--yaml=defaultIndent: '    '" --GCString d:\LanguageLearning\Latex\test\main
       Command line arguments:
       --screenlog, --overwriteIfDifferent, --cruft=d:/LanguageLearning/Latex/test/, --modifylinebreaks, --yaml=defaultIndent: '    ', --GCString, d:\LanguageLearning\Latex\test\main

INFO:  latexindent.exe version 3.23.7, 2024-03-16, a script to indent .tex files
       latexindent.exe lives here: D:/texlive/2024/bin/windows/
       Sat Mar 23 12:22:02 2024
       Filename: d:\LanguageLearning\Latex\test\main
INFO:  Processing switches:
       -sl|--screenlog: log file will also be output to the screen
       -wd|--overwriteIfDifferent: will overwrite ONLY if indented text is different
       -y|--yaml: YAML settings specified via command line
       -m|--modifylinebreaks: modify line breaks
       -c|--cruft: cruft directory
       --GCString switch active, loading Unicode::GCString module
INFO:  Directory for backup files and d:/LanguageLearning/Latex/test//indent.log:
       d:/LanguageLearning/Latex/test/
INFO:  YAML settings read: defaultSettings.yaml
       Reading defaultSettings.yaml from D:/texlive/2024/bin/windows/defaultSettings.yaml
       Reading defaultSettings.yaml (2nd attempt) from D:/texlive/2024/bin/windows/../../texmf-dist/scripts/latexindent/defaultSettings.yaml
       and then, if necessary, D:/texlive/2024/bin/windows/LatexIndent/defaultSettings.yaml
INFO:  YAML reading settings
       Home directory is C:\Users\mikac
       latexindent.pl didn't find indentconfig.yaml or .indentconfig.yaml
       see all possible locations: https://latexindentpl.readthedocs.io/en/latest/sec-appendices.html#indentconfig-options)
INFO:  YAML settings read: -y switch
       YAML setting: defaultIndent:'    '
       single-quoted string found in -y switch: '    ', substitute to     
       Updating mainSettings with defaultIndent:     
INFO:  File extension work:
       latexindent called to act upon d:\LanguageLearning\Latex\test\main without a file extension;
       searching for files in the following order (see fileExtensionPreference):
       d:\LanguageLearning\Latex\test\main.tex
       d:\LanguageLearning\Latex\test\main.sty
       d:\LanguageLearning\Latex\test\main.cls
       d:\LanguageLearning\Latex\test\main.bib
       d:\LanguageLearning\Latex\test\main.tex found!
       Updated fileName to d:\LanguageLearning\Latex\test\main.tex
INFO:  Phase 1: searching for objects
INFO:  Phase 2: finding surrounding indentation
INFO:  Phase 3: indenting objects
INFO:  Phase 4: final indentation check
INFO:  -wd switch active
       Original body matches indented body, NOT overwriting, no backup files created
INFO:  Output routine:
       Not outputting to file; see -w and -o switches for more options.

Full vscode settings.

    "latex-workshop.bibtex-fields.sort.enabled": true,
    "latex-workshop.bibtex-format.sort.enabled": true,
    "latex-workshop.bibtex-format.tab": "4 spaces",
    "latex-workshop.intellisense.file.base": "both",
    "latex-workshop.intellisense.package.enabled": true,
    "latex-workshop.intellisense.triggers.latex": [],
    "latex-workshop.latex.autoClean.run": "onBuilt",
    "latex-workshop.latex.build.clearLog.everyRecipeStep.enabled": false,
    "latex-workshop.latex.clean.fileTypes": [
        "*.aux",
        "*.bbl",
        "*.blg",
        "*.idx",
        "*.ind",
        "*.lof",
        "*.lot",
        "*.out",
        "*.toc",
        "*.acn",
        "*.acr",
        "*.alg",
        "*.glg",
        "*.glo",
        "*.gls",
        "*.ist",
        "*.fls",
        "*.log",
        "*.fdb_latexmk",
        "*.synctex.gz"
    ],
    "latex-workshop.latex.recipe.default": "lastUsed",
    "latex-workshop.latex.recipes": [
        {
            "name": "XeLaTeX *2",
            "tools": [
                "xelatex",
                "xelatex"
            ]
        },
        {
            "name": "XeLaTeX *3",
            "tools": [
                "xelatex",
                "xelatex",
                "xelatex"
            ]
        },
        {
            "name": "XeLaTeX -> BibTeX",
            "tools": [
                "xelatex",
                "bibtex",
                "xelatex",
                "xelatex"
            ]
        }
    ],
    "latex-workshop.latex.tools": [
        {
            "args": [
                "-synctex=1",
                "-interaction=nonstopmode",
                "-file-line-error",
                "%DOCFILE%"
            ],
            "command": "xelatex",
            "name": "xelatex"
        },
        {
            "args": [
                "%DOCFILE%"
            ],
            "command": "bibtex",
            "name": "bibtex"
        }
    ],
    "latex-workshop.latexindent.args": [
        "--screenlog",
        "--overwriteIfDifferent",
        "--cruft=%DIR%/",
        "--modifylinebreaks",
        "--yaml=defaultIndent: '    '",
        "--GCString",
        "%DOC_W32%"
    ],
    "latex-workshop.latexindent.path": "D:\\texlive\\2024\\bin\\windows\\latexindent.exe",
    "latex-workshop.message.error.show": false,
    "latex-workshop.message.information.show": true,
    "latex-workshop.message.warning.show": false,
    "latex-workshop.showContextMenu": true,
    "latex-workshop.synctex.afterBuild.enabled": true,
    "latex-workshop.texcount.autorun": "onSave",
    "latex-workshop.view.autoFocus.enabled": true,
    "latex-workshop.view.pdf.internal.synctex.keybinding": "double-click",
    "latex-workshop.view.pdf.invertMode.enabled": "auto",
    "latex-workshop.view.pdf.viewer": "browser",

saxyx commented 5 months ago

As the following output and picture show, vscode uses the latexindent in the temp folder instead of the exe file I wrote in the settings, and the two have different SHA-1 values...

When using D:\texlive\2024\bin\windows\latexindent.exe, it will automatically unzip under the Temp path and re-run a new command in the background. The --screenlog option merely displays the final command run in the background. Of course, you cannot directly use C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\latexindent.exe to format your file.

Mikachu2333 commented 5 months ago

When using D:\texlive\2024\bin\windows\latexindent.exe, it will automatically unzip under the Temp path and re-run a new command in the background. The --screenlog option merely displays the final command run in the background. Of course, you cannot directly use C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\latexindent.exe to format your file.

But I can see no other possible reason for this issue. Furthermore, I am not sure whether it is related to paths that were not wrapped in double quotation marks.

Besides, when I run latexindent from the command line, everything is normal and correct, so I believe this is due to VSCode.

Mikachu2333 commented 5 months ago

Here is the exe file I repackaged after modifying Document.pm.

When I used the exe file repackaged by @fengzyf and tried the --GCString arg, latexindent told me that Locale 'Chinese (Simplified)_China.936' is unsupported... After I removed the arg, latexindent formatted the file successfully but still could not align the contents of the cells.

[image]

[image]

cmhughes commented 4 months ago

I believe this is fixed as of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.23.8 let me know if not

Mikachu2333 commented 4 months ago

Wonderful! Tremendous! The problem that has been bothering me for a long time has been perfectly fixed!