cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

Locale encoding of file system #273

Closed qiancy98 closed 3 years ago

qiancy98 commented 3 years ago

This pull request makes it possible to specify the encoding of the file system. This is my first time writing Perl, and I mostly copied and imitated code from the same file. I would appreciate it if you could check my pull request (I hope that its syntax is correct).

At least it works well. Below are two logs produced by running perl .\latexindent.pl '.\notes of sparsity.tex' > try.tex in PowerShell (indentconfig.yaml is echoed in each log, so I have not posted it separately). The only difference between the two runs is whether indentconfig.yaml contains the encoding field.

INFO:  latexindent.pl version 3.9.3, 2021-05-07, a script to indent .tex files
       latexindent.pl lives here: C:/Users/QianCY/github/latexindent.pl/
       Fri Jun  4 00:00:34 2021
       Filename: .\notes of sparsity.tex
INFO:  Processing switches:
INFO:  Directory for backup files and indent.log: .
INFO:  Perl modules are being loaded from the following directories:
       C:/Strawberry/perl/lib/FindBin.pm
       C:/Strawberry/perl/vendor/lib/YAML/Tiny.pm
       C:/Strawberry/perl/lib/File/Copy.pm
       C:/Strawberry/perl/lib/File/Basename.pm
       C:/Strawberry/perl/lib/Getopt/Long.pm
       C:/Strawberry/perl/vendor/lib/File/HomeDir.pm
       C:/Strawberry/perl/vendor/lib/Unicode/GCString.pm
INFO:  Latex Indent perl modules are being loaded from, for example:
       C:/Users/QianCY/github/latexindent.pl/LatexIndent/Document.pm
INFO:  YAML settings read: defaultSettings.yaml
        Reading defaultSettings.yaml from C:/Users/QianCY/github/latexindent.pl/defaultSettings.yaml
INFO:  YAML settings read: indentconfig.yaml or .indentconfig.yaml
INFO:  Encoding of the paths is GB2312
       Reading path information from C:\Users\QianCY/indentconfig.yaml
       (Alternatively C:\Users\QianCY/.indentconfig.yaml can be used)
       ---
       encoding: GB2312
       paths:
         - C:\Users\QianCY\latexindent.yaml
         - D:\其他网盘\latexindent.yaml

       Transform file encoding: C:\Users\QianCY\latexindent.yaml -> C:\Users\QianCY\latexindent.yaml
       Transform file encoding: D:\其他网盘\latexindent.yaml -> D:\ÆäËûÍøÅÌ\latexindent.yaml
INFO:  YAML settings, reading from the following files:
       Reading USER settings from C:\Users\QianCY\latexindent.yaml
       ---
       indentRules:
         item: "\t"
       lookForAlignDelims:
         align:
           alignDoubleBackSlash: '0'
           delims: '1'
         align*:
           alignDoubleBackSlash: '0'
           delims: '1'
       verbatimEnvironments:
         tikzpicture: '1'

       Reading USER settings from D:\ÆäËûÍøÅÌ\latexindent.yaml
       ---
       indentRules:
         item: "\t"
       lookForAlignDelims:
         align:
           alignDoubleBackSlash: '0'
           delims: '1'
         align*:
           alignDoubleBackSlash: '0'
           delims: '1'
       verbatimEnvironments:
         tikzpicture: '1'

INFO:  Phase 1: searching for objects
INFO:  Phase 2: finding surrounding indentation
INFO:  Phase 3: indenting objects
INFO:  Phase 4: final indentation check
INFO:  Output routine:
       Not outputting to file; see -w and -o switches for more options.
       --------------
INFO:  Please direct all communication/issues to:
        https://github.com/cmhughes/latexindent.pl

INFO:  latexindent.pl version 3.9.3, 2021-05-07, a script to indent .tex files
       latexindent.pl lives here: C:/Users/QianCY/github/latexindent.pl/
       Fri Jun  4 00:03:11 2021
       Filename: .\notes of sparsity.tex
INFO:  Processing switches:
INFO:  Directory for backup files and indent.log: .
INFO:  Perl modules are being loaded from the following directories:
       C:/Strawberry/perl/lib/FindBin.pm
       C:/Strawberry/perl/vendor/lib/YAML/Tiny.pm
       C:/Strawberry/perl/lib/File/Copy.pm
       C:/Strawberry/perl/lib/File/Basename.pm
       C:/Strawberry/perl/lib/Getopt/Long.pm
       C:/Strawberry/perl/vendor/lib/File/HomeDir.pm
       C:/Strawberry/perl/vendor/lib/Unicode/GCString.pm
INFO:  Latex Indent perl modules are being loaded from, for example:
       C:/Users/QianCY/github/latexindent.pl/LatexIndent/Document.pm
INFO:  YAML settings read: defaultSettings.yaml
        Reading defaultSettings.yaml from C:/Users/QianCY/github/latexindent.pl/defaultSettings.yaml
INFO:  YAML settings read: indentconfig.yaml or .indentconfig.yaml
INFO:  Encoding of the paths takes the default.
       Reading path information from C:\Users\QianCY/indentconfig.yaml
       (Alternatively C:\Users\QianCY/.indentconfig.yaml can be used)
       ---
       paths:
         - C:\Users\QianCY\latexindent.yaml
         - D:\其他网盘\latexindent.yaml

INFO:  YAML settings, reading from the following files:
       Reading USER settings from C:\Users\QianCY\latexindent.yaml
       ---
       indentRules:
         item: "\t"
       lookForAlignDelims:
         align:
           alignDoubleBackSlash: '0'
           delims: '1'
         align*:
           alignDoubleBackSlash: '0'
           delims: '1'
       verbatimEnvironments:
         tikzpicture: '1'

WARN:  C:\Users\QianCY/indentconfig.yaml
       specifies D:\其他网盘\latexindent.yaml but this file does not exist - unable to read settings from this file
INFO:  Phase 1: searching for objects
INFO:  Phase 2: finding surrounding indentation
INFO:  Phase 3: indenting objects
INFO:  Phase 4: final indentation check
INFO:  Output routine:
       Not outputting to file; see -w and -o switches for more options.
       --------------
INFO:  Please direct all communication/issues to:
        https://github.com/cmhughes/latexindent.pl
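
In the first log, D:\其他网盘\latexindent.yaml is rewritten as D:\ÆäËûÍøÅÌ\latexindent.yaml. That garbled-looking name appears to be the path's GB2312 bytes displayed one byte per character (Latin-1), which is presumably the byte sequence the GB2312 file system needs; the ASCII-only path on the line above it is unchanged because ASCII bytes are identical in both encodings. The short Python sketch below reproduces the conversion; it is an illustration only (latexindent.pl itself is Perl), and as_logged is a hypothetical name.

```python
# Hypothetical helper: show the bytes of a path under a given encoding,
# rendered one byte per character, the way the first log displays them.
def as_logged(path: str, encoding: str) -> str:
    raw = path.encode(encoding)   # str -> bytes in the file system's encoding
    return raw.decode("latin-1")  # Latin-1 maps every byte to exactly one character


original = "D:\\其他网盘\\latexindent.yaml"
print(as_logged(original, "gb2312"))  # D:\ÆäËûÍøÅÌ\latexindent.yaml
```

Without the encoding field, the UTF-8 bytes of the path are used instead, which would explain why the second log reports that D:\其他网盘\latexindent.yaml does not exist.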
qiancy98 commented 3 years ago

TODO: if the encode function throws a warning, write the warning to the log.

qiancy98 commented 3 years ago

The TODO is now finished.
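
The TODO amounts to catching a failed conversion and reporting it in the log rather than aborting. A minimal Python sketch of that pattern follows; encode_path and the lossy '?' fallback are assumptions for illustration, not what the pull request does (in Perl, the optional CHECK argument of Encode's encode function controls the equivalent behaviour).

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s:  %(message)s")
logger = logging.getLogger("encoding-sketch")


def encode_path(path: str, encoding: str) -> bytes:
    """Encode `path`, writing a warning to the log instead of stopping
    when some character has no representation in `encoding`."""
    try:
        return path.encode(encoding)
    except UnicodeEncodeError as err:
        logger.warning("cannot encode %r as %s (%s); substituting '?'",
                       path, encoding, err.reason)
        # Lossy fallback (an assumption) so that processing can continue
        return path.encode(encoding, errors="replace")


# ASCII cannot represent the Chinese folder name, so a warning is logged
print(encode_path("D:\\其他网盘\\latexindent.yaml", "ascii"))
```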

cmhughes commented 3 years ago

Thanks very much for this, I've reviewed it and it looks good!

  1. A minor point: I think that this line can be deleted, as I'd say the default would be not to mention encoding:

    $logger->info("*Encoding of the paths takes the default.");
  2. A bigger question: does this feature need to be applied for other user settings? For example, if I call

    latexindent.pl -l=其他网盘.yaml myfile.tex

    does the local file 其他网盘.yaml need encoding?

  3. Final question: am I correct to reference https://metacpan.org/pod/distribution/Encode/lib/Encode/Supported.pod for the supported values for the encoding field?

qiancy98 commented 3 years ago
  1. Yes, it can be deleted. I added this line because I am new to Perl and needed to confirm that the program actually enters this branch. You are expert enough to read the code without this information.
  2. I am also new to both cmd and PowerShell. However, as far as I have tested, under both chcp 936 (the default code page for Chinese users, a superset of GB2312) and chcp 65001 (UTF-8), the command line that Perl receives is encoded in GB2312, which is the desired encoding. Hence I believe that we do not need to convert the encoding of arguments received from either cmd or PowerShell.
  3. That seems right. The page I referred to is https://perldoc.perl.org/Encode::Supported. I do not know how versions of Encode correspond to versions of Perl (sorry, I am new to this), but I hope the list of supported encodings does not change much over time. (I guess you want your program to work from perl-5.10 to perl-5.34? Sadly, I do not know how to guarantee that.)
  4. Thanks for your review!!!
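
Question 3 raises the idea of checking the encoding field against the list of supported names. In Perl, Encode::find_encoding returns undef for an unknown name; the Python sketch below shows the same check-early pattern with codecs.lookup. Python's codec registry is not identical to the Encode::Supported list, so this only illustrates rejecting an unknown value before any path is converted; is_supported_encoding is a hypothetical name.

```python
import codecs


def is_supported_encoding(name: str) -> bool:
    """Return True if `name` is an encoding that Python's codec registry knows."""
    try:
        codecs.lookup(name)  # raises LookupError for unknown names
    except LookupError:
        return False
    return True


for name in ("GB2312", "cp936", "UTF-8", "not-an-encoding"):
    print(name, is_supported_encoding(name))
```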
cmhughes commented 3 years ago

Thanks so much for your time on this, I'm really grateful! :)

This looks great to me. I've tested it on my Linux machine, and it doesn't change anything, which is as I would expect. I don't have a direct way of replicating your results, but as this is an optional feature, I'm happy to accept it as is.

I'll get this documented, and credit you in the documentation. It'll be part of the next release.

Thanks so much again!

For my reference, this is about https://github.com/cmhughes/latexindent.pl/issues/263

qiancy98 commented 3 years ago

It looks OK to me. I tried but failed to compile the documentation, so the code below has not been checked with latexmk. Only one question:

    If you find that \announce{new}{encoding option for indentconfig.yaml} \texttt{latexindent.pl}
    does not read your YAML file, then you might like to explore the \texttt{encoding} option
    for \texttt{indentconfig.yaml} as demonstrated in \cref{lst:indentconfig-encoding}.

Do we need to mention the reason and a simple solution for Windows users? For example (of course you would need to adapt it to match the formatting of the rest of the document):

    If you find that \announce{new}{encoding option for indentconfig.yaml} \texttt{latexindent.pl}
    does not read your YAML file, then it might be because the default command-line encoding is not UTF-8
    \footnote{Normally this will only occur for Windows users}.
    In this case, you might like to explore the \texttt{encoding} option
    for \texttt{indentconfig.yaml} as demonstrated in \cref{lst:indentconfig-encoding}.

    For Windows users who encounter this problem, there is an easy fix:
    you can run the following command in either 
    \texttt{cmd.exe} or \texttt{powershell.exe}:
    \begin{lstlisting}
chcp
    \end{lstlisting}
    Suppose that you receive the following result
    \begin{verbatim}
Active code page: 936
    \end{verbatim}
    Then you can use
    \begin{lstlisting}
encoding: cp936
    \end{lstlisting}
    in \texttt{indentconfig.yaml}, where 936 is the result of the \texttt{chcp} command.
    This should work in most cases.
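
The chcp recipe above can be sketched in code. The Python helper below (hypothetical, not part of latexindent.pl) takes the number at the end of the chcp output, so it does not depend on the language of the message as long as the message ends with the number, and builds a value such as cp936. Code page 65001 is UTF-8, so the sketch maps it to UTF-8 rather than cp65001.

```python
import re


def encoding_from_chcp(output: str) -> str:
    """Turn the output of `chcp` into a value for the `encoding` field."""
    # Assumes the message ends with the code page number, e.g. "Active code page: 936"
    match = re.search(r"(\d+)\s*$", output)
    if match is None:
        raise ValueError(f"no code page number in {output!r}")
    page = match.group(1)
    # Code page 65001 is UTF-8; other pages become names such as "cp936"
    return "UTF-8" if page == "65001" else f"cp{page}"


print("encoding:", encoding_from_chcp("Active code page: 936"))  # encoding: cp936
```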
cmhughes commented 3 years ago

That's wonderful, thanks so much! I've implemented your suggestions as of https://github.com/cmhughes/latexindent.pl/commit/4b38910de547a6790baf5b4f0ccbf493ffc7c33f

Thanks so much!