cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

Text wrapping introduces spaces when trailing comments are present #367

Closed marcin-serwin closed 2 years ago

marcin-serwin commented 2 years ago

original .tex code

\documentclass{article}

\begin{document}

Something%
\index{something}.

\end{document}

yaml settings

modifyLineBreaks:
  textWrapOptions:
    columns: 70

actual/given output

\documentclass{article}

\begin{document}

Something \index{something}.%

\end{document}

desired or expected output

\documentclass{article}

\begin{document}

Something\index{something}.%

\end{document}

or

\documentclass{article}

\begin{document}

Something%
\index{something}.

\end{document}

anything else

The -m switch with text wrapping introduces additional spaces when trailing comments are present. This may change the typeset document in some cases. For example, if a page break occurs between Something and \index{something}, the index entry will point to the wrong page.

cmhughes commented 2 years ago

Apologies for the delay, I'll get to this soon hopefully.

cmhughes commented 2 years ago

The default behaviour is for the text wrapping routine of latexindent.pl to join lines using a space; this seems like the most sensible default.

For your desired output, there are at least two options.

option 1 (using blocksEndBefore)

modifyLineBreaks:
  textWrapOptions:
    columns: 70
    blocksEndBefore:
       other: |-
         (?x)
           \\begin\{
             |
           \\\[
             |
           \\end\{
             |
           \\index

and then calling

latexindent.pl -l -m myfile.tex

gives

\documentclass{article}

\begin{document}

Something%
\index{something}.

\end{document}

The above yaml instructs the text wrapping routine to stop at \index.

option 2 (using the replacement switch)

modifyLineBreaks:
  textWrapOptions:
    columns: 70

replacements:
  -
    when: after
    substitution: |-
        s/([a-zA-Z])\h+(\\index)/$1$2/sg

and then calling

latexindent.pl -rv -l -m myfile.tex

gives

\documentclass{article}

\begin{document}

Something\index{something}.%

\end{document}

The replacement switch is used here to remove spaces between letters ([a-zA-Z]) and the \index command.
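
To see what that substitution does in isolation, here is a minimal Python sketch of the same pattern. It is only an approximation and not how latexindent.pl applies replacements: Python's re module has no \h, so [ \t]+ is used as a stand-in for horizontal whitespace, and the function name remove_space_before_index is made up for this illustration.

```python
import re

# Approximation of the Perl substitution s/([a-zA-Z])\h+(\\index)/$1$2/sg.
# Assumption: [ \t]+ stands in for Perl's \h, which Python's re lacks.
INDEX_GAP = re.compile(r"([a-zA-Z])[ \t]+(\\index)")


def remove_space_before_index(text: str) -> str:
    """Delete horizontal whitespace between a letter and a following \\index."""
    return INDEX_GAP.sub(r"\1\2", text)


wrapped = "Something \\index{something}.%"
print(remove_space_before_index(wrapped))  # prints: Something\index{something}.%
```

Because the first group only matches letters, a space after a digit or punctuation mark before \index would be left alone, which is the same case-by-case limitation the Perl version has.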

marcin-serwin commented 2 years ago

Thanks for the quick answer and the proposed solutions. My point was a bit more general, though. While replacing newlines with spaces is reasonable behavior in most cases, it is not when the line ends with a comment. A trailing comment tells LaTeX to ignore the newline that follows, so the latexindent transformation changes the document, not just the source code formatting. While the proposed solutions work fine for lines starting with the \index command, they require exceptions to be specified case by case. In general they wouldn't, for example, differentiate between these two situations:

foo%
bar

and

foo % this is a comment
bar

which communicate different intent on the part of the author.
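
The distinction can be made mechanically. The following Python sketch is a hypothetical illustration, not latexindent.pl's implementation: the names TRAILING_COMMENT and join_lines are invented here, the comment text itself is simply dropped, and the escape handling only covers a single backslash before the percent sign. It joins lines with a space unless the line ends in a comment, in which case only whitespace that already preceded the % survives.

```python
import re

# An unescaped % and everything after it on the line.
# Assumption: a single preceding backslash means an escaped \%;
# a sequence such as \\% is not handled by this sketch.
TRAILING_COMMENT = re.compile(r"(?<!\\)%.*$")


def join_lines(lines):
    """Join lines roughly the way TeX reads them, dropping comment text."""
    pieces = []
    for i, raw in enumerate(lines):
        line = raw.lstrip()  # leading whitespace is ignored by TeX
        match = TRAILING_COMMENT.search(line)
        if match:
            # The comment swallows the newline; keep any space before the %.
            pieces.append(line[:match.start()])
        else:
            text = line.rstrip()
            is_last = i == len(lines) - 1
            # A plain newline acts like a single space between lines.
            pieces.append(text if is_last else text + " ")
    return "".join(pieces)


print(join_lines(["foo%", "bar"]))                     # prints: foobar
print(join_lines(["foo % this is a comment", "bar"]))  # prints: foo bar
```

The two inputs above give different results, which is the difference in intent that a join-with-space default cannot preserve.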

cmhughes commented 2 years ago

So what about

\begin{myenv} 
foo%
bar
\end{myenv} 

Which changes to

\begin{myenv}
     foo%
     bar
\end{myenv} 

Again, this will change the output....

marcin-serwin commented 2 years ago

Spaces at the beginning of a line are ignored by LaTeX (unless it is inside a verbatim environment, which can be controlled in localSettings), so it wouldn't change the output.

cmhughes commented 2 years ago

OK, understood. I'll explore this.

cmhughes commented 2 years ago

For my reference: split at comment regex, test for trailing space.

cmhughes commented 2 years ago

As of https://github.com/cmhughes/latexindent.pl/commit/079b219f42c17073325ca56ab8308f83201a2294 the text wrap routine has been updated so that starting with

\documentclass{article}

\begin{document}

Something%
\index{something}.

\end{document}

and using your YAML settings, gives

\documentclass{article}

\begin{document}

Something\index{something}.%

\end{document}

I'll be getting this released and uploaded to ctan.

Thanks for highlighting this!

cmhughes commented 2 years ago

I've released this at https://github.com/cmhughes/latexindent.pl/releases/tag/V3.17.3 and uploaded it to ctan :) Thanks again!

cmhughes commented 2 years ago

For reference, the dedicated part of the documentation:

https://latexindentpl.readthedocs.io/en/latest/sec-the-m-switch.html#text-wrap-trailing-comments-and-spaces