cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

Text wrapping seems inconsistent #356

Closed by cdelledonne 2 years ago

cdelledonne commented 2 years ago

Hi, thanks a lot for this great package, I find it extremely useful :)

I'm facing some inconsistent behavior with text wrapping enabled. It seems that some environments are processed, while some others aren't, and even those that are processed don't return the expected output.

EDIT: forgot to mention that I'm using V3.17.

Configuration

indentPreamble: 1
defaultIndent: "  "
removeTrailingWhitespace: 1
noAdditionalIndent:
  abstract: 1
modifyLineBreaks:
  preserveBlankLines: 1
  condenseMultipleBlankLinesInto: 1
  textWrapOptions:
    columns: 80

Example 1

It looks like items in an enumerate environment are wrapped, but not at the number of columns specified in the config.

Input

\begin{enumerate}
\item This is a very long sentence, which is mostly likely going to be longer than eighty characters---and it is.
\item This is a very long sentence, which is mostly likely going to be longer than eighty
characters---in fact, it is. But it's wrapped on one hundred characters, yes 100.
\end{enumerate}

Output

\begin{enumerate}
  \item This is a very long sentence, which is mostly likely going to be longer than     % <-- 84 chars
        eighty characters---and it is.
  \item This is a very long sentence, which is mostly likely going to be longer than     % <-- 84 chars
        eighty characters---in fact, it is. But it's wrapped on one hundred characters,  % <-- 87 chars
        yes 100.
\end{enumerate}

Example 2

The caption of a figure doesn't seem to be wrapped at all.

Input

\begin{figure}
\centering
\includegraphics[width=\linewidth]{figure.pdf}
\caption[]{This is a long caption that spans multiple lines and is not wrapped by eighty characters,
    but one hundred. This is a long caption that spans multiple lines and is not wrapped by eighty
    characters, but one hundred.}
\end{figure}

Output

\begin{figure}
  \centering
  \includegraphics[width=\linewidth]{figure.pdf}
  \caption[]{This is a long caption that spans multiple lines and is not wrapped by eighty characters,
    but one hundred. This is a long caption that spans multiple lines and is not wrapped by eighty
    characters, but one hundred.}
\end{figure}

Example 3

Other environments, like enumerate* provided by the package enumitem, are not even indented properly. This perhaps should fall under a different issue though.

Input

\begin{enumerate*}
\item This is a very long sentence, which is mostly likely going to be longer than eighty characters---and it is.
\item This is a very long sentence, which is mostly likely going to be longer than eighty
characters---in fact, it is. But it's wrapped on one hundred characters, yes 100.
\end{enumerate*}

Output

\begin{enumerate*}
  \item This is a very long sentence, which is mostly likely going to be longer than
  eighty characters---and it is.
  \item This is a very long sentence, which is mostly likely going to be longer than
  eighty characters---in fact, it is. But it's wrapped on one hundred characters,
  yes 100.
\end{enumerate*}

cmhughes commented 2 years ago

Thanks for this. I'll go through each example.

example 1

The text wrapping routine happens before code blocks have been found (see https://latexindentpl.readthedocs.io/en/latest/sec-the-m-switch.html#text-wrapping), which means that, once indentation has been added, lines can exceed the value of columns. This cannot be changed. I think the documentation used to state this, but it looks like I may have (mistakenly) removed it. I'll update the documentation.

example 2

You need to tell latexindent.pl that you want text wrap blocks to follow the caption command; this is detailed in, for example, https://latexindentpl.readthedocs.io/en/latest/sec-the-m-switch.html#lst-tw-bf-myenv-yaml. If you use the following YAML

indentPreamble: 1
defaultIndent: "  "
removeTrailingWhitespace: 1
noAdditionalIndent:
  abstract: 1
modifyLineBreaks:
  preserveBlankLines: 1
  condenseMultipleBlankLinesInto: 1
  textWrapOptions:
    columns: 80
    blocksFollow:
           other: |-
             (?x)
                \\\]
                |
                \\item(?:\h|\[)
                |
                \\caption\h*(?:\[\])?\h*\{ # <--- new bit

and then call

latexindent.pl -l -m myfile.tex

then you receive

\begin{figure}
  \centering
  \includegraphics[width=\linewidth]{figure.pdf}
  \caption[]{This is a long caption that spans multiple lines and is not wrapped by eighty
    characters, but one hundred. This is a long caption that spans multiple lines
    and is not wrapped by eighty characters, but one hundred.}
\end{figure}

The new bit is

                \\caption\h*(?:\[\])?\h*\{ # <--- new bit

which you can read as '\caption, followed by optional horizontal space, followed by optional empty square brackets, followed by optional horizontal space, followed by a mandatory opening curly brace'.
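
As an illustration only, the same pattern can be tried outside latexindent.pl. The sketch below uses Python's re module, which has no \h class, so [ \t] stands in for it (an assumption: Perl's \h matches more kinds of horizontal whitespace than space and tab). The name CAPTION_START and the sample strings are made up for this example.

```python
import re

# Python's `re` has no \h, so [ \t] approximates Perl's
# horizontal-whitespace class (an assumption of this sketch).
CAPTION_START = re.compile(
    r"""
    \\caption        # the literal command name
    [ \t]*           # optional horizontal space
    (?:\[\])?        # optional *empty* square brackets
    [ \t]*           # optional horizontal space
    \{               # mandatory opening curly brace
    """,
    re.VERBOSE,
)

samples = [
    r"\caption{text}",
    r"\caption[]{text}",
    r"\caption [] {text}",
    r"\caption[short]{text}",  # non-empty brackets are not covered
    r"\caption",               # no opening brace
]

# True where a text wrap block would be allowed to start after the match.
results = [bool(CAPTION_START.match(s)) for s in samples]
print(results)
```

Note that the pattern only allows empty square brackets; a caption with a short title such as \caption[short]{...} would need the bracket part of the regex to be widened.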

example 3

You need to tell latexindent.pl to look for items within enumerate*, as in the following YAML

indentPreamble: 1
defaultIndent: "  "
removeTrailingWhitespace: 1
noAdditionalIndent:
  abstract: 1
modifyLineBreaks:
  preserveBlankLines: 1
  condenseMultipleBlankLinesInto: 1
  textWrapOptions:
    columns: 80
    blocksFollow:
           other: |-
             (?x)
                \\\]
                |
                \\item(?:\h|\[)
                |
                \\caption\h*(?:\[\])?\h*\{ # <--- new bit

indentAfterItems:                          # <--- new bit
    enumerate*: 1                           # <--- new bit

This then gives

\begin{enumerate*}
  \item This is a very long sentence, which is mostly likely going to be longer than
        eighty characters---and it is.
  \item This is a very long sentence, which is mostly likely going to be longer than
        eighty characters---in fact, it is. But it's wrapped on one hundred characters,
        yes 100.
\end{enumerate*}

I'll update defaultSettings.yaml to get enumerate* included for future releases.

cdelledonne commented 2 years ago

Thank you for your prompt response.

Regarding Example 3: makes sense, and thanks for adding that to the default behaviour.

Regarding Example 2: is there a reason why not all text/commands are wrapped by default? As a user, if I specify textWrapOptions, I would expect all the regular text to be wrapped, perhaps with some option to *exclude* certain commands. Maybe I'm missing something, but I don't see why a common command like caption should not be considered for wrapping.

Regarding text wrapping in general: you say "The text wrapping routine happens before code blocks have been found", but in Example 1 the lines exceed the specified column value by more than the length of the indent. That is, I'm choosing to use 2 spaces as indent and yet the second line is longer than 80 + 2 (it's 84).

You also say "This cannot be changed". Is there a fundamental reason why this can't change, or is it just the way it's implemented? It's kind of unfortunate because the result is not what one would expect, thus making text wrapping only partially useful.

Anyway, these are not critiques, I'm just being curious. Thanks a lot for your patience!

cmhughes commented 2 years ago

Thanks for this.

In previous versions I tried to anticipate the needs of every user when it came to text wrapping. It didn't go well, so now each user configures the things they want wrapped.

Indentation is, in my opinion, a difficult thing to get right. Text wrapping after indentation could mess it up.

Text wrapping can exceed the specified value of columns in your example because of the width of the \item command (6 characters, including its trailing space), which is added on top of the 2-space indent.
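
The line lengths reported in Example 1 can be accounted for with a small sketch. It uses Python's textwrap module as a stand-in for the wrapping routine (an assumption: latexindent.pl is written in Perl and its routine may behave differently in edge cases), wraps the text after \item first, and only then adds \item and the indentation. The variable names are illustrative.

```python
import textwrap

COLUMNS = 80
INDENT = "  "      # defaultIndent from the configuration in this thread
ITEM = "\\item "   # 6 characters, put back in front of the wrapped text

body = (
    "This is a very long sentence, which is mostly likely going to be "
    "longer than eighty characters---in fact, it is. But it's wrapped on "
    "one hundred characters, yes 100."
)

# Step 1: wrap the text that follows \item, ignoring any indentation.
wrapped = textwrap.wrap(body, width=COLUMNS, break_on_hyphens=False)

# Step 2: indent afterwards; continuation lines hang under the item text.
hang = INDENT + " " * len(ITEM)
lines = [INDENT + ITEM + wrapped[0]] + [hang + w for w in wrapped[1:]]

for line in lines:
    print(len(line), line)
```

Under these assumptions the first two lines come out at 84 and 87 characters: the wrapped text itself stays within 80 columns, and the extra 8 characters are the 2-space indent plus the 6-character width of \item and its trailing space.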

Feel free to dive into the code and try experimenting. It quickly becomes apparent how intricate it is. If you find something that does it better, let me know :)

cmhughes commented 1 year ago

Some good news! :)

As of https://github.com/cmhughes/latexindent.pl/commit/a54b7be45606d07518dd86308212852111710991 I've upgraded the text wrap routine.

demonstration

Starting with

\begin{enumerate}
\item This is a very long sentence, which is mostly likely going to be longer than eighty characters---and it is.
\item This is a very long sentence, which is mostly likely going to be longer than eighty
characters---in fact, it is. But it's wrapped on one hundred characters, yes 100.
\end{enumerate}

and the settings

defaultIndent: "  "
modifyLineBreaks:
  textWrapOptions:
    columns: 80
    when: after    #<!------- NEW BIT

gives the output

\begin{enumerate}
  \item This is a very long sentence, which is mostly likely going to be longer
        than eighty characters---and it is.
  \item This is a very long sentence, which is mostly likely going to be longer
        than eighty characters---in fact, it is. But it's wrapped on one
        hundred characters, yes 100.
\end{enumerate}
----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
   5   10   15   20   25   30   35   40   45   50   55   60   65   70   75   80   85   90

You'll notice that the text wrapping can now respect columns and indentation! This will be part of the next release, coming soon in early 2023, hopefully :)
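
The idea behind wrapping after indentation can be sketched as follows. This is not the code from the commit above; it uses Python's textwrap module (an assumption, chosen only for illustration) and lets the wrapper know about the \item prefix and the hanging indent, so that the column limit applies to the whole line. The exact break points may differ from latexindent.pl's output.

```python
import textwrap

COLUMNS = 80
INDENT = "  "      # defaultIndent from the settings above
ITEM = "\\item "   # prefix of the first line of each item

body = (
    "This is a very long sentence, which is mostly likely going to be "
    "longer than eighty characters---in fact, it is. But it's wrapped on "
    "one hundred characters, yes 100."
)

# The indentation is part of the width budget, so no line can exceed COLUMNS.
lines = textwrap.wrap(
    body,
    width=COLUMNS,
    initial_indent=INDENT + ITEM,
    subsequent_indent=INDENT + " " * len(ITEM),
    break_on_hyphens=False,
)

longest = max(len(line) for line in lines)
print(longest <= COLUMNS)
for line in lines:
    print(line)
```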