cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

oneSentencePerLine wrecks itemize #463

Closed (dpo closed this issue 1 year ago)

dpo commented 1 year ago

Please provide the following when posting an issue:

original .tex code

We compare
\begin{itemize}
  \item R2
  \item TR-R2 (maybe?)
  \item TRDH
\end{itemize}

yaml settings

modifyLineBreaks:
    oneSentencePerLine:
        manipulateSentences: 1

actual/given output

We compare \begin{itemize} \item R2 \item TR-R2 (maybe?
        )
  \item TRDH
\end{itemize}

desired or expected output

Not sure, but I am not expecting the itemize environment to be merged with the first line and two \items to end up on the same line. I understand that the ? is taken as the end of a sentence, and that a line break is added. Maybe I would expect something like

We compare
\begin{itemize}
  \item R2
  \item TR-R2 (maybe?
  )
  \item TRDH
\end{itemize}

anything else

With all default settings, the output is

We compare
\begin{itemize}
        \item R2
        \item TR-R2 (maybe?)
        \item TRDH
\end{itemize}

and so I don't expect oneSentencePerLine to wreck something that wasn't wrecked before. What is happening here?

dpo commented 1 year ago

I presume this issue is addressed here: https://latexindentpl.readthedocs.io/en/latest/sec-the-m-switch.html?highlight=itemize#lst-manipulate-sentences-yaml

I apologize for raising issues with a documented solution; the documentation is extensive but (to me) intricate to navigate.

I came up with the following to try and account for parenthetical sentences as well:

modifyLineBreaks:
    oneSentencePerLine:
        manipulateSentences: 1
        removeSentenceLineBreaks: 0
        sentencesBeginWith:            
            A-Z: 0
            a-z: 0
            other: "\([A-Z]|[A-Z]"
        sentencesEndWith:
            basicFullStop: 0
            betterFullStop: 0
            exclamationMark: 0
            questionMark: 0
            other: "\.\)|\?\)|\!\)|\. |\? |\! "

Result:

We compare
\begin{itemize}
        \item R2
        \item TR-R2 (maybe?)
        \item TRDH
\end{itemize}

but, surprisingly, this solution has nothing to do with the itemize environment! Unfortunately, removeSentenceLineBreaks: 0 would have been useful elsewhere in the document. I'm not sure if it's possible to have it both ways.

cmhughes commented 1 year ago

Thanks for this.

The key thing is to think about what a sentence is. Your settings say that a sentence can start with an uppercase letter, and can finish with a question mark.

There is not currently a way to say 'sentence does not contain' but it's on the list at https://github.com/cmhughes/latexindent.pl/issues/419 which I'm hoping to get to soon.

In the meantime you can either tweak the sentence routine or set some poly-switches, as in https://latexindentpl.readthedocs.io/en/latest/sec-the-m-switch.html#lst-multiple-sentences4-mod3

Does this help?

dpo commented 1 year ago

The key thing is to think about what a sentence is. Your settings say that a sentence can start with an uppercase letter, and can finish with a question mark.

Well, they say more than that, right? A sentence can also start with a left paren followed by a capital letter, and can end with a full stop/question mark/exclamation mark, followed by either a right paren or a space.
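As a rough illustration of what those two `other` patterns accept, here is a minimal sketch that tests them in isolation with Python's re module. This is an assumption-laden approximation: latexindent.pl is a Perl script and combines these fields into a larger sentence regex, so the helper names and sample strings below are hypothetical and not part of the script.

```python
import re

# Patterns copied verbatim from the `other` fields in the YAML settings above.
BEGIN = re.compile(r"\([A-Z]|[A-Z]")
END = re.compile(r"\.\)|\?\)|\!\)|\. |\? |\! ")


def starts_like_sentence(text: str) -> bool:
    """True if text begins with a capital letter, or '(' followed by one."""
    return BEGIN.match(text) is not None


def first_sentence_end(text: str):
    """Return the first end-of-sentence token found in text, or None."""
    m = END.search(text)
    return m.group(0) if m else None


print(starts_like_sentence("(Maybe this works.)"))  # True
print(starts_like_sentence("maybe?)"))              # False
print(first_sentence_end("TR-R2 (maybe?)"))         # ?)
print(first_sentence_end("We compare"))             # None
```

In this reading, `?)` counts as a single sentence terminator, so the closing parenthesis stays attached to the question mark.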

cmhughes commented 1 year ago

Here are some settings for you to consider/explore:

attempt 1

using the settings

modifyLineBreaks:
    oneSentencePerLine:
        manipulateSentences: 1
    items:
        ItemStartsOnOwnLine: 1
    environments:
        BeginStartsOnOwnLine: 1
        BodyStartsOnOwnLine: 1
        EndStartsOnOwnLine: 1
        EndFinishesWithLineBreak: 1

gives

We compare
\begin{itemize}
    \item R2
    \item TR-R2 (maybe?
          )
    \item TRDH
\end{itemize}

which isn't quite as we would like. So, to attempt 2.

attempt 2

We customise the sentencesEndWith field (see https://latexindentpl.readthedocs.io/en/latest/sec-the-m-switch.html#onesentenceperline-sentencesendwith) as follows

modifyLineBreaks:
    oneSentencePerLine:
        manipulateSentences: 1
        sentencesEndWith:
            questionMark: 0                 # 0/1
            other: |-
              (?x)
              (?:          #
                 \?\)?     # ?) OR ?
              )
    items:
        ItemStartsOnOwnLine: 1
    environments:
        BeginStartsOnOwnLine: 1
        BodyStartsOnOwnLine: 1
        EndStartsOnOwnLine: 1
        EndFinishesWithLineBreak: 1

We have turned off the basic question mark and used a more sophisticated version

            other: |-
              (?x)
              (?:          #
                 \?\)?     # ?) OR ?
              )

Let's go through this:
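A small sketch of how this pattern behaves, using Python's re module as a stand-in for Perl's regex engine (an assumption; the two agree on the syntax used here). The `(?x)` flag turns on extended mode, in which whitespace and `#` comments inside the pattern are ignored, so the effective regex is `(?:\?\)?)`: a question mark optionally followed by a closing parenthesis. The helper name and sample strings are hypothetical.

```python
import re

# Same text as the `other` field, including the (?x) extended-mode flag.
PATTERN = re.compile(r"""(?x)
(?:          #
   \?\)?     # ?) OR ?
)
""")


def sentence_end(text: str):
    """Return the first sentence terminator matched by PATTERN, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


print(sentence_end("TR-R2 (maybe?)"))     # ?)
print(sentence_end("Is it so? Yes."))     # ?
print(sentence_end("No question here."))  # None
```

Because the optional `\)` is consumed together with the question mark, a sentence ending in `?)` would end after the parenthesis rather than before it, which is presumably why the stray `)` no longer lands on its own line.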

cmhughes commented 1 year ago

As of https://github.com/cmhughes/latexindent.pl/commit/167c88ef353ce5bdc2266eb4ff205b1d66715aef I've implemented a new feature sentencesDoNOTcontain.

The default is

        sentencesDoNOTcontain:
            other: \\begin                  # regex

and so the default output is now as you request

We compare
\begin{itemize}
    \item R2
    \item TR-R2 (maybe?)
    \item TRDH
\end{itemize}
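One way to picture the effect of sentencesDoNOTcontain is sketched below in Python. It assumes that a candidate sentence is simply rejected when it matches the regex; the actual Perl implementation lives in the linked commit and may work differently. The function name and sample strings are hypothetical.

```python
import re

# Default value of sentencesDoNOTcontain: other, as shown above.
# The regex \\begin matches the literal text \begin.
DO_NOT_CONTAIN = re.compile(r"\\begin")


def is_allowed_sentence(candidate: str) -> bool:
    """Assumed rule: a candidate is a sentence only if it lacks \\begin."""
    return DO_NOT_CONTAIN.search(candidate) is None


# The merged text from the original report would no longer qualify.
merged = r"We compare \begin{itemize} \item R2 \item TR-R2 (maybe?"
plain = "Does this help?"

print(is_allowed_sentence(merged))  # False
print(is_allowed_sentence(plain))   # True
```

Under that assumption, the span running from "We compare" through the `\begin{itemize}` line is never treated as one sentence, so the environment is left on its own lines.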

I'll get this released soon.

cmhughes commented 1 year ago

Implemented at https://github.com/cmhughes/latexindent.pl/releases/tag/V3.23