WGUNDERWOOD / tex-fmt

An extremely fast LaTeX formatter written in Rust
MIT License
272 stars 20 forks source link

[Bug] Indentation error with unmatched parentheses outside of math environment #28

Open akern40 opened 2 months ago

akern40 commented 2 months ago

This package is excellent! I did run into an indentation bug, though: unmatched parentheses outside of a math environment cause a "dedent" (I think), which mucks up the indentation count for the rest of the document. As a minimal example, this correctly-formatted document:

\documentclass{article}

\begin{document}

I have encountered a bug in the otherwise-excellent \texttt{tex-fmt} package.
There are two issues: 1) unmatched parenthesis cause the indentation count to go negative for a given line, and 2) that count then gets carried over to the rest of the document.
\begin{enumerate}
    \item The result is that items that should be indented, like this one, get \texit{un}indented.
\end{enumerate}

\end{document}

gets incorrectly formatted by tex-fmt

\documentclass{article}

\begin{document}

I have encountered a bug in the otherwise-excellent \texttt{tex-fmt} package.
There are two issues: 1) unmatched parenthesis cause the indentation count to go negative for a given line, and 2) that count then gets carried over to the rest of the document.
\begin{enumerate}
\item The result is that items that should be indented, like this one, get \texit{un}indented.
\end{enumerate}

\end{document}

with the following output from --trace:

❯ tex-fmt --keep -t '.\test.tex'
INFO: tex-fmt test.tex: Formatting started.
INFO: tex-fmt test.tex: Ensuring environments on new lines.
TRACE: tex-fmt test.tex: Line 0 (0). Indent: actual = 0, visual = 0: \documentclass{article}
TRACE: tex-fmt test.tex: Line 1 (1). Indent: actual = 0, visual = 0:
TRACE: tex-fmt test.tex: Line 2 (2). Indent: actual = 0, visual = 0: \begin{document}
TRACE: tex-fmt test.tex: Line 3 (3). Indent: actual = 0, visual = 0:
TRACE: tex-fmt test.tex: Line 4 (4). Indent: actual = 0, visual = 0: I have encountered a bug in the otherwise-excellent \texttt{tex-fmt} package.
TRACE: tex-fmt test.tex: Line 5 (5). Indent: actual = -2, visual = -2: There are two issues: 1) unmatched parenthesis cause the indentation count to go negative for a given line, and 2) that count then gets carried over to the rest of the document.
WARN: tex-fmt test.tex: Line 6 (5). Indent is negative. There are two issues: 1) unmatched parenthesis cause the indentation count to go negative for a given line, and 2) that count then gets carried over to the rest of the document.
TRACE: tex-fmt test.tex: Line 6 (6). Indent: actual = 0, visual = -2: \begin{enumerate}
WARN: tex-fmt test.tex: Line 7 (6). Indent is negative. \begin{enumerate}
TRACE: tex-fmt test.tex: Line 7 (7). Indent: actual = 0, visual = -1: \item The result is that items that should be indented, like this one, get \texit{un}indented.
WARN: tex-fmt test.tex: Line 8 (7). Indent is negative. \item The result is that items that should be indented, like this one, get \texit{un}indented.
TRACE: tex-fmt test.tex: Line 8 (8). Indent: actual = -2, visual = -2: \end{enumerate}
WARN: tex-fmt test.tex: Line 9 (8). Indent is negative. \end{enumerate}
TRACE: tex-fmt test.tex: Line 9 (9). Indent: actual = -2, visual = -2:
WARN: tex-fmt test.tex: Line 10 (9). Indent is negative.
TRACE: tex-fmt test.tex: Line 10 (10). Indent: actual = -2, visual = -2: \end{document}
WARN: tex-fmt test.tex: Line 11 (10). Indent is negative. \end{document}
INFO: tex-fmt test.tex: Formatting complete.
WGUNDERWOOD commented 2 months ago

Thanks for catching this! This bug makes sense to me, and I have encountered it before. As a temporary workaround, I recommend skipping the offending lines by appending % tex-fmt: skip. A good general solution seems quite difficult: consider the following document.

\documentclass{article}

\begin{document}

Consider this (I have a long sentence in parentheses
  over multiple lines, which should indent
  appropriately, but also have numbered items
  1) number one and 2) number two.
).

\end{document}

Let me know if you have any ideas for solving these types of issues: a perfect solution is probably impossible but any improvements are welcome.

akern40 commented 2 months ago

Oh ya skipping is a good idea I hadn't thought of 😅

Personally I'm not sure I'd want to have any indentation caused by parentheses in non-math settings? I think this is how latexindent treats it. But I can think of two schemes or heuristics that could help. I'm not super familiar with the code, so this could be far-fetched.

The simple heuristic is that close parentheses which appear after numbers, single letters, or Roman numerals are treated as non-matching.

The more complex scheme I imagine a bit like a state machine: when it finds an open parenthesis, it enters state Open. Finding a close parenthesis triggers state MaybeClosed(1). Another close parenthesis triggers MaybeClosed(2), etc. This state persists until the program either a) finds an open parenthesis or b) hits the end of the document. At that point, the stack unwinds: the final MaybeClosed is treated as a matching parenthesis, and all intermediates are treated as non-indenting.

Like you said, I think there is no "true" solution; the problem is definitionally ambiguous, which is why I'd kinda suggest sidestepping it entirely.

(Also: thank you for open source work! Don't want to be the guy who suggests a complex solution and doesn't offer implementation help. I'll just skip the line for now.)

WGUNDERWOOD commented 1 month ago

I think that the implementation with Open and MaybeClosed will fail if the last closing parenthesis is not the matching one. Consider

This sentence (which I am writing right
here) has two parts: 1) the first part,
and 2) the second part.

This would receive indentation even though it should not. The option to simply ignore all parentheses outside maths mode could be good; I will look into an implementation of this if I have some time.

akern40 commented 1 month ago

Ah yep good counterexample - ignoring would be great!

WGUNDERWOOD commented 1 month ago

Linking #18 here, as some of the logic for detecting math mode could be useful.

BlakeFreer commented 1 month ago

I have a similar problem when using a), b) style enumerate items

\begin{enumerate}[label=\alph*)]  % this ) causes indentation to fail
\item foo                         % I expect this line to be indented
\end{enumerate}

I'm mentioning since it may be easy to fix this special case since the context of the offending ) is quite obvious