cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

[latexindent.exe] manipulateSentences cannot be 1 in versions after 3.22 #514

Closed NilsonPark closed 4 months ago

NilsonPark commented 6 months ago

Please provide the following when posting an issue:

original .tex code

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{microtype}

% Package for citations
\usepackage{natbib}

\begin{document}

\title{Sample Document for LaTeX Testing}
\author{Your Name}
\date{\today}

\maketitle

\section{Introduction}
This is a sample LaTeX document for testing purposes.
It includes basic text formatting and citation examples.
The use of LaTeX enables the creation of professionally formatted documents, such as academic papers, theses, and reports.
According to \citet{Einstein}, energy and mass are equivalent, as described by the equation \(E=mc^2\).
This groundbreaking theory revolutionized our understanding of physics.
Another significant work in physics is by \citet{Newton}, who formulated the laws of motion and universal gravitation.

\bibliographystyle{plainnat}
\bibliography{main}

\end{document}

yaml settings

onlyOneBackUp: 1
maxNumberOfBackUps: 1
modifyLineBreaks:
    oneSentencePerLine:
        manipulateSentences: 1 # perform the sentence manipulation routine
        removeSentenceLineBreaks: 0 # keep existing sentence line breaks

The file is indented correctly, but the following warnings are printed to the terminal:

Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?x)                                # ignore spaces in the below
(?:                                 #
  \.\)                              # .)
  (?!\h*[a-z])                      # not *followed by* a-z
)                                   #
|                                   # OR
(?:                                 #
  (?<!                              # not *preceded by*
    (?:                             #
      (?:[eE]\.[gG])                # e.g OR E.g OR e.G OR E.G
      |                             #
      (?:[iI]\.[eE])                # i.e OR I.e OR i.E OR I.E
      |                             #
      (?:etc)                       # etc
      |                             #
      (?:[wW]\.[rR]\.[tT])          # w.r.t OR W.r.t OR w.R.t OR w.r.T OR W.R.t OR W.r.T OR w.R.T OR W.R.T
    )                               #
  )                                 #
)                                   #
\.                                  # .
(?!                                 # not *followed by*
  (?:                               #
    [a-zA-Z0-9-~,]                  #
    |                               #
    \),                             # ),
    |                               #
    \)\.                            # ).
  )                                 #
)                                   # <-- HERE / at C:\Users\---------hide-----------------\inc\lib/LatexIndent/Sentence.pm line 160.
Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?^:(?x)                                # ignore spaces in the below
(?:                                 #
  \.\)                              # .)
  (?!\h*[a-z])                      # not *followed by* a-z
)                                   #
|                                   # OR
(?:                                 #
  (?<!                              # not *preceded by*
    (?:                             #
      (?:[eE]\.[gG])                # e.g OR E.g OR e.G OR E.G
      |                             #
      (?:[iI]\.[eE])                # i.e OR I.e OR i.E OR I.E
      |                             #
      (?:etc)                       # etc
      |                             #
      (?:[wW]\.[rR]\.[tT])          # w.r.t OR W.r.t OR w.R.t OR w.r.T OR W.R.t OR W.r.T OR w.R.T OR W.R.T
    )                               #
  )                                 #
)                                   #
\.                                  # .
(?!                                 # not *followed by*
  (?:                               #
    [a-zA-Z0-9-~,]                  #
    |                               #
    \),                             # ),
    |                               #
    \)\.                            # ).
  )                                 #
)                                   #
)|(?^:\?)|(?^:!) <-- HERE / at C:\Users\---------hide-----------------\inc\lib/LatexIndent/Sentence.pm line 183.
Variable length lookbehind is experimental in regex; marked by <-- HERE in m/((?:(?^usx:(?:(?^sx:
                        (?:\A(?:LTXIN-TK-blank-line\R)+)     # the order of each of these
                                |                            # is important, as (like always) the first
                        (?:\G(?:LTXIN-TK-blank-line\R)+)     # thing to be matched will
                                |                            # be accepted
                        (?:(?:LTXIN-TK-blank-line\h*\R)+)
                                |
                                \R{2,}
                                |
                                \G
                        )|(?^s:\!)|(?^s:\})|(?^us:(?^u:(?^u:(?<!\\))%latexindenttrailingcomment\d+-END)\h*\R)|(?^s:\?)|(?^s:\R?\\par)|(?^s:\.))(?:\h|\R)*)))
                            (\h*)
                            (?!(?^u:(?^u:(?<!\\))%latexindenttrailingcomment\d+-END)|(?^s:(?:\R?\\par)))
                            ((?:(?^:(?^:(?!(?:LTXIN-TK-blank-line|LTXIN-TK-VERBATIM|LTXIN-TK-preamble))[A-Z]))).*?)
                            ((?^:(?^:(?x)                                # ignore spaces in the below
(?:                                 #
  \.\)                              # .)
  (?!\h*[a-z])                      # not *followed by* a-z
)                                   #
|                                   # OR
(?:                                 #
  (?<!                              # not *preceded by*
    (?:                             #
      (?:[eE]\.[gG])                # e.g OR E.g OR e.G OR E.G
      |                             #
      (?:[iI]\.[eE])                # i.e OR I.e OR i.E OR I.E
      |                             #
      (?:etc)                       # etc
      |                             #
      (?:[wW]\.[rR]\.[tT])          # w.r.t OR W.r.t OR w.R.t OR w.r.T OR W.R.t OR W.r.T OR w.R.T OR W.R.T
    )                               #
  )                                 #
)                                   #
\.                                  # .
(?!                                 # not *followed by*
  (?:                               #
    [a-zA-Z0-9-~,]                  #
    |                               #
    \),                             # ),
    |                               #
    \)\.                            # ).
  )                                 #
)                                   #
)|(?^:\?)|(?^:!)))
                            (\h*)?                        # possibly followed by horizontal space
                            (\R)?                         # possibly followed by a line break
                            ((?^u:(?^u:(?<!\\))%latexindenttrailingcomment\d+-END))?     # possibly followed by trailing comments
                        <-- HERE / at C:\Users\---------hide-----------------\inc\lib/LatexIndent/Sentence.pm line 217.

If I set manipulateSentences: 0, no warnings are logged. If I remove the modification added in #447, there are also no warnings.

cmhughes commented 6 months ago

What version of perl are you using? What's your operating system?

NilsonPark commented 6 months ago

I am using the standalone latexindent.exe on Windows 11. The issue occurs for any .tex file when manipulateSentences: 1 is set.

cmhughes commented 6 months ago

I can confirm this is a bug, demonstrated in github actions log at https://github.com/cmhughes/latexindent.pl/actions/runs/7863164391/job/21453471349

I'll look into it.

Konfekt commented 5 months ago

Similarly,

cat test.tex |latexindent --modifylinebreaks  --yaml="modifyLineBreaks:oneSentencePerLine:manipulateSentences:1"

for latexindent 3.23.6 yields

Variable length lookbehind not implemented in regex m/(?x)                                # ignore spaces in the below
(?:                                 #
\.\)          .../ at /usr/local/texlive/2023/texmf-dist/scripts/latexindent/LatexIndent/Sentence.pm line 160, <> line 34.

with exit code 128
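
A note on the term: a lookbehind is "variable length" when its alternatives can match different numbers of characters, as with e.g (3 characters) versus w.r.t (5 characters) in the pattern shown in the error. The sketch below is an analogy in Python rather than Perl, chosen because Python's re module also refuses such lookbehinds. The pattern names and the helper functions are made up for illustration and are not part of latexindent. It also shows one common workaround, chaining several fixed-width lookbehinds, which is not necessarily how latexindent deals with the problem.

```python
import re

# Variable-width lookbehind: "e.g" is 3 characters, "w.r.t" is 5.
# (Hypothetical, simplified version of the sentence-end pattern.)
VARIABLE_WIDTH = r"(?<!(?:e\.g|w\.r\.t))\.(?![a-zA-Z])"

# Same intent, written as two fixed-width lookbehinds in a row.
FIXED_WIDTH = r"(?<!e\.g)(?<!w\.r\.t)\.(?![a-zA-Z])"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def sentence_ends(text: str) -> list[int]:
    """Return the index of every full stop that ends a sentence."""
    return [m.start() for m in re.finditer(FIXED_WIDTH, text)]


if __name__ == "__main__":
    print(compiles(VARIABLE_WIDTH))  # rejected by Python's re
    print(compiles(FIXED_WIDTH))     # accepted
    print(sentence_ends("See e.g. the manual w.r.t. indentation. Done."))
```

The full stops inside and directly after the abbreviations are skipped, so only the two genuine sentence ends are reported.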

Konfekt commented 5 months ago

Variable length lookbehind not implemented in regex m/(?x)

Is this just a matter of needing a more recent Perl version (newer than 5.26 in this case)? As far as I can see, TeX Live, for example, does not ship a Perl executable, so latexindent depends on the Perl version installed on the system.

Konfekt commented 4 months ago

In version 3.23.8 the error persists, but I suspect this is rather due to an insufficiently recent Perl version?

Konfekt commented 4 months ago

There's a mention that latexindent is developed on Perl 5.38, but it is not clear whether that is the minimum requirement (openSUSE ships with 5.26, which may be the cause of this failure).
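
The two messages seen in this thread ("not implemented" as a fatal error, "is experimental" as a warning) fit a simple version-based picture. The sketch below, in Python, encodes that picture. The thresholds 5.30 and 5.36 are an assumption based on the Perl release notes and are not stated anywhere in this thread, and the function names are hypothetical.

```python
def parse_perl_version(text: str) -> tuple[int, int]:
    """Turn a string such as '5.26' or 'v5.38.2' into (major, minor)."""
    parts = text.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def lookbehind_behaviour(version: str) -> str:
    """Guess how a given Perl treats a variable-length lookbehind
    that contains no capture groups (assumed thresholds, see above)."""
    major_minor = parse_perl_version(version)
    if major_minor < (5, 30):
        # Assumed: pattern fails to compile ("... not implemented in regex").
        return "error"
    if major_minor < (5, 36):
        # Assumed: pattern compiles, but "... is experimental" is printed.
        return "warning"
    # Assumed: pattern compiles without a message.
    return "quiet"


if __name__ == "__main__":
    for v in ("5.26", "5.32", "v5.38.2"):
        print(v, lookbehind_behaviour(v))
```

Under these assumptions, a system Perl 5.26 falls in the "error" range, which would match the failure reported above.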

cmhughes commented 4 months ago

In version 3.23.8 the error persists, but I suspect this is rather due to an insufficiently recent Perl version?

Yes, this is expected; V3.23.8 wasn't designed to fix this issue.

I'm hoping to get to this as my next latexindent priority.

Konfekt commented 4 months ago

Okay, thank you very much for the clarification. It was somewhat in a limbo since a commit addressed this issue, though it's still open

cmhughes commented 4 months ago

Okay, thank you very much for the clarification. It was somewhat in a limbo since a commit addressed this issue, though it's still open

Apologies for the confusion, I can see how/why that happened.

cmhughes commented 4 months ago

Thanks for this.

I believe that, as of https://github.com/cmhughes/latexindent.pl/commit/55306bcd77f713abef53c1240b3fd867296203dd I've fixed this.

The problem was that I had specified an old version of Perl (5.32) in the routine that creates latexindent.exe. I've fixed this as of the above.

You can see an output of the tests in github actions at https://github.com/cmhughes/latexindent.pl/actions/runs/8582320265/job/23520314375 (this log may expire at some point).

This will be part of the next release, please leave this issue open until I've made the release.

Thanks for reporting.

cmhughes commented 4 months ago

Released at https://github.com/cmhughes/latexindent.pl/releases/tag/V3.23.9 and uploaded to CTAN.