Daniel-Diaz / HaTeX

The Haskell LaTeX library.
BSD 3-Clause "New" or "Revised" License
199 stars 46 forks source link

`takeTill` performs exponential backtracking in environments #122

Open isovector opened 5 years ago

isovector commented 5 years ago

parseLaTeX performs exponential amounts of work in the number of environments. I have two files, book2.txt and book3.txt which are roughly the same size in bytes, but the latter has ~3x as many environments. book2.txt parses quickly, but I haven't been patient enough to get book3.txt to.

The shiatsu program below is doing nothing more than reading the file and calling parseLaTeX.

➜  /tmp/ebook$ wc book2.txt
  6347  33541 245180 book2.txt

➜  /tmp/ebook$ cat book2.txt | grep "begin" | wc -l
232

➜  /tmp/ebook$ time shiatsu +RTS -p -RTS book2.txt - &> /dev/null
shiatsu +RTS -p -RTS book2.txt - &> /dev/null  2.23s user 0.10s system 99% cpu 2.343 total
➜  /tmp/ebook$ wc book3.txt                                             
  8385  41123 273877 book3.txt

➜  /tmp/ebook$ cat book3.txt | grep "begin" | wc -l
583

➜  /tmp/ebook$ time shiatsu +RTS -p -RTS book3.txt - &> /dev/null
^C
shiatsu +RTS -p -RTS book3.txt - &> /dev/null  43.29s user 0.33s system 99% cpu 43.780 total

Here's the profile information


COST CENTRE      MODULE                 SRC                                         %time %alloc

takeTill         Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:354:1-48           74.5   76.6
peekChar         Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:348:1-62            5.2    4.2
cmdArg           Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(222,1)-(232,38)    4.9    4.7
dolMath          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(304,1)-(312,7)     2.2    2.7
envBody.endenv   Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:189:9-75            2.0    2.0
latexBlockParser Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(116,1)-(123,5)     1.7    1.0
isSpecial        Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:335:1-29            1.5    0.0
cmdArgs          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(217,1)-(219,30)    1.4    1.4
envName          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(180,1)-(185,21)    1.2    1.5
text2            Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(144,1)-(147,35)    0.8    1.0
env              Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(163,1)-(177,22)    0.8    1.1
comment          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(323,1)-(328,23)    0.7    1.1

                                                                                                                                   individual      inherited
COST CENTRE                          MODULE                  SRC                                               no.      entries  %time %alloc   %time %alloc

  parseLaTeX                         Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:96:1-45                 2793          0    0.0    0.0   100.0  100.0
   parseLaTeXWith                    Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:(99,1)-(101,63)         2794          1    0.0    0.0   100.0  100.0
    latexParser                      Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:112:1-57                2796          0    0.0    0.0   100.0  100.0
     latexBlockParser                Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:(116,1)-(123,5)         2799          0    1.6    1.0   100.0  100.0
      text                           Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:(131,1)-(138,71)        2801          0    0.2    0.2    76.5   75.5
       peekChar                      Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:348:1-62                2803          0    3.4    2.5     3.4    2.5
       takeTill                      Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:354:1-48                2808          0   67.3   66.8    72.9   72.8
        verbatimEnvironments         Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:72:5-24                 3124      14881    0.0    0.0     0.0    0.0
isovector commented 5 years ago

Actually, I think this might be mostly my fault. Here's one of the environments I was trying to expand:

\begin{code}
main :: IO ()
main = do
  bool <- read
  withSomeSBool (toSBool bool) $  !\annotate{1}!
    \(_ :: SBool b) ->  !\annotate{2}!
      runLogging @b program  !\annotate{3}!
\end{code}

which is a \VerbatimEnvironment in latex. I hadn't added it to the list of verbatim environments in hatex---which meant a lot of backtracking was going into this, even though it could never succeed.

Daniel-Diaz commented 5 years ago

What is the result if you use the following parser configuration?

ParserConf { verbatimEnvironments = ["verbatim", "code"] }

The problem might be with the \(_ :: SBool b) bit. The parser might think \( is the beginning of a mathematical expression.

isovector commented 5 years ago

It works as expected with the given ParserConf.

Daniel-Diaz commented 5 years ago

Great! I'm glad that worked for you.

Is the example code above enough to break the parser using default configuration?

isovector commented 5 years ago

The example code above does break the parser as expected. However, in long documents, this broken parser seems to result in the backtracking issue described above---it will keep consuming input outside of the code environment trying to fix the parse.