brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/
Other
928 stars 98 forks source link

Deep recursion with Tikz #560

Closed bernhard-kleine closed 9 years ago

bernhard-kleine commented 9 years ago

In #550 I showed an example which did well compile and showed no deep recursion. However with the example below I encounter more than thousands warnings, and the time of compilation increases dramatically. For practical reasons this should be solved. The file to run:

\documentclass[12pt,a4paper]{book}
\makeatletter
\usepackage[ngerman,english]{babel}
\usepackage[utf8]{inputenc}

\selectlanguage{english}
\usepackage{calc}
\usepackage{tabularx}
\usepackage{rotating}% für gedrehte Tabelle etc.
\usepackage{listings}
\usepackage[footnote]{acronym}
\usepackage{array}
\usepackage{multicol}
\usepackage{upgreek}
\renewcommand{\ttdefault}{pcr}
\usepackage{makeidx}
\usepackage{pdflscape}
\usepackage{url}
\usepackage{color}
\usepackage {caption}
\usepackage{hyperref}
\usepackage{textcomp}
\input{phylotree_tikz}
\makeatother
\begin{document}
   \phylotree{Insecta,Crustacea,Mollusca}
\end{document}

and the tikz tree (phylotree_tikz.tex which should be in the same directory)


%****************** TIKZ und Steroide

%****************** Phylogenetic Tree
\usepackage{tikz}%                  für Phylotree-
\usetikzlibrary{trees}%                 für Phylotree
\usetikzlibrary{arrows}
% set default values
\tikzset{
Prokaryota/.style=,
Eukaryota/.style=,
Porifera/.style=,
Vertebrata/.style=,
Acrania/.style=,
Agnatha/.style=,
Dipnoi/.style=,
Actinistia/.style=,
Actinopterygii/.style=,
Chondrichthyes/.style=,
Amphibia/.style=,
Annelida/.style=,
Platyhelm/.style=,
Mollusca/.style=,
Chelicerata/.style=,
Nemathelm/.style=,
Myriapoda/.style=,
Insecta/.style=,
Crustacea/.style=,
Ctenophora/.style=,
Cnidaria/.style=,
Acrania/.style=,
Tunicata/.style=,
Hemicordata/.style=,
Echinodermata/.style=,
Amphibia/.style=,
Sauropsida/.style,
Mammalia/.style,
nosize/.style={inner sep=0pt,minimum size=0pt,outer sep=0pt},
}

\DeclareRobustCommand{\abbox}[1]{\makebox[0mm]{\raisebox{2mm}{#1}}}
\DeclareRobustCommand{\nullbox}[1]{\makebox[0mm][l]{#1}}
\DeclareRobustCommand\phylotree[1]{{
% \foreach would be the natural choice here, but styles set
% inside a foreach are local. Use a latex hack instead.
\@for\@myarg:=#1\do{\tikzset{\@myarg/.style={text=black,font={\tiny\bfseries}}}}
%
\begin{tikzpicture}[level distance=3mm,sibling distance=8mm,text=black!50,font=\tiny]
  \node {} [edge from parent fork right,grow=right]
  child {node[Prokaryota] {\nullbox{Prokaryota}}}
  child {[sibling distance=9mm] node[Eukaryota,nosize] {\abbox{Eukaryota}}%
    child {node[Porifera] {\nullbox{Porifera}}}
    child {[sibling distance=7mm] %Eumetazoa
      child {[level distance=3mm] node[Ctenophora] {\nullbox{Ctenophora}}}
      child {[level distance=3mm] node[Cnidaria] {\nullbox{Cnidaria}}}
      child {[sibling distance=9mm,level distance=3mm]%Bilateria%
        child {[sibling distance=4mm]%Deuterostomia
          child {%Cordata
            child {[sibling distance=3.5mm]%Notocordata
              child {%Vertebrata
                child {%Gnathostomata
                  child {%Osteognathostomata
                    child {[sibling distance=2.5mm]%Sarcopterygii
                      child {[sibling distance=2mm]%Tetrapoda
%                        child {[sibling distance=1.5mm]%Amniota
%                          child {node[Mammalia] {\nullbox{Mammalia}}}
%                          child {node[Sauropsida] {\nullbox{Sauropsida}}}
%                        }%Amniota
%                        child {node[Amphibia] {\nullbox{Amphibia}}}
                      }%Tetrapoda
                      child {node[Dipnoi] {\nullbox{Dipnoi}}} % <------ this is the first entry to raise warnings
%                      child {node[Actinistia] {\nullbox{Actinistia}}}
                    }%Sarcopterygii
                    child {node[Actinopterygii] {\nullbox{Actinopterygii}}}
                  }%Osteognathostomata
                  child {node[Chondrichthyes]{\nullbox{Chondrichthyes}}}
                }%Gnathostomata
                child {node[Agnatha] {\nullbox{Agnatha}}} }%Vertebrata
              child {node[Acrania] {\nullbox{Acrania}}} }%Notocordata
            child {node[Tunicata] {\nullbox{Tunicata}}} }%Cordata
          child {node[Hemicordata] {\nullbox{Hemichordata}}} 
          child {node[Echinodermata] {\nullbox{Echinodermata}}}
        }%Deuterostomia
        child {[sibling distance=3mm]%Protostomia
          child {node[Platyhelm] {\nullbox{Platyhelminthes}}}
          child {node[Mollusca] {\nullbox{Mollusca}}} child {[sibling distance=3mm]%Articulata
            child {node[Annelida] {\nullbox{Annelida}}} child
            {%Arthropoda
              child {node[Chelicerata] {\nullbox{Chelicerata}}}
              child {%leer1
                child {node[Nemathelm] {\nullbox{Nemathelminthes}}}
                child {%Mandibulata
                  child {node[Myriapoda] {\nullbox{Myriapoda}}}
                  child {[sibling distance=2mm]%Tetraconata
                    child {node[Insecta] {\nullbox{Insecta}}}
                    child {node[Crustacea] {\nullbox{Crustacea}}}
                  }%Tetraconata
                }%Mandibulata
              }%leer1
            }%Arthorpoda
          }%Articulata
        }%Protostomia
      }%Bilateria
    }%Eumetazoa
  };
\end{tikzpicture}
}}

The example should run with pdflatex, but throw error with latexml.

bernhard-kleine commented 9 years ago

I would like to add: if the tree is only five or six children deep, no error and warning it is only a question of the depth of the tree that these warnings occur. I have not tested with the intermediates when the deep recursions occur.

brucemiller commented 9 years ago

This is interesting: I hadn't paid that close attention to the details of recursion warnings in Perl till now. They really are just warnings, the computation isn't necessarily aborted or stopped. The standard setting for recursion depth warnings in perl is 100. You can turn off the warning (in typical Perl fashion: by putting no warnings 'recursion' in every source file!), but to change the depth at which the warning is issued (100 doesn't seem that large), you have to recompile Perl itself!

It seems that this warning isn't necessarily due to any "danger", for modern Perl & computers, but simply that usually this is a sign of unwanted recursion. Indeed, until now the only such warnings I've seen were due to a coding problem. Bernhard's example seems on the surface to be a legitimate need for recursion. Yet, I am very hesitant to turn off these warnings since they are usually quite helpful!

It could in fact be an indicator of non-optimal code in LaTeXML, where too much layering or indirection is used, but I doubt that would get rid of all the recursion warnings here.

So, my temptation is, regrettably, to mark this as wontfix. Perhaps @dginev has some insights?

Incidentally, the increase in computation time is due to the increase in computation, probably all the worse due to lots of recursion; LaTeXML is quite slow running tikz! The messages don't seem to a significant contribution to the runtime.

dginev commented 9 years ago

Indeed, I would like to investigate this independently, profiling in tandem with working on #480 . So I will self-assign. I don't mind leaving the issue open, but if you want to close that is fine as well, GitHub will keep them cross-referenced.

dginev commented 9 years ago

A good idea I have come across in discussions about Perl's recursion problems is as follows:

In the Tikz example here, the recursion happens on invokeToken, as several of the concrete invocation cases proceed to recursively invoke some other token (for one reason or another). Rather than doing the invocation directly, and thus deepen the subroutine call stack, we should unshift the newly found token in the gullet and return empty-handed all the way up to the digestNextBody while loop, that keeps polling the Gullet with readXToken. I am quite certain this will reduce LaTeXML's memory footprint (a subroutine call is the most expensive memory allocation there is, and deep recursion means having a lot of them allocated at the same time), but will cost us in performance, as we have extra shift/unshift calls on the gullet.

Optimizing the management and invocation of tokens is probably the highest impact performance enhancement for Tikz pictures.

bernhard-kleine commented 9 years ago

I have played with the tree and have a minimal tree that raises the warnings. Its only five nodes which are lacking and this shows once more that these warnings are nonsense. When you comment out the entry I have marked on the right , everything runs smoothly. There are only 38 warnings and they should give you an idea where to optimize the code.

dginev commented 9 years ago

Thanks, your tiny examples are actually extremely useful for profiling, so they are much appreciated.

I think I now have a pretty good idea where the inefficiencies lie for Tikz, the interesting ongoing question is how to address them. But I will try out some alternatives by the end of December.

brucemiller commented 9 years ago

Deyan will regret giving me clues, since I spent a bit of time digging around when I should have been preparing a release (& a conference, & papers & ...). But I did some rewrite & optimization in Stomach to reduce the layers when calling invokeToken. Reduced the 17K warnings to 5K, and reduced the runtime maybe 10%. I also tried to reduce redundant(?) lists and such. Still there are deep recursions in invokeToken and while reverting and absorbing boxes/lists/whatsits. I suspect a lot of it is simply legitimate depth & recursion, but there likely are still optimizations and improvements to be made, short of unrolling all the recursion by hand and making the code unmaintainable! :>

So, I'll leave the Issue open to look at some more later.

dginev commented 9 years ago

The point of maintenance releases is that we do interesting things after the release fixes the old problems we had. Then we can break things again. If 0.8.1 comes out crashed because of the optimization, it won't be my fault :>

dginev commented 9 years ago

Also, really cool news, the 10% speed increase sounds great already. I will still look at this later on (after 0.8.1 )

bernhard-kleine commented 9 years ago

I wonder, whether this bug will be resolved in the foreseeable future. In principle as I have evluated it it is a question, which deepness of the tikztree will be accepted. The error for me is that the last two or three levels of deepness are not supported. Since this increase compilation time extremely I would like you to remind of the problem.

brucemiller commented 9 years ago

I guess it depends on the reach of your foresight.

I think you're wrong that the "levels of deepness are supported"; although Perl whines like crazy about the recursion (lots of warnings, but no errors) --- and that is something we want to fix or at least further improve --- the compilation does complete, and even produces a nominally correct result. At least correct modulo #550, which certainly is a bug that also needs to be fixed.

bernhard-kleine commented 9 years ago

It takes hours to compile. That is an issue by itself.

Several thousands of this warnings.

Bernhard

Warning:perl:warn Perl warning

    at C:/Users/bk/Documents/BuchprojektSpringer/XMLtestordner/Chap5/feeding.tex; line 17 col 14

    Deep recursion on subroutine "LaTeXML::Core::Stomach::invokeToken" at d:/latexml/latexml-master/bin/../lib/LaTeXML/Core/Stomach.pm line 80, <$IN> line 17.

    In Core::Stomach[@0x4fbdbf8] at C:/Users/bk/Documents/BuchprojektSpringer/XMLtestordner/Chap5/feeding.tex; line 17 col 14. 

Von: bruce miller [mailto:notifications@github.com] Gesendet: Freitag, 1. Mai 2015 15:22 An: brucemiller/LaTeXML Cc: Bernhard Kleine Betreff: Re: [LaTeXML] Deep recursion with Tikz (#560)

I guess it depends on the reach of your foresight.

I think you're wrong that the "levels of deepness are supported"; although Perl whines like crazy about the recursion (lots of warnings, but no errors) --- and that is something we want to fix or at least further improve --- the compilation does complete, and even produces a nominally correct result. At least correct modulo #550 https://github.com/brucemiller/LaTeXML/issues/550 , which certainly is a bug that also needs to be fixed.

— Reply to this email directly or view it on GitHub https://github.com/brucemiller/LaTeXML/issues/560#issuecomment-98132254 . https://github.com/notifications/beacon/AIyh5zdrp-nYAHs2ba3uESKxmf-UOibZks5oE3VxgaJpZM4DAFJZ.gif

brucemiller commented 9 years ago

Yes, 5855 warnings, in fact. However, it only takes slightly over 2 minutes to compile on my, fairly old, system.

bernhard-kleine commented 9 years ago

I have some fifty of that drawing in my book.

On 1. Mai 2015 15:42:27 MESZ, bruce miller notifications@github.com wrote:

Yes, 5855 warnings, in fact. However, it only takes slightly over 2 minutes to compile on my, fairly old, system.


Reply to this email directly or view it on GitHub: https://github.com/brucemiller/LaTeXML/issues/560#issuecomment-98137380

Diese Nachricht wurde von meinem Mobiltelefon mit Kaiten Mail gesendet.

dginev commented 9 years ago

only takes slightly over 2 minutes

Sadly, my standards are also too high in this regard - I would only use only for times under 30 seconds (if we are talking Tikz). For me to be able to offer Tikz graphics to Authorea users, we would need them to finish converting in 30 seconds, which is the upper bound of our requests. Daemonizing is one approach, but in this case it seems the actual core conversion runtime is also high.

I had already looked a little into this issue (and the other related tikz ones we have opened), but it's somewhat hard to properly understand what is happening under the hood, as the Tikz internals are quite complex. So it's very unlikely this issue will have a meaningful solution in the short term, but there is some hope for the mid-term.

brucemiller commented 9 years ago

"only" was only in comparison to "hours", not to "desirable"

dginev commented 9 years ago

Ooops, my bad, haven't had my coffee yet.

bernhard-kleine commented 9 years ago

This is also a reminder that there is two problems in acronym:

Conversion complete: 18686 warnings; 2 errors; 2 undefined macros[\acroextra, \AC@acl].

processing finished Fri May 1 17:06:56 2015

and the warnings are almost completely due to tikztree.

I am however, very thankfull, for latexml.

Kind regards

Bernhard

Von: Deyan Ginev [mailto:notifications@github.com] Gesendet: Freitag, 1. Mai 2015 16:32 An: brucemiller/LaTeXML Cc: Bernhard Kleine Betreff: Re: [LaTeXML] Deep recursion with Tikz (#560)

Ooops, my bad, haven't had my coffee yet.

— Reply to this email directly or view it on GitHub https://github.com/brucemiller/LaTeXML/issues/560#issuecomment-98145221 . https://github.com/notifications/beacon/AIyh53GYX_KEPWFHkqZnGs5P3WWeMa36ks5oE4X3gaJpZM4DAFJZ.gif

brucemiller commented 9 years ago

Well, yeah, the open issue is actually a reminder that acronym needs work. Funny thing, I was just working on it...

brucemiller commented 9 years ago

Did some more research on recursion: Default perl starts complaining when the same routine is over 100 deep, but you can disable it by strategically placing no warnings 'recursion';. I've done that, and the warnings are gone.

Didn't affect the speed much at all, however. I rewrote the profiler and got some interesting statistics. Maximum "call" depth (but that includes macros which don't recurse) is 529! \expandafter is called 308133 times, and has a total cost of 28 seconds exclusive of the tokens it expands. And so on. Unfortunately, none of that has yet led to any appreciable optimization.

I get the impression that rewriting code to avoid recursion (which would be very hard for LaTeXML) would end up replacing the recursive code by some other cruft which would also turn out to be slow! There are several call layers with each formal recursion level on "invoking a token", some rewriting might help that, but I'm getting sceptical.

bernhard-kleine commented 9 years ago

where did you put that "no warnings 'recursion'; ". The time it consumes is not so important, but I would like to see the other warnings which are gone as long the recursion fills the screen.

Bernhard

brucemiller commented 9 years ago

I put several of those declarations in key places in the Perl subroutines that will tend to be called in recursively in deep structures. Basically these were the routines that were being reported in the warnings (invokeToken, absorb, reversion...).

bernhard-kleine commented 9 years ago

Without much knowlegde of perl do you think I could do the same?

brucemiller commented 9 years ago

By trial and error, perhaps, but understanding the architecture of the program would be important, as well as Perl. Why do you ask? Are you seeing other deep recursion issues with LaTeXML?

bernhard-kleine commented 9 years ago

I would like to see the messages on the screen without the recursion warnings. I would be very glad if you would advice how that is possible.

brucemiller commented 9 years ago

I don't understand what you're asking. If you update LaTeXML, there shouldn't be any recursion warnings with the example you gave.

It could be that other examples would, however. If so, please a short tex sample, and the kinds of messages being produced.

dginev commented 9 years ago

529 is quite the call depth. But are the 28 seconds for expandafter really excluding the invocation time for its tokens? It is one of the emptiest macro definitions.

dginev commented 9 years ago

@brucemiller : could you give me an example file and the invocation you are using to profile the expandafter performance? I would love to dig into this a little bit myself and see if I can find some new angle of attack.

brucemiller commented 9 years ago

There could be errors, of course, although I sorted out a bunch. But the 529 is a logical depth of all calls, including macros, primitives and constructors. As if recursive macros were actually recursive. Normally TeX programmers don't even think of the depth of macro calls. And only the non-macro part of that count actually has any correspondence to Perl's call stack.

As far as expandafter is concerned, that initially surprised me to, but don't forget that it's being called 300K times! So it's like 100ms per call. Is that a lot?

The test case I used was berhard's from above. I've just committed latexml.sty with a new profiling option to turn it on.

dginev commented 9 years ago

I just realized your profiling support is only printed by bin/latexml, I was trying out bin/latexmlc at first. Why not add the Core::showProfile call to the end of the Core::digestFile or Core::convertDocument subroutines? Then it will be auto-used also for latexmlc.

dginev commented 9 years ago

So here is the list of most expensive exclusive macros the profiler returns:

  \expandafter:20.19s
  \def:9.34s
  \pgfmath@parse@@number:8.65s
  \@@input:6.50s
  \let:6.32s
  \pgfmath@parse@next:5.61s
  \edef:4.31s
  \pgfmath@parse@@ifbgroup:2.56s
  \pgfkeys@unpack:2.26s
  \pgfkeysifdefined:2.20s
  Begin:2.17s
  \pgfutil@ifnch:2.09s
  \futurelet:1.79s
  \RequirePackage:1.67s
  \pgfmath@bgroup@@@strip:1.62s
  \InputIfFileExists:1.42s
  \pgfkeys@splitter:1.35s
  \the:1.34s
  \@iinput:1.20s
  \pgfkeys@spdef:1.18s

and frequencies:

Most frequent:
\expandafter:277455
\let:111209
\def:60847
\futurelet:31172
\edef:17973
\pgfmath@parse@next:17448
\global:16806
\string:13880
\the:13640
\relax:10234
\pgfmath@token:9167
\pgfkeyscurrentkey:8222
\catcode:7579
\pgfmath@parse@@ifbgroup:7362
\pgfmath@parse@ifbgroup:7325
\pgfkeys@pathtoks:7209
\pgfutil@ifnch:6525
\pgfutil@reserved@c:6525
\pgfkeysifdefined:6429
\advance:6273
\pgfutil@ifnextchar:6226
\pgfkeys@temptoks:6168
\pgfkeys@splitter:5821
\pgfmath@next:5393
\pgfkeys@sp@b:5180
\pgfkeys@spdef:5180
\pgfkeys@sp@c:5180
\pgfkeys@sp@a:5180
\pgfkeys@parse:4394
\pgfkeys@parse@main:4394
\pgfmath@parse@@number:4297

It looks like some of the pgfmath@parse macros are great candidates for Perl bindings, if we're lucky. They are infrequent and very expensive.

At the same time, Tikz will not feel noticeably faster until we make the \expandafter and \def calls much faster. Even if it is reasonable to take 20 seconds, it's still too much time. A single Tikz figure should execute in 20 seconds overall, rather than just spend that time on \expandafter's. But safe of a C rewrite for certain components, it may be too hard to optimize as far as we need in pure Perl, safe for some very clever rearrangement of the core TeX processing. No easy answers here...

brucemiller commented 9 years ago

exactly. pgfmath looks like good candidate. I spent some time getting my pgfkeys stuff working again, that should help. Be aware that by binding things like keys & math, the number of calls to low-level TeX stuff will be much reduced.

bernhard-kleine commented 9 years ago

Dear all,

with the whole document (some 500 pages) I encounter still some 70 warnings:

Warning:perl:warn Perl warning at Anonymous String; line 0 col 0 Deep recursion on subroutine "LaTeXML::Core::Document::absorb" at (eval 92) line 4. In Core::Document[@0x6bc59f8] at Anonymous String; line 0 col 0

Otherwise the document compiles fine, but it takes 3 1/2 hrs. All the errors of acronym and threeparttable gone.

dginev commented 9 years ago

Just for comparison, how long does it take to compile the same document with pdflatex ?

bernhard-kleine commented 9 years ago

About five minutes maximally

On 18. Mai 2015 00:10:06 MESZ, Deyan Ginev notifications@github.com wrote:

Just for comparison, how long does it take to compile the same document with pdflatex ?


Reply to this email directly or view it on GitHub: https://github.com/brucemiller/LaTeXML/issues/560#issuecomment-102864274

Diese Nachricht wurde von meinem Mobiltelefon mit Kaiten Mail gesendet.

brucemiller commented 9 years ago

I assume Bernhard is referring to his whole book? Don't know which Deyan was asking about, but the testfile at the top takes only about 0.6 seconds in pdflatex.

As far as the remaining warnings for absorb, that's bothersome. ->absorb is called in quite a few places, and it seems that you can only disable the warning if you do it before the outermost call(??). There's got to be a better strategy...

dginev commented 9 years ago

Thanks, I am collecting various bits and pieces of information regarding LaTeXML's speed, and in particular the contrast with pdflatex, as I am trying to fish out high impact improvements we can make that get us in the pdflatex ballpark. My personal long-term goal is to achieve a 10x slower performance than pdflatex for Tikz with LaTeXML, which seems wildly difficult at the moment, but it's certainly not impossible.

bernhard-kleine commented 9 years ago

I was wrong , the whole book takes only 55 sec to compile.

brucemiller commented 9 years ago

I tried putting the no recursion warning at the highest level call to ->absorb, in the hopes that it catches all the various scattered calls. Please give this a try and let me know if that silences the recursion warnings.

BTW: tikz should be running slightly faster now, due to various other updates.

bernhard-kleine commented 9 years ago

This remains:

Warning:perl:warn Perl warning at Anonymous String; line 0 col 0 Deep recursion on subroutine "LaTeXML::Core::Document::absorb" at (eval 92) line 4. In Core::Document[@0x6958820] at Anonymous String; line 0 col 0 .................................................................................................

brucemiller commented 9 years ago

Just one such message, or many?

bernhard-kleine commented 9 years ago

too many, the log is not seen

brucemiller commented 9 years ago

I did a bit more probing, and figured out a common situation where Lists cause a lot of nesting; the recursion can be opened up in Document->absorb for those cases, which seems to eliminate the recursion warnings, for me.

Please git pull and rerun your document. I suspect the warnings will be gone now. If not, I'll need a small test case for the figure (I assume it's another, larger, tikz picture) that causes it. Thanks;