IACR / latex-submit

Web server to receive uploaded LaTeX and execute it in a docker container.
GNU Affero General Public License v3.0

We need better error reporting #37

Closed kmccurley closed 1 year ago

kmccurley commented 1 year ago

One of the miserable things about LaTeX is the awful state of logs. In the paper by Nigel, I saw:

Package pgf Warning: Returning node center instead of a point on node border. D
id you specify a point identical to the center of node ``n3''? on input line 16
. 

It's not at all obvious from reading the log which file "input line 16" refers to, because you have to go back through the log to find a line like

(./Pic.tex

that wasn't closed by another ), because this is how LaTeX logs show which files are open. Moreover, lines are wrapped at 79 columns. This makes it hard for humans to parse, and also difficult for a program to parse because the ) character can occur inside another warning message.
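
The wrapping at 79 columns can be partly undone before any further parsing. The following is a minimal sketch, not the project's code: it assumes the default width of 79 and treats every line of exactly that length as wrapped. The name `unwrap_log_lines` is made up for illustration.

```python
MAX_LOG_WIDTH = 79  # assumed wrap width; a different TeX configuration may use another value


def unwrap_log_lines(lines, width=MAX_LOG_WIDTH):
    """Join physical log lines that were probably wrapped at `width` columns.

    Heuristic: a line of exactly `width` characters is assumed to continue
    on the next line. A message that happens to be exactly `width` characters
    long will be joined to its successor by mistake.
    """
    result = []
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        pending += line
        if len(line) == width:
            continue  # probably wrapped: keep accumulating
        result.append(pending)
        pending = ""
    if pending:
        result.append(pending)
    return result
```

Applied to the three physical lines of the pgf warning above, this yields a single logical line ending in "on input line 16.", provided the first two lines really are 79 characters long.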

There are several reasons for wanting better logs:

  1. we should provide a better user interface for authors and copy editors to read the logs
  2. we want to extract addressable issues from the logs so that authors can easily find them among the other crap in the logs, and copy editors can raise them as issues.
  3. we would like to make the log clickable to go directly to the place in the LaTeX source where the problem occurred. Note that this is different than what Overleaf and TeXstudio give with synctex to go from a place in the PDF to a place in the LaTeX and vice versa. We want to go from the log to the corresponding place in the LaTeX source.

I've discovered several approaches to this:

  1. if you use the option -file-line-error on pdflatex, then when it encounters a fatal error, it will print the current file and line number. This is convenient and easy, so we should do it. There is unfortunately nothing comparable for warnings.
  2. some warnings go through \PackageWarning and we can trap these using the new hook mechanism. The following accomplishes this by patching the \PackageWarning macro:
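    \RequirePackage{currfile}
    % Keep the original macro so the normal warning is still emitted.
    \let\OrigPackageWarning\PackageWarning
    \def\PackageWarning#1#2{%
      % Log the package name (#1) and the file currently being read.
      \GenericInfo{}{iacrcc:Warning reported by #1 in \currfilepath}%
      \OrigPackageWarning{#1}{#2}%
    }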

    This records the package name and the file being read when \PackageWarning was called. This turns

    Package amsfonts Warning: Obsolete command \bold; \mathbf should be used ins
    tead on input line 56.

    into

    
    iacrcc:Warning reported by amsfonts in Shamir.tex on input line 56.

    Package amsfonts Warning: Obsolete command \bold; \mathbf should be used instea
    d on input line 56.


This will allow us to extract better (clickable) information from `\PackageWarning` messages.
3. It's more complicated to flag overfull hboxes, because they are created in the stomach and TeX doesn't have any information about files there. The messages are apparently produced with `\errmessage` and it's not possible to patch this with `\AddToHook`. As mentioned above, it's theoretically possible to go back through the log and figure out which file you're in, but it requires a parser for the log to recognize things like `(./section/introduction.tex` as a signal that it opened a file, and later keep track of the closing `)` that indicates the file was closed. In between, it's possible that a log message could have a lone `)` in it and confuse the parser.
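
The parenthesis tracking described in item 3 can be sketched as a small stack machine. This is only an illustration under stated assumptions, not the parser that was eventually written: `TOKEN` and `track_files` are hypothetical names, file names are assumed to start with `./`, `../` or `/` and contain no spaces, and log lines are assumed to be unwrapped already.

```python
import re

# "(" followed by something path-like opens a file; any other "(" is pushed as
# a placeholder so that its matching ")" does not close a real file.
TOKEN = re.compile(r"\((?P<path>\.{0,2}/[^\s()]+)|(?P<open>\()|(?P<close>\))")


def track_files(lines):
    """Yield (log line number, current file or None, line) for each log line."""
    stack = []
    for lineno, line in enumerate(lines, start=1):
        for match in TOKEN.finditer(line):
            if match.group("path"):
                stack.append(match.group("path"))
            elif match.group("open"):
                stack.append(None)  # parenthesis inside a message
            elif stack:
                stack.pop()  # a ")" closes whatever was opened last
        # The current file is the innermost real file still open.
        current = next((f for f in reversed(stack) if f is not None), None)
        yield lineno, current, line
```

Balanced parentheses inside a message are handled by the placeholder entries. A lone `)` in a message still pops a real file off the stack, which is exactly the failure mode described above.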

I've found several LaTeX log parsers, including:
* the one in Overleaf (written in CoffeeScript, which [compiles to JavaScript](https://github.com/overleaf/latex-log-parser/blob/master/dist/latex-log-parser.js))
* [one written in python](https://github.com/guillaumesalagnac/latex-compile/blob/master/latex-compile) that is part of a latexmk replacement in python.
* [another one](https://github.com/tsgates/die/blob/master/bin/parse-latex-log.py) in python
* [a python package TexOutParse](https://pypi.org/project/texoutparse/). They are at least honest that logs are unstructured and hard to reliably parse, but they have tests. It doesn't seem to do the hard part of extracting the current filename.
* a [JavaScript parser](https://github.com/TeXworks/texworks/blob/857867414ec57bb814d3ca90f9ef1a865662155b/res/resfiles/scripts/Hooks/logParser.js#L77) that is part of TeXworks. It looks pretty good, and appears to recognize messages from `\errmessage` and to capture the current file that LaTeX is operating on.

None of these appear to be usable "out of the box", but they contain the necessary information for us to extract good information from logs (at least with pdflatex).
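
For the two easy cases, errors printed with `-file-line-error` and warnings tagged by the patched `\PackageWarning`, a pair of regular expressions is enough to recover a clickable location. The sketch below assumes unwrapped log lines and the `iacrcc:Warning reported by ...` marker format shown earlier; the function name `locate` and the returned dictionary keys are illustrative choices, not part of any existing parser.

```python
import re

# With -file-line-error, errors look like "./Shamir.tex:56: Undefined control sequence."
FILE_LINE_ERROR = re.compile(r"^(?P<file>[^:\n]+):(?P<line>\d+): (?P<text>.*)$")

# Marker written by the patched \PackageWarning shown earlier in this issue.
IACRCC_MARKER = re.compile(
    r"^iacrcc:Warning reported by (?P<package>\S+) in (?P<file>.+)"
    r" on input line (?P<line>\d+)\.$"
)


def locate(line):
    """Return a dict with file, line number and kind, or None if nothing matches."""
    match = IACRCC_MARKER.match(line)
    if match:
        return {"kind": "warning", "package": match.group("package"),
                "file": match.group("file"), "line": int(match.group("line"))}
    match = FILE_LINE_ERROR.match(line)
    if match:
        return {"kind": "error", "text": match.group("text"),
                "file": match.group("file"), "line": int(match.group("line"))}
    return None
```

Everything that matches neither pattern, such as overfull box messages, still needs the file tracking discussed in item 3.
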

kmccurley commented 1 year ago

I'm nearing completion on a LaTeX log parser that is able to keep track of errors that appear in the logs. This addresses the issue that the LaTeX log doesn't make it clear which file was the source of an error. It will require using pdflatex -recorder (or lualatex -recorder).
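
The `-recorder` option writes a `.fls` file listing every file the engine opened, one `PWD`, `INPUT` or `OUTPUT` record per line. How the parser uses that file is not described in this thread. One plausible use, sketched here as an assumption, is to collect the known input files so that a `(` in the log only counts as opening a file when it is followed by one of them. The function name `read_recorder_inputs` is made up for this example.

```python
from pathlib import PurePosixPath


def read_recorder_inputs(fls_text):
    """Return the INPUT files from a .fls file as absolute paths, in first-seen order."""
    cwd = None
    seen = set()
    inputs = []
    for record in fls_text.splitlines():
        kind, _, value = record.partition(" ")
        if kind == "PWD":
            cwd = PurePosixPath(value)
        elif kind == "INPUT":
            path = PurePosixPath(value)
            # Relative entries are resolved against the recorded working directory.
            if cwd is not None and not path.is_absolute():
                path = cwd / path
            if path not in seen:
                seen.add(path)
                inputs.append(str(path))
    return inputs
```

A log parser could then compare each candidate name after `(` against this list instead of guessing from the file extension.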

kmccurley commented 1 year ago

I've now addressed this in the code, with a new LatexLogParser and new CompileError entries in the Compilation data structure.

from pydantic import BaseModel, Field

# ErrorType is an enum defined elsewhere in the project; it is not shown here.

class CompileError(BaseModel):
    error_type: ErrorType = Field(...,
                                  title='type of error',
                                  description='This tells you which fields should be populated')
    text: str = Field(...,
                      title = 'Description of problem',
                      description = 'Textual description of problem for author. A required field.')
    logline: int = Field(...,
                         title='Line in log where it occurs',
                         description='This is the start of where the error occurs, but there may be lines after it.')
    package: str = Field(None,
                         title='LaTeX package name',
                         description='Populated when we find it. Not required')
    pageno: int = Field(0,
                        title = 'Page number in PDF',
                        description = 'Page number where problem appears. May be required.')
    pdf_line: int = Field(0,
                          title = 'Line number in PDF',
                          description = 'Line number in copyedit version. Not required.')
    filepath: str = Field(None,
                          title='path to LaTeX file',
                          description = 'Location of LaTeX error or warning. Not required.')
    filepath_line: int = Field(0,
                               title = 'Line number location for LaTeX warning',
                               description = 'Line number in filepath where LaTeX warning occurred. Not required.')
    severity: float = Field(0,
                            title='Severity of overfull or underfull hbox or vbox.',
                            description='For overfull boxes, it is the size in pts. For underfull boxes it is badness.')
    help: str = Field(None,
                      title='Help for authors',
                      description='The parser may have more to say about an error')
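
To illustrate how the `severity` and `filepath_line` fields might be filled, here is a rough sketch that turns an overfull or underfull box message into a plain dictionary keyed by the `CompileError` field names. It is not the code in `LatexLogParser`: the function name `parse_box_warning` is hypothetical, `error_type` is left out because the `ErrorType` values are not shown here, and a dictionary is used so the example runs without pydantic.

```python
import re

# Matches e.g. "Overfull \hbox (12.3pt too wide) in paragraph at lines 10--12"
# and "Underfull \vbox (badness 10000) has occurred while \output is active".
BOX_MESSAGE = re.compile(
    r"^(?P<kind>Overfull|Underfull) \\(?P<box>[hv])box "
    r"\((?:(?P<size>[\d.]+)pt too (?:wide|high)|badness (?P<badness>\d+))\)"
    r"(?:.*? at lines? (?P<line>\d+))?"
)


def parse_box_warning(text, logline):
    """Return CompileError-style fields for a box warning, or None if no match."""
    match = BOX_MESSAGE.match(text)
    if match is None:
        return None
    # Size in points for overfull boxes, badness for underfull boxes.
    severity = float(match.group("size") or match.group("badness"))
    return {
        "text": text,
        "logline": logline,
        "severity": severity,
        "filepath_line": int(match.group("line")) if match.group("line") else 0,
    }
```

The `filepath` field is deliberately absent: the message itself does not name a file, so it has to come from whatever file tracking the parser does.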

The web UI for authors now shows a much better list of errors and warnings, with the ability to click on things to see where the problem occurred in the log, the source files, and the PDF (sort of). Only a copy editor would indicate the PDF line number.

There are a few caveats:

  1. the logs don't always record a line number in the source where the problem occurred (hyperref warnings are notorious for this). We at least show the file in which we think the problem occurred.
  2. if an error is reported inside a style file instead of the user code, then we don't provide a way to see where in the style file it is reported. I would rather add a link to the documentation for the style.
  3. If an overfull \hbox occurs inside a float, then the page recorded for the error may be wrong, since the float may only be output on a later page than the one where the problem was detected. I signal this uncertainty by showing "PDF page ≈3" as the clickable link.

I'm now satisfied that we've addressed the basic issue, though there is always opportunity to improve it in the future.