IACR / latex-submit

Web server to receive uploaded LaTeX and execute it in a Docker container.
GNU Affero General Public License v3.0

What tools to provide for copy editors to view diffs #30

Closed: kmccurley closed this issue 11 months ago

kmccurley commented 1 year ago

This is a tracking bug for the alternatives for comparing two PDFs after copy editing. There was previously a Google Doc with some information.

The basic problem is how to build a UI that compares two versions of a paper. We can either base it on the diffs in the LaTeX or the diffs in the PDF.

Showing diffs in the LaTeX source is probably easiest, since we can use standard diff tools (e.g., diff -r to compare two directory trees). Unfortunately, this requires the copy editor to understand LaTeX, but we could provide it as an alternate view. Libraries such as Python's difflib can render diffs; see dompare for an example of a tool built this way.
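A minimal sketch of the diff -r style comparison in Python, using only the standard library (os, filecmp, difflib). The function names compare_trees and unified_tex_diff are hypothetical and not part of latex-submit; the sketch assumes the candidate and final uploads have been unpacked into two directories and that the .tex files are UTF-8.

```python
import difflib
import filecmp
import os


def tree_files(root):
    """Return the set of file paths under root, relative to root."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def compare_trees(old_root, new_root):
    """Roughly what `diff -rq` reports: (changed, removed, added) paths."""
    old_files, new_files = tree_files(old_root), tree_files(new_root)
    changed = [
        rel for rel in sorted(old_files & new_files)
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting matching size and modification time.
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    ]
    return changed, sorted(old_files - new_files), sorted(new_files - old_files)


def unified_tex_diff(old_root, new_root, rel):
    """Unified diff of one file present in both trees (assumes UTF-8 text)."""
    def read_lines(root):
        with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as f:
            return f.readlines()

    return "".join(difflib.unified_diff(
        read_lines(old_root), read_lines(new_root),
        fromfile="candidate/" + rel, tofile="final/" + rel))
```

For example, compare_trees("candidate", "final") would list the changed, removed, and added files, and unified_tex_diff could then be called on each changed .tex file to produce the text that a LaTeX view would show.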

Showing diffs in PDFs can be done in various ways. None of them are very satisfying.

pdfpagediff

There is an interesting LaTeX package called pdfpagediff that allows you to overlay two PDFs as layers in a single PDF. In some PDF viewers you can selectively show either layer; notably this works in Firefox, but apparently not in Chrome. An example is attached here: compare.pdf. I found this difficult to use, since it puts the burden on the viewer to spot the changes. Inserting a single sentence can create quite a mess, because it shifts all the text that follows, so the two layers no longer line up.

latexdiff

latexdiff is a Perl script that creates a single PDF showing diffs. It has several drawbacks, not the least of which is that it will not follow nested \input. We could try to create our own tool to join the files into a single LaTeX file, but respecting whitespace is nontrivial when \input is not on a line by itself. Some people have already written such tools (e.g., latexpand). Such a tool would also have to handle constructs like:

\if\llncs
\else
\input{fullproof}
\fi
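To illustrate why joining files is fiddly, here is a deliberately naive Python sketch of such a joiner. flatten is a hypothetical helper and does not reproduce latexpand's behaviour. It recursively inlines \input{...} (resolved relative to the main file's directory, as TeX does) and leaves \input inside % comments alone, but it blindly expands \input inside \if ... \fi branches, so text from a branch that TeX skips would still show up in a diff, and it makes no attempt to reproduce TeX's end-of-line spacing at file boundaries.

```python
import os
import re

# \input{name}; the brace-less form "\input name" is NOT handled in this sketch.
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")
# Approximation: the first '%' not preceded by a backslash starts a comment.
COMMENT_RE = re.compile(r"(?<!\\)%")


def resolve(root, name):
    """Resolve an \\input argument relative to the main file's directory."""
    path = os.path.join(root, name)
    if not os.path.exists(path) and os.path.exists(path + ".tex"):
        path += ".tex"
    return path


def flatten(root, name, _stack=()):
    """Return the text of `name` with nested \\input{...} inlined.

    Naive: it expands every \\input outside a comment, including ones in
    conditional branches, and ignores TeX's spacing rules at file ends.
    """
    path = resolve(root, name)
    real = os.path.realpath(path)
    if real in _stack:
        raise ValueError(f"circular \\input: {path}")
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines(keepends=True)
    out = []
    for line in lines:
        m = COMMENT_RE.search(line)
        code, comment = (line[:m.start()], line[m.start():]) if m else (line, "")
        # A function replacement is inserted literally, so backslashes survive.
        code = INPUT_RE.sub(
            lambda mo: flatten(root, mo.group(1), _stack + (real,)), code)
        out.append(code + comment)
    return "".join(out)
```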

pdfdiff

This is a Python script that produces a side-by-side comparison of two PDF files. The output is a PNG image (screenshot attached). This example shows that there are false positives (e.g., my name in the footnote).
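For comparison, a text-based alternative to pixel comparison (a different technique from pdfdiff's) is to diff the words extracted from each page. This Python sketch assumes the per-page text has already been extracted by some other tool (for example pdftotext); page_word_diff is a hypothetical helper. Because it compares the word sequence of the whole document, text that merely reflows onto another page is not reported as a change, but it also misses purely visual changes such as formatting, and extraction artifacts (e.g., hyphenation at line breaks) can still produce false positives.

```python
import difflib


def _words_with_pages(pages):
    """Flatten per-page text into a word list plus the page number of each word."""
    words, page_of = [], []
    for number, text in enumerate(pages, start=1):
        for word in text.split():
            words.append(word)
            page_of.append(number)
    return words, page_of


def _page_at(page_of, index):
    # An insertion point can sit just past the last word; clamp to the last page.
    if not page_of:
        return None
    return page_of[min(index, len(page_of) - 1)]


def page_word_diff(old_pages, new_pages):
    """Word-level diff of two documents given as lists of per-page text.

    Returns (tag, old_page, new_page, old_words, new_words) for each
    non-equal run, where tag is "replace", "delete", or "insert" and the
    1-based page numbers say where the run starts in each version.
    """
    old_words, old_page_of = _words_with_pages(old_pages)
    new_words, new_page_of = _words_with_pages(new_pages)
    # autojunk=False: in long documents, don't treat frequent words as junk.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, _page_at(old_page_of, i1), _page_at(new_page_of, j1),
                            old_words[i1:i2], new_words[j1:j2]))
    return changes
```

For instance, comparing the pages ["The quick brown fox", "jumps over the dog"] with ["The quick red fox", "jumps over the lazy dog"] reports "brown" replaced by "red" on page 1 and "lazy" inserted on page 2.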

difflib

The Python difflib library has several things that might be useful. The obvious choice, difflib.HtmlDiff, produces a really ugly HTML file. It compares only two files at a time, so we have the usual problem of folding all files into a single file (or showing diffs for each uploaded file, assuming the author used the same file structure in the original and final versions).
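One way around the two-files-at-a-time limitation, assuming the same file structure in both versions, is to call HtmlDiff.make_table once per changed file and put all the tables on one page. In this Python sketch, diff_page is a hypothetical helper and the CSS colours are arbitrary; make_table emits only the table (no styles), and its context and numlines arguments limit each table to the changed lines plus a few lines of context.

```python
import difflib
import html

# CSS for the class names used in HtmlDiff tables; the colours are arbitrary.
STYLE = """<style>
table.diff {font-family: monospace; border: medium;}
.diff_header {background-color: #e0e0e0;}
.diff_next {background-color: #c0c0c0;}
.diff_add {background-color: #aaffaa;}
.diff_chg {background-color: #ffff77;}
.diff_sub {background-color: #ffaaaa;}
</style>"""


def diff_page(pairs, context_lines=3):
    """Build one HTML page with a side-by-side table per changed file.

    `pairs` maps a relative file name to (old_text, new_text); pass "" for a
    file that is missing in one version. Unchanged files are skipped.
    """
    differ = difflib.HtmlDiff(wrapcolumn=80)
    parts = ["<!DOCTYPE html><html><head><meta charset='utf-8'>", STYLE,
             "</head><body>"]
    for name, (old, new) in sorted(pairs.items()):
        if old == new:
            continue
        parts.append(f"<h2>{html.escape(name)}</h2>")
        parts.append(differ.make_table(
            old.splitlines(), new.splitlines(),
            fromdesc="candidate", todesc="final",
            context=True, numlines=context_lines))
    parts.append("</body></html>")
    return "\n".join(parts)
```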

side-by-side visual inspection

Another possibility is to provide a simple side-by-side view of the two PDFs and trust the copy editor to check whether the requested changes were made. This would not help them identify any other changes the author made.

kmccurley commented 1 year ago

An example of how pdfpagediff looks (screenshot attached). In a big document it can be hard to spot these differences, and you then have to turn the layers on and off selectively to see what changed.

kmccurley commented 1 year ago

Another possibility is to use a GitHub repository: first commit the "candidate" version, then create a pull request with the "final" version so the copy editor can view the diffs. This would only work for the LaTeX, though, and I think difflib.HtmlDiff.make_table is good enough for that. We may show views of both the PDF and the LaTeX files.

kmccurley commented 11 months ago

This has been solved by providing the ability to compare the pre-copyedit PDF to the post-copyedit PDF, alongside the list of issues sent to the author and the author's responses (screenshot attached). The "Diffs" tab also shows a list of all LaTeX files that have changed. If the author makes extensive changes during this phase, the diffs will be hard to parse, but minor changes will be apparent to the copy editor.