ftilmann / latexdiff

Compares two latex files and marks up significant differences between them. Releases on www.ctan.org and mirrors
GNU General Public License v3.0

Option to include a hook to call a user-supplied script for cleanup prior to --flatten #165

Open jasonmccsmith opened 5 years ago

jasonmccsmith commented 5 years ago

I have custom commands that cause fits with flatten, and I'd like to remove the commands myself prior to latexdiff attempting to flatten files.

I was doing this in a custom wrapper script that invoked latexdiff, but my organization has now (finally) converted to a git backend for document versioning, and we'd like to use latexdiff-vc. The cleanup has to occur between the git checkout and the flattening.

The heavy-handed approach is to add the code directly to sub flatten in latexdiff, and it works well, but we'd prefer not to have to re-apply our changes with every latexdiff update. We realized that we can't be the only ones in this boat, so would it be more useful to everyone to have a new option that names a script to be run prior to flatten? Input text, output text.

Using a git-hook isn't a great option, because we would only need this cleanup for use with latexdiff, and the rest of our workflow relies on the offensive bits.

Thoughts? Will be editing sub flatten in situ in the meantime.

ftilmann commented 5 years ago

It does sound like it would be, in principle, a fairly easy thing to do. Should the script run (separately) on the old and new master files and modify them destructively, or work as a filter (i.e. taking stdin and writing to stdout)? Would it need options? If options are allowed, I could see trouble with option arguments containing spaces, but otherwise one could pass the command and its options as a quoted string to latexdiff. Would you care to share a minimal example of your offending markup and/or your modifications to flatten or your pre-processing script (for testing)? Depending on how 'custom' this is, an easier solution might be to include your modifications in the standard flatten. And an example would help in testing the new feature. I should warn that 'latexdiff' is a hobby, so implementation of the above feature request, though OK in principle, might take some time, possibly considerable time.

jasonmccsmith commented 5 years ago

iffileemptyelseMWE.txt

I was envisioning filter, no options. (If a site maintainer wanted to use this highly specific bailout, it would be on them to write the filter script as they need to. If they need different behavior, they pass a different script name to latexdiff.)

Minimum example: we have a highly structured document system for producing technical specifications, from authors who don't know LaTeX. Frankly, they're terrified of it. (I know, I know, Word refugees, what are you going to do.)

As part of making sure they deal with as little structure as possible, and instead concentrate on content, we allow for optional sections, etc, but we can't do:

\section{Optional Section} \input{myoptionalsection}

If they haven't provided myoptionalsection.tex, or it's empty, then we don't want the section included at all. So we have the following:

\iffileemptyelse{myoptionalsection} {} {\section{Optional Section}\input{myoptionalsection}}

I'm sure you can see where this goes wrong. flatten ignores the custom command and tries to handle the input, for a file that may not be there, leading to a fatal error, and if it's there and empty, ends up injecting a section we don't want.

The cleanup script executes these ifthenelse conditionals, and replaces them with the appropriate clause. flatten then handles the \input commands as expected.
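For readers who want to try the same approach, here is a minimal sketch of such a cleanup filter in Python. It is not the script used in this thread (that one was not posted); the function names, the `.tex` suffix handling, and the rule that a missing or zero-byte file counts as "empty" are all assumptions. The argument order follows the example above: file name, clause for the empty case, clause for the non-empty case.

```python
import os
import sys

MACRO = "\\iffileemptyelse"  # command name taken from the example above


def read_group(text, pos):
    """Return (content, end) of the balanced {...} group at or after pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos >= len(text) or text[pos] != "{":
        raise ValueError("expected '{' at offset %d" % pos)
    depth, i = 0, pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced braces after offset %d" % pos)


def file_is_empty(name, base_dir="."):
    """Assumption: a missing or zero-byte file counts as empty."""
    if not name.endswith(".tex"):
        name += ".tex"
    path = os.path.join(base_dir, name)
    return not os.path.isfile(path) or os.path.getsize(path) == 0


def resolve(text, base_dir="."):
    """Replace each \\iffileemptyelse{f}{a}{b} with a (f empty) or b (otherwise)."""
    out, pos = [], 0
    while True:
        start = text.find(MACRO, pos)
        if start < 0:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:start])
        name, end = read_group(text, start + len(MACRO))
        if_empty, end = read_group(text, end)
        if_present, end = read_group(text, end)
        empty = file_is_empty(name.strip(), base_dir)
        out.append(if_empty if empty else if_present)
        pos = end


def main():
    """Filter entry point: LaTeX source on stdin, resolved source on stdout."""
    sys.stdout.write(resolve(sys.stdin.read()))
```

To use it as a stdin-to-stdout filter, call `main()` at the end of the file. The sketch does not resolve conditionals nested inside the chosen clause, and it looks files up relative to the current directory unless `base_dir` is given.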

The other customization is for local change tracking notes. They are critical for communication between authors, but should not appear in the final rendered document, or be considered for diffs. Assume the command that wraps these is \mycomment, is there an existing way of having latexdiff ignore their contents completely? I have suspected there is, and I just missed it.

Edit: I'm happy to help with implementation and generate a pull request.

jasonmccsmith commented 5 years ago

Added a new option:

latexdiff --filter-script="<path-to-script> [options]"

The filter script must be written to accept piped input on stdin and write to stdout. Within sub flatten only, the contents of $text will be replaced by the output of the filter script before the existing flatten logic runs.
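The hook behaviour described here can be illustrated with a short sketch. latexdiff itself is written in Perl, so this Python version is only an illustration of the idea and not the code from the pull request; `run_filter` is a hypothetical name. It splits the quoted command string into the program and its options, pipes the text in, and returns whatever the script writes to stdout.

```python
import shlex
import subprocess
import sys


def run_filter(text, command):
    """Pipe text through an external filter command and return its stdout.

    command is a single string such as "<path-to-script> [options]".
    shlex.split honours shell-style quoting, so an option argument that
    contains spaces survives as long as it is quoted inside the string.
    """
    argv = shlex.split(command)
    result = subprocess.run(
        argv,
        input=text,           # sent to the filter's stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Stand-in filter for demonstration: upper-cases everything it reads.
DEMO_FILTER = '%s -c "import sys; sys.stdout.write(sys.stdin.read().upper())"' % (
    shlex.quote(sys.executable)
)
```

For example, `run_filter(text, DEMO_FILTER)` returns the upper-cased text. Failing loudly on a non-zero exit status is a design choice of this sketch; whether the real option aborts or falls back to the unfiltered text is not stated in this thread.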

Question: move filtering just after assignment of $old and $new at line ~810, so it is handled before anything else? I only need it for flattening, but it occurs to me that that would make it a universal filter hook point.

ftilmann commented 5 years ago

Yes, I think it would make sense to offer this functionality independently of flatten. I don't see a pull request yet, but I presume that's intentional.

ftilmann commented 5 years ago

The other customization is for local change tracking notes. They are critical for communication between authors, but should not appear in the final rendered document, or be considered for diffs. Assume the command that wraps these is \mycomment, is there an existing way of having latexdiff ignore their contents completely? I have suspected there is, and I just missed it.

I never answered this. The contents should be ignored by default for all , but a new \mycomment would be wrapped by \DIFaddbegin and \DIFaddend commands. In the default style those commands don't do anything, so I am wondering why you get problems with this.

jasonmccsmith commented 5 years ago

Yes, pull was waiting on hook placement discussion. Will move, test, and submit pull request.

mycomment issue was that it needed to be stripped, filter script takes care of that. No further issue there, thanks.
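As an illustration of the stripping step, a filter could remove every \mycomment{...} together with its argument before latexdiff sees the text. The sketch below is an assumption about how that might be done, not the script used here; `strip_macro` and `skip_group` are hypothetical names, and it assumes the opening brace follows the command name directly.

```python
def skip_group(text, pos):
    """Return the index just past the balanced {...} group opening at text[pos]."""
    depth, i = 0, pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces after offset %d" % pos)


def strip_macro(text, macro="\\mycomment"):
    """Delete every occurrence of macro{...}, including nested braces."""
    out, pos = [], 0
    while True:
        start = text.find(macro, pos)
        if start < 0:
            out.append(text[pos:])
            return "".join(out)
        end = start + len(macro)
        if end >= len(text) or text[end] != "{":
            # Not followed by an argument (e.g. a longer command name
            # that merely starts with the same letters): keep as is.
            out.append(text[pos:end])
            pos = end
            continue
        out.append(text[pos:start])
        pos = skip_group(text, end)
```

Wired up as a filter, this would read stdin, apply `strip_macro`, and write the result to stdout.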

benkinooby commented 4 years ago

I want to add my use case to this issue, referring to the initial question of @ftilmann.

Hopefully, my comment adds value to this (slightly dated) conversation.

I dynamically generate files which will be included in my tex file. Therefore, I run two commands in bash prior to invoking latex. With latexdiff-vc --flatten -r..., those generated files will be missing.

1) I have a graph.dot file [1] which holds the description of a graph. I use dot2tex to generate a graph.tikz, which I include with \input{graph.tikz}. Ideally, I only commit graph.dot and gitignore the tikz file.

2) I use latex-git-log to append a table with the git commit log to the end of my document, by using \input{} on the file generated by latex-git-log. As with the tikz file, I'd like to avoid tracking the generated tex file.

Possible workarounds are:

In my case, the best solution I can think of is an option --script as alternative to --pdf or --ps. This may very well be a bad/naive idea. Any feedback/thoughts on this are welcome.

[1] example dot file with generated output at https://graphviz.gitlab.io/_pages/Gallery/directed/cluster.html

falko commented 11 months ago

@jasonmccsmith since your pull request #172 has been merged, can this issue be closed or was there a problem left that the PR didn't cover?