OpenLogicProject / OpenLogic

An open-source, customizable intermediate logic textbook
http://openlogicproject.org/
Creative Commons Attribution 4.0 International

PDF generation for reviewing changes in pull requests #148

Open emanuelbuholzer opened 7 years ago

emanuelbuholzer commented 7 years ago

I started reading OpenLogic a few days ago, and I am delighted to see a project like this. I want to contribute while I read the book and take notes on the material, so I have been watching open pull requests for content that I have already read or will read soon. While reviewing, I found it hard, especially for larger texts, to compare versions using a diff utility. A useful feature would be for Travis CI, for example, to automatically generate a PDF from the pull request while running its checks, so that the changes can be reviewed more clearly.

rzach commented 7 years ago

That is a superb idea. I've been thinking about figuring out how to use latexdiff with git. I suspect it might not work with such a complex text though.

emanuelbuholzer commented 7 years ago

If you want, I can investigate whether it would be possible to do this with Travis CI. If that isn't possible, I can write a bot that does it for all pull requests.

rzach commented 7 years ago

Sure that would be great. Right now Travis just builds the complete PDF to make sure the changes actually compile. It should be easy enough to upload the resulting PDF somewhere. So it should be easy to autogenerate PDFs of whatever the pull request is. But to generate a marked-up PDF that shows changes may be tricky. Travis probably exposes the git commits in an env variable so one could run a diff b/w the pull request and HEAD before compiling. I think the first problem is getting latexdiff to work with complex files that include each other. I know it does have options for RCSs but I haven't tried it.
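A minimal sketch of how the diff base could be picked on Travis, assuming the documented `TRAVIS_PULL_REQUEST` and `TRAVIS_BRANCH` variables (for pull request builds, `TRAVIS_BRANCH` names the target branch). The function name `diff_base` and the `master` fallback are made up for illustration.

```shell
# Sketch: decide which ref to diff against.
# diff_base is a hypothetical helper; falling back to master is an assumption.
diff_base() {
    if [ "${TRAVIS_PULL_REQUEST:-false}" != "false" ]; then
        # Pull request build: TRAVIS_BRANCH holds the target branch.
        printf '%s\n' "${TRAVIS_BRANCH:-master}"
    else
        # Push build or local run: compare against master.
        printf '%s\n' master
    fi
}

# Example: list the files that differ from the base.
# git diff "$(diff_base)" --name-only
```

Travis clones shallowly, so the base branch may have to be fetched before `git diff` can see it; that part is not covered here.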

A place to upload would be good too. I've been meaning to get a separate server for it but if there's a simple and free solution you know of that would be even better.

emanuelbuholzer commented 7 years ago

After a quick search, I found Git LaTeXdiff, which looks promising. It's a wrapper around git and latexdiff which can call latexpand when LaTeX sources are split across several *.tex files. I'll try it tonight and let you know how it went.

I don't know of any good, reliable, straightforward and free solution for a server, but I'd recommend using a cloud infrastructure provider like DigitalOcean. I can probably get some free credit to play around with it ($20 USD). After that is used up, I'd be willing to chip in for the server. If you want, I'll write up documentation and a concept for the whole thing.

rzach commented 7 years ago

latexpand doesn't deal with subfiles which OLP uses to include things (while letting them compile in a stand-alone way as well). To make matters worse, it's not called directly but by \olimport. AFAICT latexpand blindly replaces \includes by the content of the file.

A more promising strategy might be to simply run latexdiff on every *.tex that's changed, copy the resulting tex file containing the diff markup to the original and then compile open-logic-debug.tex, letting \olimport do the work.
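The per-file strategy above could be sketched roughly as follows. This assumes `latexdiff` is on the PATH, that the base branch is `master`, and that the script runs from the repository root; `markup_changed` is a hypothetical helper name, not something that exists in the repository.

```shell
# Sketch: replace each modified content file with its latexdiff markup.
# markup_changed is a hypothetical helper; "master" as default base is an assumption.
markup_changed() {
    base=${1:-master}
    # Only files that exist on both sides (status M) can be diffed.
    git diff "$base" --name-only --diff-filter=M -- content/ |
    grep '\.tex$' |
    while IFS= read -r f; do
        old=$(mktemp) || continue
        # Fetch the base version, diff it against the working copy,
        # and overwrite the working copy with the marked-up result.
        git show "$base:$f" > "$old" &&
            latexdiff "$old" "$f" > "$f.diff" &&
            mv "$f.diff" "$f"
        rm -f "$old"
    done
}

# Usage, from the repository root:
# markup_changed master
```

After this, compiling open-logic-debug.tex would pick up the marked-up files through \olimport. Because the working copies are overwritten, this is only meant for a throwaway CI checkout.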

rzach commented 7 years ago

DigitalOcean is probably overkill since we don't have to run anything -- only LaTeX which Travis CI will do anyway. But I'll have to move the web server anyway and it is probably enough to SCP the PDFs there. Then we just need a Zapier or something to automatically add a comment to the pull request pointing to the PDF.

emanuelbuholzer commented 7 years ago

Your strategy sounds solid. I'll give it a try tomorrow. Yes, with only LaTeX running on Travis CI, DigitalOcean is overkill; I thought you'd need more than a simple web server.

rzach commented 7 years ago

Right now the PDFs live on a university server but they're turning it off and even now I have to first fire up a VPN to scp stuff over, very annoying. Plus I want to put a proof checker and a Turing machine simulator on the site (just PHP/JS).

emanuelbuholzer commented 7 years ago

I guess the website already runs on PHP, on the Taylor Institute for Teaching and Learning Web Application Server? An alternative would be to use GitHub Pages, but it doesn't support PHP as far as I know. Admittedly, I haven't worked with PHP much, other than setting it up in a Docker container or a virtual machine.

emanuelbuholzer commented 7 years ago

Got the diff working with subfiles on my local machine, doing it by hand for every changed file: I remove the preamble that latexdiff generates in every diff file and add it to the main file. I'll write a shell script so that it can run on Travis CI.
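The preamble removal could be scripted along these lines. This is a sketch under an assumption: that latexdiff wraps its generated preamble in the two `%DIF ...` marker comments used below, which should be checked against the installed version. `strip_diff_preamble` is a hypothetical helper name, and `content/some-file.tex` in the usage line is a placeholder path.

```shell
# Sketch: drop the preamble block that latexdiff adds to a diffed file.
# strip_diff_preamble is a hypothetical helper. The two marker comments are
# assumed to match what the installed latexdiff version writes.
strip_diff_preamble() {
    sed '/^%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF/,/^%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF/d'
}

# Usage: filter one diffed file in place (placeholder path).
# strip_diff_preamble < content/some-file.tex > tmp.tex && mv tmp.tex content/some-file.tex
```

The removed definitions still have to be present once in the main file, as described above, or the diff markup commands will be undefined.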

What do you think about using Docker in builds on Travis CI, so that everyone else working on the project has the same CI system available? I'll do it anyway for this change, to speed up my workflow.

rzach commented 7 years ago

I looked into it; seemed like a good solution also to the problem of Travis downloading and installing a full TeXLive 2016 every time. Didn't pursue it just for lack of time. It'd be great if you'd figure this out (and maybe write down what one has to do to get a container to run LaTeX in, in case someone else wants to do that too or we have to do it again at some point).

rzach commented 7 years ago

It also occurred to me that it only makes sense to produce a diff if the actual text is changed, and not if, say, we change a style file. More or less, this means make the diff only if a file in content/ has changed.
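A small sketch of that check, relying on `git diff --quiet`, which exits with status 1 when there are differences and 0 when there are none. `content_changed` is a hypothetical helper name and `master` as the default base is an assumption.

```shell
# Sketch: succeed only if something under content/ differs from the base.
# content_changed is a hypothetical helper; "master" default is an assumption.
content_changed() {
    git diff --quiet "${1:-master}" -- content/
    # Exit status 1 means "differences found"; anything else
    # (0 = no differences, other values = git error) counts as "no".
    [ $? -eq 1 ]
}

# Usage:
# if content_changed master; then echo "build the diff PDF"; fi
```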

I can also easily enough make it so that every file name has a reference pointing to it so that the script could write a file with all the names of changed files which open-logic-debug.tex could import if it exists -- then we'd have a list of changed sections with links to those so you don't have to go through hundreds of pages to find the things that changed.

Also: what happens if an entire file is added or removed?
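One way to at least detect those cases is `git diff --name-status`, which prefixes each path with a status letter (`M` modified, `A` added, `D` deleted) separated by a tab. The sketch below only sorts the paths into groups; what to do with added and removed files is left open. `classify_changes` is a hypothetical helper name.

```shell
# Sketch: split `git diff --name-status` output by kind of change.
# classify_changes is a hypothetical helper that reads status lines on stdin.
classify_changes() {
    while IFS="$(printf '\t')" read -r status path rest; do
        case "$status" in
            M) printf 'diff %s\n' "$path" ;;     # exists on both sides: latexdiff applies
            A) printf 'added %s\n' "$path" ;;    # no old version to compare against
            D) printf 'removed %s\n' "$path" ;;  # no new version to compile
            *) printf 'other %s\n' "$path" ;;    # renames, copies, etc.
        esac
    done
}

# Usage:
# git diff master --name-status -- content/ | classify_changes
```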

emanuelbuholzer commented 7 years ago

I have now created an initial version of a Dockerfile for building such an image. I'll document what it takes to get a container running for this project: one with the minimum set of dependencies, and one with a more complete setup for personal use, writing, etc.

That's what I assumed as well. I think tracking only actual content changes is a good starting point; we can review afterwards whether more work is needed to get an "advanced diff".

Having a list of the files which changed would be very helpful. I'd appreciate it if you could show me how to do this.

I guess that, without doing anything, the removal and addition of a file won't be tracked. I'll investigate this during development. It would probably make sense to factor the LaTeX diff for subfiles out into its own repository, since other people could then use it too. What do you think?

I guess it makes sense to split this into two tasks: after testing the continuous integration process with Docker and finishing its documentation, I'll create a pull request so that it can be merged without the LaTeX diff part.

rzach commented 7 years ago

Agreed: https://github.com/OpenLogicProject/OpenLogic/issues/149 for Docker

$ git diff master --name-only |grep ^content

gives you a list of all changed files in the current branch relative to master. Piping this through a sed script into a file, say, diff.tex will give you something that open-logic-debug.tex can then \input, e.g.:

$ git diff master --name-only |grep ^content |sed 's/^/\\url{/g;s/$/}/g' >diff.tex