Remove pdfcrop dependency

StevenClontz commented 3 years ago

@sean-fitzpatrick has a theory on how we might remove the pdfcrop dependency we're having trouble with in Windows (see https://github.com/PreTeXtBook/pretext-cli/issues/48 ):

Does anything break if we don't use the article document class? What if the .tex file began with \documentclass[tikz,crop]{standalone}?

sean-fitzpatrick commented 3 years ago

One thing could break: <latex-image>s that are not tikz. But we load tikz if it's a tikzimage anyway, so I think we can just use \documentclass[crop]{standalone}

sean-fitzpatrick commented 3 years ago

Looks like I need to edit the following. In extract-latex-image.xsl, lines 109-142:

<exsl:document href="{$filebase}.tex" method="text">
        <xsl:text>\documentclass[</xsl:text>
        <xsl:value-of select="$font-size" />
        <xsl:text>]{</xsl:text>
        <xsl:value-of select="$document-class-prefix" />
        <xsl:text>article}&#xa;</xsl:text>
        <xsl:text>\usepackage{geometry}&#xa;</xsl:text>
        <!-- ######################################### -->
        <!-- Determine height of text block, assumes US letterpaper (11in height) -->
        <!-- Could react to document type, paper, margin specs                    -->
        <xsl:variable name="text-height">
            <xsl:text>9.0in</xsl:text>
        </xsl:variable>
        <!-- Bringhurst: 30x => 66 chars, so 34x => 75 chars -->
        <xsl:variable name="text-width">
            <xsl:value-of select="34 * substring-before($font-size, 'pt')" />
            <xsl:text>pt</xsl:text>
        </xsl:variable>
        <!-- (These are actual TeX comments in the main document's LaTeX output) -->
        <!-- Text height identically 9 inches, text width varies on point size   -->
        <!-- See Bringhurst 2.1.1 on measure for recommendations                 -->
        <!-- 75 characters per line (count spaces, punctuation) is target        -->
        <!-- which is the upper limit of Bringhurst's recommendations            -->
        <xsl:text>\geometry{letterpaper,total={</xsl:text>
        <xsl:value-of select="$text-width" />
        <xsl:text>,</xsl:text>
        <xsl:value-of select="$text-height" />
        <xsl:text>}}&#xa;</xsl:text>
        <xsl:text>%% Custom Page Layout Adjustments (use latex.geometry)&#xa;</xsl:text>
        <xsl:if test="$latex.geometry != ''">
            <xsl:text>\geometry{</xsl:text>
            <xsl:value-of select="$latex.geometry" />
            <xsl:text>}&#xa;</xsl:text>
        </xsl:if>

I need to remove the lines that set the text height and width, and I think the lines that load the geometry package can be killed too. I think the whole thing can be reduced to

<exsl:document href="{$filebase}.tex" method="text">
        <xsl:text>\documentclass[</xsl:text>
        <xsl:value-of select="$font-size" />
        <xsl:text>,crop]{</xsl:text>
        <xsl:value-of select="$document-class-prefix" />
        <xsl:text>standalone}&#xa;</xsl:text>
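
For illustration only, here is a rough Python sketch of the preamble the reduced template would be expected to write. The helper name `standalone_preamble` and its parameters are hypothetical (nothing like it exists in PreTeXt), and it assumes the font size and `crop` are passed as comma-separated class options.

```python
def standalone_preamble(font_size="10pt", packages=()):
    """Build a LaTeX preamble that uses the standalone class with cropping.

    Hypothetical helper, not part of PreTeXt: it only mirrors the text
    that the reduced XSL template is meant to emit.
    """
    # Class options are comma-separated: font size first, then crop
    lines = ["\\documentclass[" + font_size + ",crop]{standalone}"]
    # Graphics packages (tikz, pstricks, ...) are loaded separately
    for name in packages:
        lines.append("\\usepackage{" + name + "}")
    return "\n".join(lines) + "\n"


print(standalone_preamble("11pt", ["tikz"]))
```

Note that no geometry setup appears at all: with `crop`, the page is sized to the content, so the text height and width settings have nothing to act on.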

StevenClontz commented 3 years ago

Is supporting non-tikz images something that's really necessary? Getting Windows support might be a worthwhile trade for deprecating latex-images that aren't tikz.

Steven Clontz https://clontz.org - steven.clontz@gmail.com

On Fri, Jul 3, 2020 at 11:47 AM Sean Fitzpatrick notifications@github.com wrote:

Looks like I need to edit the following. In extract-latex-image.xsl, lines 109-142: … I need to remove the lines that set the text height and width, and I think the lines that load the geometry package can be killed too. …

sean-fitzpatrick commented 3 years ago

@rbeezer I am proposing that lines 115-142 can all be killed if document class is standalone, since this class doesn't really take a page size attribute. Is this overzealous?

In pretext/pretext, there are two options:

Leave it as is.
Remove the part that calls the pdfcrop utility.

I think cropping an already cropped image should do nothing. So leaving it in would:

do nothing on Linux/Mac because the image is already cropped
do nothing on Windows because pdfcrop does nothing on Windows regardless of whether there is anything to crop

But then we are slowing down the script to run an executable that does nothing. So if I'm testing I want to see what happens if I remove it.

mitchkeller commented 3 years ago

Is supporting non-tikz images something that's really necessary? Getting Windows support might be a worthwhile trade for deprecating latex-images that aren't tikz. …

I know of at least one project that uses latex-image for something that isn't tikz.

sean-fitzpatrick commented 3 years ago

I think some people still use pstricks and some other things. Somewhere in my stuff I've taken advantage of the fact that you can actually put whatever the hell you want inside of <latex-image>, as long as it can be interpreted by XeLaTeX. I have a few commutative diagrams, for example. (This is not a big departure from tikz, as they're done in tikz-cd but it still isn't tikz.)

Alex-Jordan commented 3 years ago

Is supporting non-tikz images something that's really necessary?

Yes. We support any latex-based image-making tool. There are some pstricks examples in the showcase article. Some people like xypic, etc. And I've even used it on rare occasion to make an image out of some big hairy display math thing that needed elements that MathJax can't handle.

sean-fitzpatrick commented 3 years ago

Pretty sure the following are identical in a LaTeX document: \documentclass[tikz,crop]{standalone} and

\documentclass[crop]{standalone}
\usepackage{tikz}

(i.e. all that the tikz option does is load tikz so that you don't have to.)

sean-fitzpatrick commented 3 years ago

I'll test to confirm. But certainly we can use standalone, with cropping, for things other than TikZ.

sean-fitzpatrick commented 3 years ago

Confirmed: you do not need to include tikz as an option. Output is identical if you load tikz separately (or even if you load another package that requires it, like pgfplots.)

sean-fitzpatrick commented 3 years ago

I just tried one of the pstricks examples from the showcase article in a standalone document and it works just fine. tikztest.pdf

sean-fitzpatrick commented 3 years ago

(I was going to drop the SVG but GitHub doesn't support SVG, apparently.)

rbeezer commented 3 years ago

We once had standalone and removed it for cause.

See @mitchkeller and @davidfarmer at #328

StevenClontz commented 3 years ago

Would a pure Python solution like https://pypi.org/project/pyPdf/ work then? (There's a PyPDF2 so further research might be best for finding the most stable library.)

sean-fitzpatrick commented 3 years ago

OK -- that's enough reason for me to pause on changing anything right now, and get back to the work I was supposed to be doing this morning :-) Though @davidfarmer doesn't say why he had to give it up, and @daverosoff mentions all the trouble with pdfcrop on Windows that we're now trying to avoid! I couldn't find anything useful online about using standalone with overpic but didn't look that hard. One thing that would no longer be supported: right now in the extraction template we support font sizes other than 10, 11, 12pt by prepending ext to article. There is no such option for standalone.

rbeezer commented 3 years ago

Would a pure Python solution like https://pypi.org/project/pyPdf/ work then? (There's a PyPDF2 so further research might be best for finding the most stable library.)

Yes, a Python solution would be preferable to an "executable" solution.

In this case, the last release is December 2010, which would give me pause.

StevenClontz commented 3 years ago

Yeah, I lobbed that grenade right before breaking for lunch. pyPdf isn't maintained and is a decade old, so that's a no-go.

https://pypi.org/project/PyPDF2/#history seems to support Python3 and has commits in 2018 (but no later), with the latest PyPI release in 2016. https://www.blog.pythonlibrary.org/2018/06/07/an-intro-to-pypdf2/ https://realpython.com/creating-modifying-pdf/#cropping-pages https://github.com/mstamy2/PyPDF2

There's a PyPDF4 with recent commits (past couple weeks) as well: https://github.com/claird/PyPDF4 But the PyPI version is from 2018 too. https://pypi.org/project/PyPDF4/#history

None of these seem to have a standard license (which would affect packaging and distributing a Windows binary someday perhaps, but wouldn't hurt specifying them as dependencies that users install themselves via pip).

sean-fitzpatrick commented 3 years ago

For what it's worth, I ran my experiment anyway:

remove geometry and page size details from extract-latex-image.xsl and change document class to standalone with crop option (this cut out like 60% of the code in that file!)
remove lines referencing pdfcrop from pretext.py

Everything looks about right to me. (I didn't think anyone would be interested in browsing a folder with 768 images via http so I haven't plunked them down anywhere.)

StevenClontz commented 3 years ago

(If we can't get a native Python library then...) If LaTeX can handle a majority of use-cases by itself with the standalone class, there's an argument that using pdfcrop should just be an option for the edge cases that need it (e.g. if I implemented in pretext-cli, I'd have something like @click.option("--latex-image-crop", type=click.Choice(['standalone','pdfcrop']), default='standalone').)
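
To make the idea concrete, here is a minimal sketch of such an option. It uses the standard library's `argparse` as a stand-in for click so that it runs without extra dependencies; the option name comes from the comment above, while `build_parser` and the program name are hypothetical and not part of pretext-cli.

```python
import argparse


def build_parser():
    """Parser offering a choice of cropping method.

    Hypothetical sketch: the option is a proposal from this thread,
    not an existing pretext-cli flag.
    """
    parser = argparse.ArgumentParser(prog="pretext-sketch")
    parser.add_argument(
        "--latex-image-crop",
        choices=["standalone", "pdfcrop"],
        default="standalone",
        help="how to trim whitespace around latex-image output",
    )
    return parser


# Default: rely on the standalone class
args = build_parser().parse_args([])
print(args.latex_image_crop)

# Opt in to pdfcrop for the edge cases that need it
args = build_parser().parse_args(["--latex-image-crop", "pdfcrop"])
print(args.latex_image_crop)
```

Defaulting to `standalone` would keep Windows users off the pdfcrop path unless they ask for it explicitly.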

davidfarmer commented 3 years ago

I guess I am late to this party!

Some packages do not work well with standalone. Can't recall which. I ended up using documentclass article, and pagestyle empty, which seemed to serve the same purpose.

I can dig through my code if you are having trouble making it work.

I recall using standalone for a while, until I couldn't. Maybe it has been fixed in the past few years?

StevenClontz commented 3 years ago

Worth noting this unanswered SO question asking how to do what we'd need PyPDF2 to do: https://stackoverflow.com/questions/53505763/crop-a-pdf-page-to-content

So maybe the task isn't trivial, but if we solve it then someone can get some internet points.

Alex-Jordan commented 3 years ago

Is this helpful? (Second answer.)

https://stackoverflow.com/questions/457207/cropping-pages-of-a-pdf-file#answer-51626910

StevenClontz commented 3 years ago

Okay, it appears that PyPDF2 alone isn't enough. But I found this:

https://pypi.org/project/pdfCropMargins/

It's not pure Python, but its Windows package apparently comes packaged with the binaries it needs, so maybe it works.

It also doesn't seem to have a useful API to be called from another script, but can be called from the command line:

python -m pip install pdfCropMargins --user
pdf-crop-margins input.pdf -o output.pdf -p 0

I'm going to try this in Windows now.
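
Since there is no documented API to import, one way a script could use the tool is to shell out to the same command. The sketch below is an assumption about how that might look, not code from pretext.py; the function names are hypothetical, and only the `-o` and `-p` flags shown above are used.

```python
import subprocess


def crop_margins_command(pdf_in, pdf_out, percent_retain=0):
    """Build the pdf-crop-margins command line shown above.

    -o names the output file; -p is (as I understand it) the percentage
    of the original margin to keep, so 0 crops tightly to the content.
    """
    return ["pdf-crop-margins", pdf_in, "-o", pdf_out, "-p", str(percent_retain)]


def crop_margins(pdf_in, pdf_out):
    # Hypothetical wrapper: raises CalledProcessError on a nonzero exit
    subprocess.run(crop_margins_command(pdf_in, pdf_out), check=True)


print(crop_margins_command("input.pdf", "output.pdf"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, which matters on Windows paths in particular.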

StevenClontz commented 3 years ago

It leaves some warnings but it worked:

C:\Users\sclontz\Downloads>pdf-crop-margins document-1.pdf -o output.pdf -p 0

Warning in pdfCropMargins: The wildcards in the path
   output.pdf
failed to expand.  Treating as literal.

Warning from pdfCropMargins: No system pdftoppm was found.
Reverting to an older, locally-packaged executable.  To silence
this warning use the '--pdftoppmLocal' (or '-pdl') flag.

rbeezer commented 3 years ago

Good news! Thanks for keeping on this one.

StevenClontz commented 3 years ago

Note that per https://github.com/abarker/pdfCropMargins/issues/20 we may eventually be able to call the library directly rather than passing to the command line invocation.

rbeezer commented 3 years ago

Resolved at #1329

PreTeXtBook / pretext

Remove pdfcrop dependency #1327