GjjvdBurg / paper2remarkable

Fetch an academic paper or web article and send it to the reMarkable tablet with a single command
MIT License

Exported annotations on cropped PDFs don't align #77

Open reini1305 opened 3 years ago

reini1305 commented 3 years ago

If I send a PDF to the reMarkable with cropping enabled (e.g. with --right or the default parameters), the resulting annotation is at the wrong position when exporting the PDF. In this example, you can see the view in the reMarkable app on the left, and the corresponding PDF export on the right:

[Screenshot 2020-11-20 at 11 11 48]

With --no-crop everything is fine:

[Screenshot 2020-11-20 at 11 18 24]

The reMarkable is at 2.4.1.30, and the Mac desktop app is at 2.3.1.

GjjvdBurg commented 3 years ago

What are you using for exporting annotations? If it's rMapi the bug might be in their code. When p2r crops the PDF, the dimensions of the pages change, so the calculation of the annotation position must take that into account. This isn't something that can be fixed in p2r unfortunately.

reini1305 commented 3 years ago

I'm using the remarkable app itself on Mac to do the export. I'm afraid that the bug is in their export code...

GjjvdBurg commented 3 years ago

Yeah, that's quite likely. What p2r does is essentially a "hard crop" since the page dimensions are changed. The reMarkable itself I think only adjusts the view when you use the crop function, leaving the page size unchanged.

There are a number of tools to export annotations, including rMapi, remarks, and rM2svg (see this page for an overview). I don't know if any support PDFs with different page sizes, but it might be worth a look.

I'll close this for now since the issue isn't in p2r.

reini1305 commented 3 years ago

Thanks. I think it would be good to note this somewhere or possibly change the default option to avoid user frustration :)

GjjvdBurg commented 3 years ago

Done!

gwtaylor commented 3 years ago

I'm wondering if anyone has found a workflow to deal with this issue yet, or found success with any of the tools that @GjjvdBurg suggested?

I am using the (convenient) reMarkable app functionality to export to PDF. I'm experiencing the same behaviour as described: annotations don't align to the cropped PDF.

Yes, I could disable cropping, but it is a nice-to-have! So if anyone discovered a workflow that supports both cropping and PDF export, that would be great to know.

I'm an rMapi user and tried its geta command, but it shows the same behaviour as the rM app on the cropped PDFs.

GjjvdBurg commented 3 years ago

Hi @gwtaylor,

I'm not aware of any work-arounds, but I do have an idea of how we can fix this. As far as I recall the issue is that we set the bounding box of the PDF, which is sort of a "hard" crop, and that results in the annotations not being aligned. However, the reMarkable supports adjusting the view as well (a "soft" crop if you will).

So, the idea is to add an option to p2r to use a soft crop if the user prefers. I just added my work-in-progress code to the repo in #95. If you're feeling up for it, you could consider contributing to the project by working on that PR (I'm happy to advise, but haven't found the time to dig into this myself lately).

I'll reopen this issue as I think it can be solved using the soft crop. As a sanity check: have you tried using the reMarkable's crop functionality and seeing how that affects the positions of the annotations? If that doesn't actually work then my PR won't help either of course.

gwtaylor commented 3 years ago

Hi @GjjvdBurg, thanks for your reply and for drafting #95. I found a workaround, though I think your proposed solution is more elegant than mine.

To answer your question about the reMarkable's crop functionality, I did attempt to use "Reset View" in the UI, after the file had been hard cropped and annotated, but that didn't make a difference. I think that was because after the hard crop, the view was by default "zoomed out" as far as it would go. But perhaps you are asking whether I took a file that was not hard cropped, cropped it using the reMarkable "Adjust View" tool (i.e. soft crop), annotated it and exported it as PDF? Yes, that certainly works.

Here is my workaround:

  1. Using the reMarkable software, "Export to SVG". This will create a folder of SVG files, one per page. Interestingly, these don't suffer from the misalignment issue.
  2. Convert each SVG file to a PDF using rsvg-convert
  3. Assemble the PDFs into a single PDF using pdftk

Both of these command-line tools are available through Homebrew via the packages librsvg and pdftk-java, respectively.

Here is a bash script that automates steps 2 and 3, after you have exported the SVGs using the reMarkable software:

#!/bin/bash

# Convert a folder of SVG files exported by reMarkable software to a single PDF
# ARGUMENTS:
#   A directory containing SVGs, 1 file per page
# It is expected that this function be run from the parent of the
# directory of SVGs
# This function will create a PDF with the same name as the directory and a
# .pdf extension
# EXAMPLE
#   p2r2pdf Liao_et_al_-_Efficient_Graph_Generation_With_Graph_Recurrent_Attention_Networks_2019
#
#   This results in Liao_et_al_-_Efficient_Graph_Generation_With_Graph_Recurrent_Attention_Networks_2019.pdf
p2r2pdf () {
  # drop a trailing slash (e.g. from tab completion)
  local folder=${1%/}
  echo "Processing $folder"
  local prefix="$folder - page "
  local filepath="$folder/$prefix"
  # count the page SVGs: they are numbered from 1, so stop at the first gap
  # (bash does not expand variables inside {1..$n}, hence the explicit loops)
  local nsvg=0
  while [[ -f "$filepath$((nsvg + 1)).svg" ]]; do
    nsvg=$((nsvg + 1))
  done
  echo "$nsvg files found"
  if (( nsvg == 0 )); then
    echo "No page SVGs found in $folder" >&2
    return 1
  fi
  local i filename
  local pdfs=()
  for (( i = 1; i <= nsvg; i++ ))
  do
      filename="$filepath$i.svg"
      echo "Converting $filename"
      rsvg-convert -f pdf -o "${filename%svg}pdf" "$filename"
      # remember each intermediate PDF, in page order
      pdfs+=("${filename%svg}pdf")
  done
  local finalpdf="$folder.pdf"
  echo "Building final PDF: $finalpdf"
  pdftk "${pdfs[@]}" cat output "$finalpdf"
  echo "Removing intermediate PDFs"
  rm "${pdfs[@]}"
}

So far it looks good, except for one tiny issue. When it exports SVGs, the reMarkable software embeds the original PDF as an image and the annotations as strokes. Therefore the resulting PDF will have crisp (vectorized) annotations but the background is blurry (rasterized).
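One way to check whether an exported page carries such a raster background is to count the embedded image elements in the SVG. This is only a minimal sketch: the function name `count_svg_images` is made up for illustration, and it assumes the export stores the background in a standard SVG `<image>` element, which the thread does not confirm.

```shell
# Hypothetical helper: count embedded raster images in one SVG file.
# Assumption: the exported background is stored as an SVG <image> element.
count_svg_images () {
  # grep -o prints every match on its own line; wc -l counts those lines;
  # tr strips the padding that some wc implementations add
  grep -o '<image' "$1" | wc -l | tr -d ' '
}
```

A count of 0 would mean the page holds only vector content, so a blurry background would have to come from somewhere else.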

GjjvdBurg commented 3 years ago

Thanks for sharing your approach @gwtaylor! This seems like a good intermediate solution to have around while #95 is unfinished, and I'm sure it'll help people who are dealing with the same issue.

FWIW, I did indeed mean a soft-crop + export, so good to know that that does work. Thanks!

sternj commented 1 year ago

I'm interested in what's happening here-- How does the remarkable calculate where annotations should be when exported compared to where they should be when being drawn on the page?

sternj commented 1 year ago

Also worth noting that this issue also occurs with the geta subcommand of rmapi, which can be interrogated

sternj commented 1 year ago

For reference, this is how the PDF library thinks about annotations. My largely uninformed theory is that unipdf is deciding the page height based on some metadata field that isn't changed in here. I'm going to do some more hunting. It seems like the dimensions of each PDF page are decided individually, specifically (it seems) by the MediaBox. I think it's done here, since rmapi appears to put in the original PDF pages as "background" pages. I'm not entirely sure how this all connects yet, but my suspicion is that the recomputation being done might be causing the issues.

sternj commented 1 year ago

So I've established that the MediaBox attribute does have an effect on alignment, but it doesn't fix the misalignment. I have also confirmed that it is the CropBox that causes the misalignment. I think that it might be a translation error between ghostscript and PIL format, but I'm not entirely sure yet.

sternj commented 1 year ago

The issue isn't in the cropping itself but how the bounding box is being computed-- removing all reference to the margins in get_bbox surfaces this issue. I'm also seeing that the translations are uneven-- boxes towards the top of the page are translated upwards more than boxes on the bottom of the page. I think it's something in get_bbox-- @GjjvdBurg do you remember what the intuition behind that was?

sternj commented 1 year ago

Reassessing my previous assessment, the salient code in rmapi is the end of this function. I'm concerned by the reference to the transformation matrix, since it means that the math here might be more complex than I was thinking. However, it seems like the annotations are being translated down and to the right, rather than being uneven. I'm trying to figure out how annotations are drawn on the PDF, because it seems like they're being translated opposite the cropping.

sternj commented 1 year ago

Confirming that the displacement of the annotations is directly related to how the box was created, I removed the upper and lower cropping and the displacement was only horizontal. What on earth are things drawn on top of?
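That observation fits a simple toy model, sketched below. It assumes (this is not confirmed anywhere in the thread) that the displacement equals the distance between the top-left corners of the uncropped and cropped page boxes. The function name `crop_offsets` and the box values in the usage note are made up for illustration.

```shell
# Hypothetical toy model, not code from p2r or rmapi.
# Usage: crop_offsets MX0 MY0 MX1 MY1  CX0 CY0 CX1 CY1
#   first four numbers: uncropped page box, last four: cropped box (PDF points)
# Prints "dx dy": how far the cropped box's top-left corner sits to the right
# of and below the uncropped box's top-left corner.
crop_offsets () {
  awk -v mx0="$1" -v my1="$4" -v cx0="$5" -v cy1="$8" \
    'BEGIN { printf "%g %g\n", cx0 - mx0, my1 - cy1 }'
}
```

For a 612 x 792 page, `crop_offsets 0 0 612 792 72 90 540 700` prints `72 92`. Leaving the top and bottom uncropped, `crop_offsets 0 0 612 792 72 0 540 792` prints `72 0`, which is the purely horizontal case described above.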

sternj commented 1 year ago

Just like a normal person, I have found and started reading the PDF standard. What rmapi does is it uses a path, which is defined in section 8.5.2. The path is created relative to the coordinates in "user space". What worries me is that the objects in the PDF itself might be defined in terms of user space, which isn't something p2r manipulates from what I can tell. That said, if that's the case, I can't tell for the life of me how any of this works: if coordinates are absolute or quasi-absolute, why does cropping work at all?

sternj commented 1 year ago

So, to summarize:

Idea: perhaps the reMarkable itself defines coordinates based on its own screen, in which case rmapi would just need a minor edit to make the MediaBox (or whatever box) start at its (0,0).

Thank you for letting your email inboxes be filled by my comments! I think I know where to look now. More info to come. 

sternj commented 1 year ago

Well, I tried messing around with this in a few different ways only to conclude that there's something else going on other than just mismatches in the coordinate system. I reached out to Remarkable Support who informed me that the tablet expects something that's shaped like A4, which means there's some additional scaling happening (which is also reflected in the rmapi code). They suggested using "print to pdf" from Chrome, which doesn't solve the issue at hand, but the equivalent from Firefox does work, so... progress I guess? I spent a few hours trying to figure out what those web browsers are actually doing when they "print to PDF" but so far what I've found is a rather inscrutable event loop.
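To get a feel for how much scaling an A4 assumption could introduce, here is a rough sketch. The thread does not say how the tablet or rmapi actually scales pages, so the uniform fit-inside rule and the function name `a4_fit_scale` are assumptions; only the A4 size in points (210 mm x 297 mm, about 595.276 x 841.89 pt) is a known quantity.

```shell
# Hypothetical sketch: uniform scale factor that fits a page inside A4.
# Usage: a4_fit_scale WIDTH HEIGHT   (PDF points)
# Assumption: the page is scaled equally in both directions to fit.
a4_fit_scale () {
  awk -v w="$1" -v h="$2" 'BEGIN {
    sx = 595.276 / w          # scale needed to match the A4 width
    sy = 841.89 / h           # scale needed to match the A4 height
    printf "%.4f\n", (sx < sy) ? sx : sy
  }'
}
```

Under this assumption a US Letter page, `a4_fit_scale 612 792`, gives `0.9727`, so a position computed without that factor would drift by a few percent across the page. A hard-cropped page has neither the A4 size nor its aspect ratio, so the factor would differ from page to page.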

What's weird to me (and sufficiently weird to remarkable support that they've escalated it to the devs) is

I've made some good progress reverse engineering the PDF renderer from Xochitl itself, so I'm also digging into that and seeing what transformations it does to actually display the PDF.

I also reached out to the person who originally wrote the annotation code in rmapi, but he seems to be quite busy in his real life and his understanding largely relies on the tacit assumption that Xochitl makes that a PDF is shaped like an A4.

Once I'm out of this current push at work I'm going to try to figure out in a more systematic way what makes Firefox-rendered PDFs different from Chrome-rendered PDFs or p2r-rendered PDFs.

Long story short, this... is really weird, even by PDF standards. More to come as I read more /MediaBox entries and specify the proper arity of various Qt functions.

sternj commented 1 year ago

An update-- this is something that I've actually brought to ReMarkable support and the developers apparently have a few theories as to the root cause that they're in the process of testing. Basically, I don't think the issue is actually with this repo at all, it's either in Xochitl or in both Xochitl and in rmapi. It's worth keeping this issue open for tracking purposes, but the root problem comes from and will be resolved elsewhere.

sternj commented 1 year ago

According to ReMarkable support, they're targeting 3.6 for this fix

GjjvdBurg commented 1 year ago

That's great to hear, thanks so much for all your efforts on this @sternj!