bordaigorl / remy

Remy, an online&offline manager for the reMarkable tablet
GNU General Public License v3.0

[draft] try to tweak the pdfmerge to support annotations #19

Open benneti opened 3 years ago

benneti commented 3 years ago

I played with the pdfmerge function a bit and I think it now works in principle, i.e. the annotations are in the correct place when the transformation is applied to annot rather than to the base. But there are still some problems: PyPDF2 seems to break links inside the same PDF, and I am not sure how to fix this. Is there a particular reason why you use PyPDF2 instead of https://github.com/pmaupin/pdfrw ?

On another note, would it not make sense to bundle the effort with at least some of the other projects for the renderer, such as https://github.com/lucasrla/remarks and https://gitlab.com/wrobell/remt ?

bordaigorl commented 3 years ago

Hi! Thanks for looking into this. I did play with annotations a bit at some point, here are my findings.

First, it's not too difficult to keep the approach of transforming the base if you also manually transform the annotation:

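from PyPDF2.generic import NameObject, RectangleObject

def transformAnnot(bp, rot, ratio, tx, ty):
  # Apply the scale/translation used on the base page to each annotation's /Rect.
  if '/Annots' in bp:
    for a in bp['/Annots']:
      annot = a.getObject()
      r = RectangleObject(annot['/Rect'])
      # Convert to float: PyPDF2 1.x coordinates are Decimal-based and do not mix with float factors.
      (x0,y0) = (float(v) for v in r.upperLeft)
      (x1,y1) = (float(v) for v in r.lowerRight)
      if rot == 90:
        # Swap the axes when the page is rotated.
        x0,y0=y0,x0
        x1,y1=y1,x1
      annot.update({NameObject('/Rect'): RectangleObject([x0*ratio+tx,y0*ratio+ty,x1*ratio+tx,y1*ratio+ty])})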
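
To check the arithmetic without a PDF at hand, here is a minimal sketch of the same corner transform on plain floats. The helper name `transform_rect` and the sample numbers are hypothetical; it only mirrors the scale, translate and axis-swap steps of the snippet above and does not touch PyPDF2.

```python
def transform_rect(rect, rot, ratio, tx, ty):
    """Scale and translate a rectangle given as (llx, lly, urx, ury).

    Hypothetical standalone helper: it repeats the arithmetic of
    transformAnnot on plain floats so the numbers can be checked by hand.
    """
    llx, lly, urx, ury = rect
    # Upper-left and lower-right corners, as in the snippet above.
    x0, y0 = llx, ury
    x1, y1 = urx, lly
    if rot == 90:
        # Same axis swap as the original snippet.
        x0, y0 = y0, x0
        x1, y1 = y1, x1
    xs = (x0 * ratio + tx, x1 * ratio + tx)
    ys = (y0 * ratio + ty, y1 * ratio + ty)
    # Return the corners sorted as lower-left, upper-right.
    return (min(xs), min(ys), max(xs), max(ys))


print(transform_rect((100, 200, 300, 250), 0, 0.5, 10, 20))
print(transform_rect((100, 200, 300, 250), 90, 0.5, 10, 20))
```

Sorting the corners at the end is a choice made here for readability; the PDF format accepts any two diagonally opposite corners for a rectangle.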

Second, although the above places the annotations in the right place, the links are broken. The reason for this is that the destinations of links are references to objects in the pages dictionary; when you do any page manipulation, these references are broken: the old objects get replaced by new ones but the references to the old ones do not get updated. I did try to quickly put something together to correct it but I was unsuccessful.
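
The dangling-reference problem can be illustrated with a toy model. This is only a sketch under stated assumptions: plain integers stand in for PDF indirect object references, and `remap_link_targets` and the `old_to_new` mapping are hypothetical names, not part of PyPDF2. The idea is that a merge step would need to record which new page object replaces each old one and then rewrite every link target accordingly.

```python
def remap_link_targets(links, old_to_new):
    """Return copies of the links with their page reference updated.

    links: list of dicts such as {"page": <object id>, "name": ...}
    old_to_new: maps object ids of the original pages to the ids of the
    pages that replaced them. Targets missing from the mapping (for
    example external links) are left unchanged.
    """
    fixed = []
    for link in links:
        new_link = dict(link)  # shallow copy, the input is not modified
        new_link["page"] = old_to_new.get(link["page"], link["page"])
        fixed.append(new_link)
    return fixed


# Toy document: pages 11 and 12 were rebuilt as 21 and 22 during a merge.
links = [{"page": 12, "name": "to-page-2"}, {"page": 99, "name": "elsewhere"}]
print(remap_link_targets(links, {11: 21, 12: 22}))
```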

Is there a particular reason why you use PyPDF2 instead of https://github.com/pmaupin/pdfrw ?

I do not remember why I picked PyPDF2 in the end (there was some reason I cannot recall, but I remember it was relatively minor). I think RCU uses pdfrw but had to modify it to get annotations to work properly, so it's not as easy as just porting the code.

On another note, would it not make sense to bundle the effort with at least some of the other projects for the renderer

In principle: yes. In practice: different projects use different libraries (e.g. Qt vs Cairo vs manual SVG generation) with different tradeoffs for the rendering. I think the main problem is that there is no official specification for the notebook format and its intended rendering, so every implementation has its own (usually incomplete) interpretation. Many things are actually quite tricky (how to render pencil in a vector format, for example, or how to handle the eraser) and decisions on some are subjective (e.g. what function to use to determine the thickness of lines?). Each project aims to support some extension, put together as a partial workaround (e.g. colors encoded in layer names). There seems to be no silver bullet. In short: I don't think this is going to happen soon. The robust solution would be for reMarkable to release an official renderer.

My choice of Qt was guided by the fact that I was already writing a Qt GUI and wanted on-the-fly previews. Another side goal was to be able to experiment with the notebook format, testing line simplification and other ideas, so QGraphicsView was a good choice in view of maybe writing editing features (for post-processing or even for writing back to the rm). My plan included a generalisation of the current renderer where you could very flexibly determine how to render lines from a configuration file. That never materialised because I got the basic functionality to a stage that works for me and ran out of time. (The project is definitely not dead, but I am adding functionality very slowly and as needed.)

benneti commented 3 years ago

I think getting internal links in the pdfs working will be a hard problem, because of https://github.com/mstamy2/PyPDF2/issues/370

bordaigorl commented 3 years ago

Yes, that's very disappointing. I think the only real way to do this is to dive into the PDF reference and work with the relevant object dictionaries directly.

benneti commented 3 years ago

I think the object dict would need to be transformed like this. (I am not sure about the page reference handling or how to get and set the object without using PyPDF2, and I hope that /XYZ is the only type linking to a point instead of only a page.)

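from PyPDF2.generic import FloatObject, NameObject

def transformDest(d, rot, ratio, tx, ty):
  # Only /XYZ is handled; per the PDF reference /FitH, /FitV, /FitR, /FitBH and /FitBV also carry coordinates.
  if d['/Type'] == '/XYZ':
    # Assumes numeric left/top; a null coordinate would need a separate check.
    l, t = float(d.left), float(d.top)
    if rot == 90:
      l, t = t, l
    # PyPDF2 dictionaries only accept PdfObject keys and values.
    d[NameObject('/Left')] = FloatObject(l*ratio+tx)
    d[NameObject('/Top')] = FloatObject(t*ratio+ty)
  return d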

If you have an idea, let me know; otherwise I think I will give this up for now and clean this up a bit so that it at least works with external links.

bordaigorl commented 3 years ago

Ah I meant the destination items in the dicts... the positions can be fixed as we established before...

benneti commented 3 years ago

Yes, I was talking about the destinations: if they have the type "/XYZ" they can point to a coordinate and therefore need to be transformed too. (This is not handled by Annots, but it is very similar.)
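
For reference, /XYZ is not the only explicit destination type that carries coordinates in the PDF reference: /FitH, /FitBH (top), /FitV, /FitBV (left) and /FitR (left, bottom, right, top) do as well. The following is a minimal sketch, assuming destinations are represented as plain Python lists `[page, type, *operands]` with `None` for null operands. The names `DEST_AXES` and `transform_dest` are hypothetical, rotation is deliberately left out, and nothing here uses PyPDF2.

```python
# Which operands of each explicit destination type are x or y coordinates,
# following the destination syntax in the PDF reference.
DEST_AXES = {
    "/XYZ": ("x", "y", None),       # left, top, zoom (zoom is not a coordinate)
    "/Fit": (),
    "/FitH": ("y",),                # top
    "/FitV": ("x",),                # left
    "/FitR": ("x", "y", "x", "y"),  # left, bottom, right, top
    "/FitB": (),
    "/FitBH": ("y",),               # top
    "/FitBV": ("x",),               # left
}


def transform_dest(dest, ratio, tx, ty):
    """Scale and translate the coordinate operands of one destination.

    dest is a list [page, type, *operands]; a new list is returned.
    Null operands (None) and non-coordinate operands are kept as they are.
    """
    page, kind, *operands = dest
    out = []
    for axis, value in zip(DEST_AXES[kind], operands):
        if value is None or axis is None:
            out.append(value)
        elif axis == "x":
            out.append(value * ratio + tx)
        else:
            out.append(value * ratio + ty)
    return [page, kind, *out]


print(transform_dest([3, "/XYZ", 100, 700, None], 0.5, 10, 20))
print(transform_dest([3, "/FitR", 0, 0, 200, 100], 0.5, 10, 20))
```

A table keyed by destination type keeps the per-type rules in one place, so supporting rotation later would only change the loop body.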