PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

Create annotation with opacity < 100% #142

Closed carygravel closed 3 years ago

carygravel commented 3 years ago

I would like to create annotations with opacity < 100% in order to achieve a highlighter effect.

I've seen that this is possible in PDF clients, so I assume that either I haven't spotted where it is documented in PDF::Builder, or it hasn't been documented, or PDF::Builder is just missing some functionality.

PhilterPaper commented 3 years ago

First of all, are you familiar with examples/040_annotations and examples/041_annot_fileattach? Make sure you understand how those work. It appears that you can specify the icon to be 'None'. What if you just defined an area of text (or something else) with a yellow background and maybe an off-black text (mix in a little yellow), and define an annotation rectangle in the same place?

You're not looking to have a user draw a highlighter effect, are you? Adobe PDF Reader can do something like that. I assume that you want a fixed content that a user can click on to bring up a file or enter commentary. That should be possible by defining the appropriate background and foreground colors, and then doing a standard rectangular annotation with icon=None over it. Perhaps as an alternative you could make a yellow image with low opacity, and then just place it over the regular page content. You might need to define the (or a) graphics context after the text so that it's layered on top (rather than being below the text). Also look at examples/060_transparency.

Do any of those methods do the job for you? I'm not sure there's a way to do transparency in annotation icons (just fill color).

carygravel commented 3 years ago

Other APIs and clients offer opacity values, so I assume that it must be possible without resorting to extra graphics content:

https://api.itextpdf.com/iText7/dotnet/7.1.8/classi_text_1_1_kernel_1_1_pdf_1_1_annot_1_1_pdf_markup_annotation.html

PhilterPaper commented 3 years ago

So are they using PDF's built-in annotation support (which I'm not aware of supporting opacity) or are they doing something custom? Give me an example of how this thing would work, both to someone creating the PDF and someone using it.

carygravel commented 3 years ago

Here are a couple of PDFs with annotations, one with opacity 50% and one 100%.

opacity100.pdf opacity50.pdf

I created them initially with PDF::Builder, and changed the opacity with evince, which is part of the Poppler project.

carygravel commented 3 years ago

If you look at the PDF reference, section 8.4, table 8.10 lists a CA key which defines the opacity:

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf

carygravel commented 3 years ago

Table 8.21 also explicitly lists a highlight annotation, but it is unclear to me whether this only relates to text, or whether one can also use it to highlight part of an image.

PhilterPaper commented 3 years ago

I thought I had tried that before, and could never get it to work. Maybe I had it in the wrong place. Anyway, it would be easy enough to add an opacity => 0.5 style option to an annotation, outputting /CA 0.5 to the object. Is that what you need?
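As an illustration of what "outputting /CA 0.5 to the object" means at the PDF level, here is a minimal standalone sketch in Python. It is not PDF::Builder code: the function name `text_annotation_dict` and its parameters are hypothetical, and it only builds the dictionary text for a Text annotation, omitting /CA when the annotation is fully opaque.

```python
def _escape_literal(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_annotation_dict(rect, contents, opacity=1.0):
    """Return the dictionary text for a Text annotation (hypothetical helper).

    rect is (llx, lly, urx, ury) in default user space units.
    opacity is 0.0 (invisible) to 1.0 (opaque) and is written as /CA.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    llx, lly, urx, ury = rect
    parts = [
        "/Type /Annot",
        "/Subtype /Text",
        f"/Rect [{llx:g} {lly:g} {urx:g} {ury:g}]",
        f"/Contents ({_escape_literal(contents)})",
    ]
    if opacity < 1.0:
        # /CA is the constant opacity entry of markup annotations.
        parts.append(f"/CA {opacity:g}")
    return "<< " + " ".join(parts) + " >>"


print(text_annotation_dict((100, 700, 120, 720), "note", opacity=0.5))
```

Whether a given viewer honors /CA for an annotation's icon is something to verify in that viewer.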

When you create the Title (/T), is that with Builder or some other tool? I could never get Title to work, but from your example, it sounds like it's looking for a () list of UTF-16 Unicode points (plus an xFEFF byte order flag?), not ASCII text. Are you at all familiar with this? And how about the /Contents field... I don't see "Christie Cresswell...." anywhere on the page. Does it deliberately not match the Title?
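On the encoding question: a PDF text string such as /T or /Contents may be written either in PDFDocEncoding or as UTF-16BE preceded by the byte order mark FE FF. The sketch below is a standalone Python illustration, not how PDF::Builder or evince actually writes the field; `pdf_text_string` is a hypothetical name, and treating pure-ASCII input as safe for the literal form is a simplifying assumption.

```python
import codecs


def pdf_text_string(text: str) -> str:
    """Encode text as a PDF text string (hypothetical helper).

    Pure-ASCII input becomes a literal string in parentheses.
    Anything else becomes a hex string holding BOM + UTF-16BE bytes.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
        return "<" + raw.hex().upper() + ">"
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


print("/T", pdf_text_string("Reviewer"))
print("/T", pdf_text_string("Ä"))
```

A literal string containing the same UTF-16BE bytes is equivalent to the hex form; the hex form is used here only because it is easier to read in a dump.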

carygravel commented 3 years ago

> I thought I had tried that before, and could never get it to work. Maybe I had it in the wrong place. Anyway, it would be easy enough to add an opacity => 0.5 style option to an annotation, outputting /CA 0.5 to the object. Is that what you need?

Sounds perfect, thanks.

> When you create the Title (/T), is that with Builder or some other tool? I could never get Title to work, but from your example, it sounds like it's looking for a () list of UTF-16 Unicode points (plus an xFEFF byte order flag?), not ASCII text. Are you at all familiar with this? And how about the /Contents field... I don't see "Christie Cresswell...." anywhere on the page. Does it deliberately not match the Title?

:-) I created it with Builder, then edited the annotation with evince/poppler, trying to remove the name, but evidently it only removed it from the title, and not from the contents.

You understand the PDF internals much better than I do. I'd be really interested to explore how the highlight annotation mentioned in Table 8.21 works.

PhilterPaper commented 3 years ago

> I'd be really interested to explore how the highlight annotation mentioned in Table 8.21 works.

I'm not sure what it's supposed to do. It's an annotation type without an icon, but I never implemented anything for it. Some Readers have a highlight with annotation tool, where you can draw a yellow highlighter over some text, and fill in some annotation text. I don't think that's part of the document, though, unless you save such an annotation. Are you looking to predefine such an annotation in your document? Alternatives (to highlight) to mark the annotation are underline, squiggly underline, and strikeout.

carygravel commented 3 years ago

Yup. I'd like to be able to create such highlights in Builder. The problem for the annotations at the moment is that the icons get in the way of the content behind, which is not the case with the highlight function.

PhilterPaper commented 3 years ago

So you're looking for at least one of the four non-icon annotation methods? (If I can do one, I can probably support all four.) Now, how does this relate to your original opacity request? Is that a separate issue? Before I do any work in this area, I want to be sure of just what new functionality you need, so I can get it done once and for all.

carygravel commented 3 years ago

I would like one of the four non-icon annotation methods (in addition to the opacity request). I suppose it is a separate issue. Thanks for considering it!

PhilterPaper commented 3 years ago

I think I have both requests (icon opacity and four new "markup" annotations) done. Please give them a try. Annotation.pm, 040_annotation, and 041_annot_fileattach were changed.
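For readers unfamiliar with what a markup annotation looks like inside the file, here is a rough standalone Python sketch of the dictionary for one of the four subtypes. It does not reproduce the Annotation.pm changes: `markup_annotation_dict` and its parameters are hypothetical, the caller is assumed to supply /QuadPoints in whatever corner order the target viewer expects, and /Rect is simply taken as the bounding box of those points.

```python
MARKUP_SUBTYPES = {"Highlight", "Underline", "Squiggly", "StrikeOut"}


def _nums(values):
    """Format numbers compactly for a PDF array."""
    return " ".join(f"{v:g}" for v in values)


def markup_annotation_dict(quad_points, subtype="Highlight",
                           color=(1, 1, 0), opacity=0.5):
    """Return dictionary text for a text-markup annotation (hypothetical helper).

    quad_points is a flat list of x, y values, 8 numbers per quadrilateral.
    """
    if subtype not in MARKUP_SUBTYPES:
        raise ValueError(f"unknown markup subtype: {subtype}")
    if not quad_points or len(quad_points) % 8 != 0:
        raise ValueError("quad_points needs 8 numbers per quadrilateral")
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    xs, ys = quad_points[0::2], quad_points[1::2]
    rect = [min(xs), min(ys), max(xs), max(ys)]  # bounding box of all quads
    return ("<< /Type /Annot"
            f" /Subtype /{subtype}"
            f" /Rect [{_nums(rect)}]"
            f" /QuadPoints [{_nums(quad_points)}]"
            f" /C [{_nums(color)}]"
            f" /CA {opacity:g} >>")


print(markup_annotation_dict([100, 712, 200, 712, 100, 700, 200, 700]))
```

Because a markup annotation has no icon, the highlighted content stays readable underneath it, which is the effect requested earlier in the thread.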

carygravel commented 3 years ago

Cool! Thanks! I'll take a look and let you know.

carygravel commented 3 years ago

I think your comment

These are four sets of C<x,y> coordinates, given (for Left-to-Right text) as the upper bound Upper Left to Upper Right and then the lower bound Lower Left to Lower Right. B<Note that this is different from what is (erroneously) documented in the PDF specification!>

is interesting! Surely the spec is king and the client software, even if it is written by Adobe, is just an implementation. Let's check other implementations and see if they also differ from the spec.

carygravel commented 3 years ago

Looks good to me. Thanks for your work. I'm still checking the coordinates, though, and I think the implementation I have (evince) matches that of the spec. It would surprise me if the spec was "wrong", given that many thousands of pairs of eyes must have read it over the many versions. v1.4 matches v1.7, which is also the ISO.

PhilterPaper commented 3 years ago

Well, I was stunned too. I checked the official 1.4 and 1.7 specs from Adobe, and they very clearly show (and describe) the coordinates list as being counter-clockwise from Lower Left (LL, LR, UR, UL), at least for Left-to-Right text (the description is a bit ambiguous on that point). I first implemented it that way, and was rewarded with a sort-of "bowtie" effect highlighted area. Through trial-and-error, I found a working method of UL, UR, LL, LR; giving the desired clean rectangle (with slightly rounded ends).

It's entirely possible that Adobe Acrobat Reader DC has a bug in it, but that's what works at this time. Highlighting some selected text in the Reader gives the non-standard ordering for the annotation object's QuadPoints list, so at least it's internally consistent. Or I have something else messed up in my implementation that forces Reader to use the non-standard ordering, in which case I'd appreciate knowing about it.
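The two orderings, and one plausible explanation for the "bowtie", can be sketched in standalone Python. Everything here is illustrative: the function names are hypothetical, and `looks_like_bowtie` assumes the viewer reads the four points as UL, UR, LL, LR and joins UR to LR and LL to UL as the ends of the shape, which is an inference from the behavior described above rather than documented viewer logic.

```python
def quad_points(llx, lly, urx, ury, order="reader"):
    """Flatten one rectangle into the 8 numbers of a /QuadPoints entry."""
    ll, lr = (llx, lly), (urx, lly)
    ul, ur = (llx, ury), (urx, ury)
    if order == "reader":    # UL, UR, LL, LR: found to work by trial and error
        corners = [ul, ur, ll, lr]
    elif order == "spec":    # LL, LR, UR, UL: counter-clockwise, as documented
        corners = [ll, lr, ur, ul]
    else:
        raise ValueError("order must be 'reader' or 'spec'")
    return [coord for point in corners for coord in point]


def _crosses(a, b, c, d):
    """True if segment a-b properly intersects segment c-d."""
    def side(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (side(a, b, c) * side(a, b, d) < 0
            and side(c, d, a) * side(c, d, b) < 0)


def looks_like_bowtie(points):
    """Check whether the end edges cross under the assumed viewer reading.

    Assumed outline: p0 -> p1 -> p3 -> p2 -> p0, so the ends are p1-p3 and p2-p0.
    """
    p = [(points[i], points[i + 1]) for i in range(0, 8, 2)]
    return _crosses(p[1], p[3], p[2], p[0])


print(looks_like_bowtie(quad_points(100, 700, 200, 712, order="reader")))
print(looks_like_bowtie(quad_points(100, 700, 200, 712, order="spec")))
```

Under that assumption, the counter-clockwise list turns the two end edges into the rectangle's diagonals, which cross in the middle.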

carygravel commented 3 years ago

> Well, I was stunned too. I checked the official 1.4 and 1.7 specs from Adobe, and they very clearly show (and describe) the coordinates list as being counter-clockwise from Lower Left (LL, LR, UR, UL), at least for Left-to-Right text (the description is a bit ambiguous on that point). I first implemented it that way, and was rewarded with a sort-of "bowtie" effect highlighted area. Through trial-and-error, I found a working method of UL, UR, LL, LR; giving the desired clean rectangle (with slightly rounded ends).

I take it all back. After more investigation, I concur with your interpretation. I'm using LL, LR, UL, UR, but if I follow the spec and order the points anticlockwise, my PDF viewer (evince) also shows a bowtie.

Many thanks for your work on this. When do you intend to release it?

carygravel commented 3 years ago

I don't have any tools for extracting the annotations in a PDF. If I open a PDF in Builder, how can I iterate over the annotations in a page?

PhilterPaper commented 3 years ago

How did Adobe let such a whopper get by them? Now I'm wondering about the accuracy of the rest of the PDF specification! Maybe that is why I've never been able to get some things to work as expected.

It's in GitHub now. I wasn't planning to do another CPAN release before at least the end of March, April at latest (I'd like to put all the TIFF changes to bed, first), but if the markup annotation is a make-or-break critical update for many, I could consider an earlier release.

You want to find all the annotations on a page? I haven't tried it, but you might be able to start at a page's $self->{'Annots'} and work your way through it from there. Are you trying to remove existing annotations from a page, modify them, extract their text, or something else? Googling PDF extract annotations, I see https://unix.stackexchange.com/questions/31521/how-to-extract-annotations-from-pdf-files -- that might be a start. Apparently there are tools out there to do such a thing.

carygravel commented 3 years ago

> You want to find all the annotations on a page? I haven't tried it, but you might be able to start at a page's $self->{'Annots'} and work your way through it from there. Are you trying to remove existing annotations from a page, modify them, extract their text, or something else? Googling PDF extract annotations, I see https://unix.stackexchange.com/questions/31521/how-to-extract-annotations-from-pdf-files -- that might be a start. Apparently there are tools out there to do such a thing.

Whilst I understand that it is unreasonable to expect Builder to read any PDF, I think it is reasonable to expect it to round-trip any PDF that it originally created. That is what I am trying to achieve: to extract the annotations from a PDF that Builder previously created, in order to display them to the user in a GUI.

Your stack exchange link uses Poppler, which is a Linux-only PDF library, which I could probably get to work for me, but is not (easily) available for Windows.

I've had a quick look at $self->{'Annots'}, but I can't see my original annotations there, except in the form of the stringified PDF.

It would be great if you could expose some API to iterate through them.

PhilterPaper commented 3 years ago

Since this latest request has wandered quite a ways from the original issue, and is unlikely to be quickly resolved, I've opened a new ticket for it (#147). Please continue discussion on extracting annotations there.

PhilterPaper commented 3 years ago

I just came across https://www.pdfscripting.com/public/PDF-Page-Coordinates.cfm which documents Quad Points as left-to-right, first top and then bottom (as you tried it). I may go ahead and change the POD to go UL-UR-LL-LR just to be consistent.

Same thing with https://stackoverflow.com/questions/9855814/pdf-spec-vs-acrobat-creation-quadpoints -- everyone found that the spec was wrong. There was mention that the PDF spec was being changed to match Reader behavior, but that would have to be in 2.0, which I haven't seen yet.