PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

How to name a destination #202

Open PhilterPaper opened 9 months ago

PhilterPaper commented 9 months ago

(I'm not sure why this never got a ticket in PDF::Builder... maybe I thought it was similar to #119.)

=============== On 2022-08-26 @sciurius wrote in ssimms/pdfapi2/issues/54:

Annotations have a method link.

$annotation->link($destination, $location, @args);

According to the docs, $destination is a PDF::Builder::Page object or the name of a named destination defined elsewhere. I must be overlooking something, but how can I define a named destination?

$destination = PDF::Builder::NamedDestination->new($pdf, ...);

This seems to produce an unnamed (explicit) destination.

=============== On 2023-09-30 @kg4zow replied:

I'm running into the same problem - I can create a "destination", but I can't find any information about how to associate a NAME with it.

In my case, I need the name so that users can link to a specific page within the PDF, referencing it by the name. Pages correspond to dates, so what I'm thinking is something like https://hostname/filename.pdf#2023-09-30 rather than https://hostname/filename.pdf#page=47 (because the file itself will be updated on a semi-regular basis, and the page numbers for each date will change over time).

PhilterPaper commented 9 months ago

I haven't played much with annotation links and named destinations. First of all, is the target within the same document (a single PDF file), or could it be in a separate document, possibly even on a different system?

Annotation's link() does mention that the target could be a Named Destination, but NamedDestination doesn't seem to be of much help. It seems possible to define a target on a specific page (with a viewbox defined), but if the target could move around to a different page, that's not of much use in this case. The Builder's examples/042_links shows how to go to a different page in this document, and column() supports HTML-style <a> notation for this. However, it doesn't solve the problem of a page-number-independent destination. Frankly, it's not even clear to me whether NamedDestination is meant to provide targets that Readers can find (by name) in this document, or to reference something by name in another document that you want to go to. Improving the documentation will probably be the Number 1 job.

If anyone has made progress in this area, I'd like to hear about it. Sample PDF documents which link in such a manner (either within themselves, or to other PDFs) could provide an example for me to figure out an extension to NamedDestination, so if you can point to any, that would be great. The PDF 32000-1:2008 document (PDF 1.7) in 12.3.2.3 discusses Named Destination, and I'll have to dive back into it once I get some time. I see something about hint lists to speed up the Reader's finding the target, and other scary stuff.

kg4zow commented 9 months ago

I'm using PDF::API2 directly for the program I'm working on, but with that said ...

My understanding is, a "NamedDestination" creates an externally visible name which references a given view (i.e. page number, zoomed in to show a specific part of the page) within a file.

The idea is this: somebody can put a PDF on a web site somewhere, and a page could refer to https://hostname/filename.pdf#name. When a browser visits that link, it would download the file and pass the name to the viewer (which could be within the browser, or could be an external program), and the viewer would open the file with that page already selected, and (if the "destination" specified it) zoomed in to show a particular portion of the page.

The specific problem I'm running into is this: I'm making a PDF which will have pages for specific dates (think "day planner"), and I want the users to be able to jump to a specific page by visiting a link like https://hostname/filename.pdf#2023-10-01 ... I know how to create a "NamedDestination" that points to "page 37, view entire page", but I can't figure out how to associate the name 2023-10-01 with that NamedDestination.
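
As a rough illustration of the naming scheme described above, the date-based names and the links that would point at them can be generated with plain Perl, independent of any PDF library. This is only a sketch: `date_name` and `dest_url` are hypothetical helpers, the host and file name are the placeholders used in this thread, and whether a given viewer honors the bare `#name` form, the `#nameddest=name` form, or neither is something that would need to be tested.

```perl
use strict;
use warnings;

# Hypothetical helper: format a date as the destination name (YYYY-MM-DD).
sub date_name {
    my ($year, $month, $day) = @_;
    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

# Hypothetical helper: build a link to a named destination in a hosted PDF.
# $style selects the bare "#name" form (default) or "#nameddest=name".
sub dest_url {
    my ($base, $name, $style) = @_;
    $style //= 'bare';
    return $style eq 'nameddest'
        ? $base . '#nameddest=' . $name
        : $base . '#' . $name;
}

my $base = 'https://hostname/filename.pdf';   # placeholder URL from the thread
my $name = date_name(2023, 10, 1);

print dest_url($base, $name), "\n";               # https://hostname/filename.pdf#2023-10-01
print dest_url($base, $name, 'nameddest'), "\n";  # https://hostname/filename.pdf#nameddest=2023-10-01
```

This only covers the naming side; it does not attach the name to a destination inside the PDF, which is the open question here.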

PhilterPaper commented 9 months ago

> My understanding is, a "NamedDestination" creates an externally visible name which references a given view (i.e. page number, zoomed in to show a specific part of the page) within a file.

OK, that sounds reasonable, that ND defines an externally visible name for a Reader to use, whether it's within the same document or a different document on a different site.

Notice that with HTML targets (uri() method), the link can include an id (e.g., <h1 id="2023-10-01">... referred to by #2023-10-01 added to the URI). I think we're all looking for something similar for PDFs. Come to think of it, has anyone actually tried #2023-10-01 added to a link() PDF link?

> I can't figure out how to associate the name 2023-10-01 with that NamedDestination.

At this point I don't know how to do this, either. The PDF documentation is kind of vague, and lacks examples. I'm hoping that someone who knows this stuff can chime in and give some pointers, both for defining ND's in a document, and how to best link to them. Even an example or two in some online PDFs would be useful to reverse engineer.

PhilterPaper commented 9 months ago

Poking around a little more, but not actually trying anything yet in code, I see the following data points:

Maybe this will enable some progress on using Named Destinations. I don't expect to get back to this for a while, so if you find anything interesting, please report it here. I can add fixes to PDF::Builder if it doesn't quite work (at a minimum, improved documentation).

PhilterPaper commented 9 months ago

Johan, to try to answer your year-old question, look in https://www.catskilltech.com/Documentation/PDF/Builder/NamedDestination.html for additional information. I haven't tried this, but it appears as though you would define $destination with new(), as you did, and then you need to invoke dest() with a page and desired fit information.

Frankly, I have no idea about the link/goto, url/uri, etc. methods. I would think those would be covered in the Annotations page, as they're the code to execute a jump to the other document. They may end up getting removed from NamedDestination. Let me know if this works and/or you discover anything interesting.

P.S. Once you have defined a Named Destination, you can first try it out from the command line with various #nameddest= flavors as discussed above. You should try that before spending time trying to use an Annotation link/goto.

PhilterPaper commented 8 months ago

I spent a few hours today trying to understand Named Destinations, and all I can say is that it appears to be a very complicated subject! I was going through a bunch of random PDF files I had (not produced by PDF::API2 or Builder), hoping to find one with some Named Destinations I could try. I did find a few with global entries that looked like they might be ND's, but none of the text strings from /Names lists I tried seemed to work.

Note that the Windows command line treats # as part of the name of the file, and complains that it can't find that command. So, I can't type in > blah.pdf#nameddest=someND... it tells me that it can't find anything like "blah.pdf#nameddest". Maybe you can try it on Linux. It might be recognized in a browser (Firefox) as file:///C:/Users/Phil/Desktop/blah.pdf#nameddest=someND, or at least, it doesn't complain (but doesn't go anywhere except the first page). So, I'm not even sure how to invoke a Named Destination.

What should be an ND text name (in /Names) comes in several flavors, none of which seem to work. I haven't found any plain-text names (supposedly they should be plain ASCII), but I do see some with octal escapes and random-looking characters (which are at least copyable as ASCII). There are also a few <hheexxssttrriinngg> cases, but many of these codes are not ASCII (>7F) and I can't type them on my keyboard. They don't successfully decode as UTF-8, either, so I don't know what they are!
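
One possible explanation for hex strings that fail to decode as UTF-8 (not verified against the files mentioned here): the PDF specification lets text strings be written as UTF-16BE with a leading FE FF byte order mark, and otherwise as PDFDocEncoding. The sketch below, using only core Perl, shows how such a hex string could be turned into readable text. `decode_pdf_hex_string` is a hypothetical helper, the sample input is made up, and the Latin-1 fallback only approximates PDFDocEncoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: convert the hex digits found between < and > in a PDF
# string into Perl text.
sub decode_pdf_hex_string {
    my ($hex) = @_;
    $hex =~ s/\s+//g;                  # whitespace inside <...> is ignored
    $hex .= '0' if length($hex) % 2;   # a missing final digit is taken as 0
    my $bytes = pack 'H*', $hex;

    # A leading FE FF marks the rest of the string as UTF-16BE.
    if ($bytes =~ s/\A\xFE\xFF//) {
        return decode('UTF-16BE', $bytes);
    }

    # Assumption: fall back to Latin-1, which only approximates PDFDocEncoding.
    return decode('ISO-8859-1', $bytes);
}

# Made-up sample: byte order mark followed by "Ch1" in UTF-16BE.
print decode_pdf_hex_string('FEFF004300680031'), "\n";   # Ch1
```

If a name does decode this way, it is still an open question how it has to be typed or percent-encoded after `#nameddest=` for a particular viewer to accept it.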

There seem to be several flavors of using ND's. One involves a Name Tree, with /Limits giving the range of (sorted) text keys, interior nodes holding /Kids lists, and leaf nodes holding /Names. These can be nested quite deeply, and I haven't figured out the order and structure yet. The other seems a bit simpler (perhaps only a flat tree). Both start with /Names n 0 R at the root, with n 0 obj << /Dests m 0 R >>. m 0 obj << /Kids [ k 0 R l 0 R ... ] >> points to Limits and more Kids, where /Limits [ <hhhhhhhhhh_min> <hhhhhhhh_max> ] or /Limits [ (text_min) (text_max) ] (but "text" is often not 7-bit ASCII). Eventually you get down to leaf node objects with /Limits and /Names. The objects pointed to are /D [real_object /fit_and_params] /S /GoTo, which is more or less the end of the story. Note that there's not just the target object (usually a page, I think) and its fit (display) information, but also some sort of action. That may explain why the link/goto, url/uri, etc. commands are there too.
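
To make the Name Tree layout described above more concrete, here is a minimal sketch of a lookup over a toy tree built from plain Perl hashes. It is not PDF::Builder or PDF::API2 code: `find_dest` is a hypothetical helper, the keys and values are placeholders, and in a real file each value would be a destination array or a dictionary carrying /D (and each kid an indirect object reference).

```perl
use strict;
use warnings;

# Toy stand-in for a /Dests name tree. The root has only Kids; each leaf has
# Limits (smallest and largest key) and Names (key, value, key, value, ...).
my $tree = {
    Kids => [
        {   Limits => [ '2023-09-29', '2023-09-30' ],
            Names  => [ '2023-09-29', 'dest A', '2023-09-30', 'dest B' ],
        },
        {   Limits => [ '2023-10-01', '2023-10-02' ],
            Names  => [ '2023-10-01', 'dest C', '2023-10-02', 'dest D' ],
        },
    ],
};

# Hypothetical helper: return the value stored under $key, or undef.
sub find_dest {
    my ($node, $key) = @_;

    # Leaf node: scan the flat key/value array.
    if (my $names = $node->{Names}) {
        for (my $i = 0; $i < $#$names; $i += 2) {
            return $names->[ $i + 1 ] if $names->[$i] eq $key;
        }
        return undef;
    }

    # Interior node: descend only into a kid whose Limits bracket the key.
    for my $kid (@{ $node->{Kids} || [] }) {
        my ($min, $max) = @{ $kid->{Limits} };
        next if $key lt $min or $key gt $max;
        return find_dest($kid, $key);
    }
    return undef;
}

print find_dest($tree, '2023-10-01'), "\n";   # dest C
```

The /Limits entries are what let a Reader skip whole subtrees, which is why the keys have to be kept in sorted order.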

There's also a "Linearized" PDF with a Hint Table to find things much faster, but I don't think I've run into such a thing yet.

If either of you knows of a free tool to list the Named Destinations in a PDF, so I can type them in, as well as how (if possible) to invoke one from a shell command line (at least, in Windows), I'd love to hear about it. At the moment, I've shot my bolt and will have to put this aside while I work on some more important things and get 3.026 out the door. If you have any knowledge about ND's please share it! Even a pointer to a good tutorial about what's going on would be great. Everything I've found so far assumes that you are using Adobe Acrobat or some similar tool.

Add: I found and tried the "Poppler" utility (pdfinfo -dests blah.pdf) and it seems to work (both #NDentry and #nameddest=NDentry). None of the samples I tried yesterday had Named Destinations, and looking inside PDF32000_2008.pdf I can't find what I know to be a working ND. It might be encrypted or compressed in some manner.