espresso3389 / pdfrx

pdfrx is yet another PDF viewer implementation built on top of PDFium. The plugin currently supports Android, iOS, Windows, macOS, Linux, and Web.
MIT License

Support for PDF article headings interaction #71

Closed reddeath1 closed 6 months ago

reddeath1 commented 6 months ago

I've been using the Page Suite app for reading papers, and besides its good design, one feature has really impressed me: the ability to click on a link and have a preview of the linked content fetched and displayed directly within the app.

I also have to give a big thumbs up for pdfrx—it's really impressive. I especially love the product and I'm looking forward to using it in my own app.

After seeing how the Link Builder works, I was wondering whether it would be possible to interact with the headings in a PDF. For example, if page one has two or three articles, their headings could be made clickable. The developer would set a custom URL, which would then be passed along to the Link Builder. When the heading of an article is clicked, it could return the article heading, which could then be turned into a link. For instance, "Normative references" could become customURL+"normative-references". The developer would then not need launchUrl and could instead display the article from the URL in a WebView, though this part isn't necessary since a developer can easily do it.
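The customURL+"normative-references" idea can be sketched as follows. This is a minimal illustration in Python rather than Dart, and it is not part of pdfrx: the function name `heading_to_url` and the base URL are hypothetical, and the slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption about what the target website expects.

```python
import re
from urllib.parse import urljoin


def heading_to_url(base_url: str, heading: str) -> str:
    """Turn a heading such as "Normative references" into base_url + slug."""
    # Lowercase, then collapse every run of non-alphanumeric characters into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")
    # Make sure the base ends with "/" so urljoin appends instead of replacing.
    return urljoin(base_url.rstrip("/") + "/", slug)


# Hypothetical base URL, for illustration only.
print(heading_to_url("https://example.com/articles", "Normative references"))
```

The resulting string is what the app would hand to a WebView or a URL launcher.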

Another highly attractive feature for readers would be the ability to interact with the PDF as they would with a physical book. I'm referring to being able to flip through pages and see a layout similar to what's shown in the screenshot. The flipping can be achieved with a widget like page_flip https://github.com/shivbo96/page_flip

[Image: page-flip screenshot]

Thank you for taking the time to read this.

reddeath1 commented 6 months ago

Thanks for making this request public. @espresso3389, if I may ask, when do you plan to look into this? Or is there anything that might shorten the wait? I'm working on the PDF view widget for my app, and pdfrx is my choice for it. Thank you again.

espresso3389 commented 6 months ago

For page flipping, it seems you can use PdfPageView with the Flutter plugin you suggested. Anyway, please open a separate issue for each problem or suggestion.

For heading interaction, I could not understand what you are explaining. Could you please explain it again in other words?

reddeath1 commented 6 months ago

Thank you @espresso3389

Regarding the page flip: I currently do it with the pdf_render plugin, but the result is not good. Both redraw the PDF content to the screen, so the quality is worse than it would be if flipping came with the plugin itself, where the flipping mechanism could be created when the page is drawn.

As for headings, what I meant is very simple. Imagine you click on a PDF article title and get its full content in a popup, rather than pinch-zooming to make the content fully visible. That should be technically possible with the same mechanism pdfrx uses for text selection, except that instead of a selection, a click event is listened for and the exact content in that area (heading/title only) is retrieved. Since a callback is invoked, the actual content is returned and the developer can use it as they see fit. In my case, that means fetching the same article from the internet, because it is linked to a specific website whose URL is supplied by the developer. There is no need to pass the URL as a parameter; the returned title/heading/text of that area is enough.

espresso3389 commented 6 months ago

Imagine you click on the pdf article title and you able to get its full content on the popup screen rather than pinch zoom to get full visibility of the content

Sorry, I don't get anything about your idea.

reddeath1 commented 6 months ago

Imagine you click on the pdf article title and you able to get its full content on the popup screen rather than pinch zoom to get full visibility of the content

  • What is "pdf article title"?

  • What do you mean by "its full content"?

Sorry, I don't get anything about your idea.

I'm sorry for making this hard. Here is the screenshot. Regarding the full article: when you click a title, there is content below it, so all the content before the next title is returned after the click. The solution could be either the full article from the PDF or one fetched from the internet using the title returned after the click. So a click is initiated and a title is returned; the developer then takes that title, searches for it on the internet, and gets the full article, or the actual article is returned from the PDF. [Image: screenshot]

espresso3389 commented 6 months ago

PDF page text is easy to extract using PdfPage.loadText, which provides the full page text as PdfPageText.fullText.

Because it is a plain String, you can do any text-based analysis on it. Then you can obtain the corresponding PdfPageTextFragment via PdfPageText.getFragmentIndexForTextIndex and PdfPageText.fragments.

Each fragment has a PdfPageTextFragment.bounds property, which indicates the area within the page where you may want to place a widget.

And PdfViewerParams.pageOverlaysBuilder can be used to place widgets on the pages.

I think all the APIs you need are already available.

Furthermore, if the document has an outline, you can easily get the list of headings with PdfDocument.loadOutline.
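The lookup described above (find a string in the full text, then map its index to the fragment that contains it) can be modelled roughly like this. It is a language-neutral sketch written in Python, not pdfrx code: the `Fragment` class, its fields, and `fragment_index_for_text_index` are hypothetical stand-ins for the idea behind PdfPageText.getFragmentIndexForTextIndex, and the bounds are made-up numbers.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Fragment:
    start: int     # index of the fragment's first character in the full text
    text: str
    bounds: tuple  # (left, top, right, bottom), invented values for the demo


def fragment_index_for_text_index(fragments, text_index):
    """Return the index of the fragment covering text_index, or -1 if none."""
    starts = [f.start for f in fragments]
    i = bisect_right(starts, text_index) - 1
    if i < 0:
        return -1
    f = fragments[i]
    # Indexes that fall on the separator between two fragments belong to neither.
    return i if text_index < f.start + len(f.text) else -1


lines = ["PDF 32000-1:2008", "Introduction", "ISO 32000 specifies a digital form"]
full_text = "\n".join(lines)

fragments, pos = [], 0
for n, line in enumerate(lines):
    fragments.append(Fragment(pos, line, (72.0, 72.0 + 14 * n, 540.0, 86.0 + 14 * n)))
    pos += len(line) + 1  # +1 for the newline separator

hit = fragment_index_for_text_index(fragments, full_text.find("Introduction"))
print(hit, fragments[hit].bounds)
```

The bounds of the fragment found this way are what an overlay widget would be positioned on.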

reddeath1 commented 6 months ago

Thank you very much, and sorry if I sound unprofessional. The text I want is the text at the point where a click event is fired, and only article headings are needed. If you can share a pointer on this with a little code, that would help, provided I'm not required to write the full functionality myself.

In a few days I will switch from pdf_render to pdfrx. Also, should I open a new issue on page flipping so that I can get a full picture of how to use the page_flip plugin to achieve the intended effect without losing quality?

Again, thank you very much.

espresso3389 commented 6 months ago

The problem is how we can tell the article headings apart from the other text lines. Do you have any idea about that?

For example, if I open the PDF reference document in pdfrx, the first page contains the following text. Can you logically determine where the headings are inside the text?

© Adobe Systems Incorporated 2008 – All rights reserved i
PDF 32000-1:2008
First Edition
2008-7-1
Document management — Portable document format — Part 1: 
PDF 1.7

On the third page, everything looks like a heading (of course, the strings are identical to the heading strings):

© Adobe Systems Incorporated 2008 – All rights reserved iii
PDF 32000-1:2008
Contents Page
Foreword. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
1 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Conformance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.1 General. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.2 Conforming readers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.3 Conforming writers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.4 Conforming products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3 Normative references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
4 Terms and definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
5 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6 Version Designations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7.1 General. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7.2 Lexical Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7.3 Objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
7.4 Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
7.5 File Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.6 Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.7 Document Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.8 Content Streams and Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.9 Common Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.10 Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.11 File Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.12 Extensions Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8 Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.1 General. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.2 Graphics Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.3 Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.4 Graphics State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.5 Path Construction and Painting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.6 Colour Spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.7 Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.8 External Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.9 Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.10 Form XObjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.11 Optional Content. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9 Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.1 General. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.2 Organization and Use of Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.3 Text State Parameters and Operators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
9.4 Text Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
9.5 Introduction to Font Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
9.6 Simple Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
9.7 Composite Fonts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
9.8 Font Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
9.9 Embedded Font Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
9.10 Extraction of Text Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
10 Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

On page 7, at last, we get the real heading "Introduction". But how do you know, in a logical way, which one is a real heading and which is not?

© Adobe Systems Incorporated 2008 – All rights reserved vii
PDF 32000-1:2008
Introduction
ISO 32000 specifies a digital form for representing documents called the Portable Document Format or usually 
referred to as PDF. PDF was developed and specified by Adobe Systems Incorporated beginning in 1993 and 
continuing until 2007 when this ISO standard was prepared. The Adobe Systems version PDF 1.7 is the basis 
for this ISO 32000 edition. The specifications for PDF are backward inclusive, meaning that PDF 1.7 includes 
all of the functionality previously documented in the Adobe PDF Specifications for versions 1.0 through 1.6. It 
should be noted that where Adobe removed certain features of PDF from their standard, they too are not 
contained herein.
The goal of PDF is to enable users to exchange and view electronic documents easily and reliably, 
independent of the environment in which they were created or the environment in which they are viewed or 
printed. At the core of PDF is an advanced imaging model derived from the PostScript® page description 
language. This PDF Imaging Model enables the description of text and graphics in a device-independent and 
resolution-independent manner. To improve performance for interactive viewing, PDF defines a more 
structured format than that used by most PostScript language programs. Unlike Postscript, which is a 
programming language, PDF is based on a structured binary file format that is optimized for high performance 
in interactive viewing. PDF also includes objects, such as annotations and hypertext links, that are not part of 
the page content itself but are useful for interactive viewing and document interchange.
PDF files may be created natively in PDF form, converted from other electronic formats or digitized from paper, 
microform, or other hard copy format. Businesses, governments, libraries, archives and other institutions and 
individuals around the world use PDF to represent considerable bodies of important information. 
Over the past fourteen years, aided by the explosive growth of the Internet, PDF has become widely used for 
the electronic exchange of documents. There are several specific applications of PDF that have evolved where 
limiting the use of some features of PDF and requiring the use of others, enhances the usefulness of PDF. ISO 
32000 is an ISO standard for the full function PDF; the following standards are for more specialized uses. PDF/
X (ISO 15930) is now the industry standard for the intermediate representation of printed material in electronic 
prepress systems for conventional printing applications. PDF/A (ISO 19005) is now the industry standard for 
the archiving of digital documents. PDF/E (ISO 24517) provides a mechanism for representing engineering 
documents and exchange of engineering data. As major corporations, government agencies, and educational 
institutions streamline their operations by replacing paper-based workflow with electronic exchange of 
information, the impact and opportunity for the application of PDF will continue to grow at a rapid pace.
PDF, together with software for creating, viewing, printing and processing PDF files in a variety of ways, fulfils a 
set of requirements for electronic documents including: 
• preservation of document fidelity independent of the device, platform, and software,
• merging of content from diverse sources—Web sites, word processing and spreadsheet programs, 
scanned documents, photos, and graphics—into one self-contained document while maintaining the 
integrity of all original source documents,
• collaborative editing of documents from multiple locations or platforms,
• digital signatures to certify authenticity,
• security and permissions to allow the creator to retain control of the document and associated rights,
• accessibility of content to those with disabilities,
• extraction and reuse of content for use with other file formats and applications, and
• electronic forms to gather data and integrate it with business systems.

I think it's not a PDF issue but a general-purpose heading detection problem. You should find a way to do that.
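To make the problem concrete, here is one possible text-only heuristic, sketched in Python on shortened versions of the page texts quoted above. Everything in it is an assumption rather than an established method: the function name `guess_headings`, the 60-character threshold, and the rules themselves (skip lines that recur across pages, skip dot-leader lines from a table of contents, and require a long body line to follow). It will misclassify many real documents.

```python
import re
from collections import Counter

DOT_LEADER = re.compile(r"(?:\.\s*){4,}")           # ". . . ." runs in a contents page
PAGE_LABEL = re.compile(r"\s+(?:[ivxlcdm]+|\d+)$")  # trailing page label, e.g. " vii"


def normalize(line):
    """Drop a trailing page label so running headers compare equal across pages."""
    return PAGE_LABEL.sub("", line.strip())


def guess_headings(pages, max_len=60):
    """Return (page_index, line) pairs that look like headings."""
    seen_on = Counter()
    for lines in pages:
        seen_on.update({normalize(line) for line in lines})

    found = []
    for page_no, lines in enumerate(pages):
        for line, following in zip(lines, lines[1:]):
            text = line.strip()
            if seen_on[normalize(line)] > 1:
                continue  # running header or footer
            if DOT_LEADER.search(text) or len(text) > max_len:
                continue  # contents entry or body text
            if not text or text[-1] in ".,;:" or text.startswith("•"):
                continue  # sentence end or list item
            # A heading is expected to be followed by a long line of body text.
            if len(following.strip()) > max_len and not DOT_LEADER.search(following):
                found.append((page_no, text))
    return found


pages = [
    ["© Adobe Systems Incorporated 2008 – All rights reserved iii",
     "PDF 32000-1:2008",
     "Contents Page",
     "Foreword. . . . . . . . . . vi",
     "Introduction . . . . . . . . . . vii"],
    ["© Adobe Systems Incorporated 2008 – All rights reserved vii",
     "PDF 32000-1:2008",
     "Introduction",
     "ISO 32000 specifies a digital form for representing documents called the Portable Document Format or usually",
     "referred to as PDF."],
]
print(guess_headings(pages))
```

On this sample only the "Introduction" line on the second page survives the filters. Font size or weight, where available, would likely be a stronger signal than text alone.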

reddeath1 commented 6 months ago

Thank you for your reply. I think, if possible, we can determine the text's heading through the weight of the text. But based on what you've shared, I see they're the same. The other method that I've thought about is a click event, making it possible to listen to everywhere the user clicks and return the text of that area. Regardless of whether it's a heading or not, a developer will have to work hard to differentiate them and build a desired feature on top of it.
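The click-based idea amounts to a hit test against the text fragment rectangles. The sketch below is in Python with hypothetical names (`Fragment`, `text_at`) and invented coordinates; it assumes a top-left origin with y growing downward, whereas PDF page coordinates commonly place the origin at the bottom-left, so a conversion may be needed in a real viewer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    text: str
    left: float
    top: float
    right: float
    bottom: float


def text_at(fragments, x: float, y: float) -> Optional[str]:
    """Return the text of the first fragment whose rectangle contains (x, y)."""
    for f in fragments:
        if f.left <= x <= f.right and f.top <= y <= f.bottom:
            return f.text
    return None  # the click landed on empty space


# Invented rectangles, for illustration only.
fragments = [
    Fragment("PDF 32000-1:2008", 72, 40, 200, 54),
    Fragment("Introduction", 72, 90, 180, 110),
]
print(text_at(fragments, 100, 100))
print(text_at(fragments, 300, 300))
```

Whether the returned text is a heading is then left to the app, as described above.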

espresso3389 commented 6 months ago

I want to close the issue because it's a little off topic for pdfrx. It's basically not related to pdfrx, and pdfrx could not provide any technical assistance with it anyway.

The page flipping feature can be discussed in #84.

Anyway, please don't discuss two or more topics in a single issue.