Open rvanlaak opened 11 years ago
`replace` is not possible, at least for now. I don't think it's even supported by PDF readers. I also have doubts about `add bookmarks`. `search in PDF bookmarks` also sounds like a rare use case to me.
I'm not sure if `innerText` or `:contains` is enough for these features; see http://stackoverflow.com/questions/12445020/javascript-window-find-doesnt-work-absolutely
But indeed there is a problem when lazy loading is enabled: pages are not loaded until viewed, so we need to load them before searching for any text.
A possible solution would be either to search the text nodes in the DOM and highlight the matches, or to generate an inverted index to use for searching (using https://github.com/fagbokforlaget/pdfiijs, or pdftotext with its output fed into an indexing system).
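To make the inverted-index idea concrete, here is a minimal sketch in Python. It assumes the text comes from pdftotext, which separates pages with a form feed character, and it indexes at page granularity only. The function names `build_inverted_index` and `search` are hypothetical and are not part of pdf2htmlEX or pdfiijs.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_inverted_index(pdftotext_output: str) -> dict[str, set[int]]:
    """Map each lowercased word to the set of 1-based pages that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    # pdftotext separates pages with a form feed ("\f"); a trailing form feed
    # only produces an empty last chunk, which contributes no words.
    for page_number, page_text in enumerate(pdftotext_output.split("\f"), start=1):
        for word in re.findall(r"\w+", page_text.lower()):
            index[word].add(page_number)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return the sorted page numbers that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    page_sets = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*page_sets))


if __name__ == "__main__":
    # Hypothetical two-page document, as pdftotext might emit it.
    sample = "Hello world\fSearch the world\f"
    index = build_inverted_index(sample)
    print(search(index, "search world"))
```

Because the index is built from the extracted text rather than from the rendered DOM, it would not depend on which pages have been lazily loaded; the viewer would only need to load the pages that the search returns.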
@iapain the library you're proposing sounds great, especially since I've got both a PDF file and its `pdftotext` output. Does `snowball-js` support the following use-case?
My use-case is that I've got fragments from the `pdftotext` output that I would like to show/mark in the original PDF with its original markup. It would be awesome if I could use `pdf2htmlEX` in order to preserve the markup from the PDF.
I've been digging through the changelog / release notes / blogspot posts, and found out that it is possible to search the output and to compare the HTML like diffs.
Can you elaborate a bit more on those features? I could not find any documentation about them.
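As a rough illustration of the fragment use-case described above, the following Python sketch finds the page on which a pdftotext fragment occurs. It assumes pages are separated by form feeds in the pdftotext output and that differences in whitespace and letter case should be ignored. The names `normalize` and `locate_fragment` are hypothetical; mapping the match back onto the HTML that pdf2htmlEX generates is not covered here.

```python
from __future__ import annotations

import re


def normalize(text: str) -> str:
    """Collapse whitespace runs to single spaces, trim, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def locate_fragment(pdftotext_output: str, fragment: str) -> tuple[int, int] | None:
    """Return (1-based page number, offset in the normalized page text) of the
    first occurrence of the fragment, or None if it is not found."""
    needle = normalize(fragment)
    if not needle:
        return None
    # pdftotext separates pages with a form feed ("\f").
    for page_number, page_text in enumerate(pdftotext_output.split("\f"), start=1):
        offset = normalize(page_text).find(needle)
        if offset != -1:
            return page_number, offset
    return None


if __name__ == "__main__":
    # Hypothetical two-page document with a line break inside the fragment.
    sample = "First page\ntext here\fSecond   page with\nmarked fragment\f"
    print(locate_fragment(sample, "page with marked"))
```

Note that this sketch searches each page separately, so a fragment that spans a page break would not be found.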
The result that `pdf2htmlEX` outputs is great, and is very suitable to replace Acrobat Reader. One of the features that makes Acrobat preferable to the browser output is the ability to search in the document.

Feature request: add a search API to the library, so that it is possible to perform text searches in the document.
Features of the API could be:
Once this API works, a next step could be to implement a GUI that makes use of it. I will open another issue for that.