Feature Tweak - Results Per Page

ProgGrog commented 5 years ago

Saw this https://stackoverflow.com/questions/36270247/vba-extract-data-from-pdf-and-add-to-worksheet and it's a perfect solution to that problem.

I was trying it with larger documents, where the line number becomes less meaningful, and wondered about tweaking your code to break the text array up by page, or perhaps by line within each page. I'm looking at the VB options and don't know them well enough. As I read it, within LoanandConvert:

textPDF = objPDF.GetText(1) 'grabs all the text from the obj, to be split into an array on the next line
textArray = Split(textPDF, vbNewLine) 'Split is the function, vbNewLine is the delimiter used to break the text into the array

I'm not sure there's a direct way to do this, since the OpenPdf.open copy/paste does not (and cannot) capture the page data. Perhaps it would be possible to 1) inject additional text at the end of each page in an earlier step, then 2) split the textPDF string into a new pageArray first, and then split each page into lines for textArray.
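
The two-step split described above could be sketched roughly as follows. This is only an illustration, not code from the project: PAGE_MARKER, SplitIntoPages and DemoSplitIntoPages are hypothetical names, and it assumes some earlier step has already managed to insert the marker at the end of each page.

```vba
Option Explicit

' Hypothetical marker; any string that cannot occur in the PDF text would do.
Private Const PAGE_MARKER As String = "<<PAGE_BREAK>>"

' Splits marker-delimited text into pages, then each page into lines.
' Returns a Collection with one item per page; each item is an array of lines.
Public Function SplitIntoPages(ByVal textPDF As String) As Collection
    Dim pages As New Collection
    Dim pageArray() As String
    Dim i As Long

    pageArray = Split(textPDF, PAGE_MARKER)               ' first split: by page
    For i = LBound(pageArray) To UBound(pageArray)
        pages.Add Split(pageArray(i), vbNewLine)          ' second split: by line
    Next i

    Set SplitIntoPages = pages
End Function

' Small demo on made-up text; prints each line with its page and line number.
Public Sub DemoSplitIntoPages()
    Dim sample As String
    Dim pages As Collection
    Dim pageLines As Variant
    Dim p As Long, n As Long

    sample = "first line" & vbNewLine & "second line" & PAGE_MARKER & "page two, line one"
    Set pages = SplitIntoPages(sample)

    For p = 1 To pages.Count
        pageLines = pages(p)
        For n = LBound(pageLines) To UBound(pageLines)
            Debug.Print "Page " & p & ", line " & (n + 1) & ": " & pageLines(n)
        Next n
    Next p
End Sub
```

If a page's text ends with a line break right before the marker, the last element of that page's array will be an empty string, so a caller may want to skip blank trailing lines.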

avs3c commented 5 years ago

Results per page is indeed an interesting feature. However, there is a serious difficulty we need to overcome: the copy-paste method used in this project is a crude way to transfer the data. The clipboard only receives the raw text of the PDF, and I am quite sure that raw text does not contain any new-page characters.

Having said that, we have two options available (at least that I can think of):

1) Find a way to get the more complicated text format directly from the PDF, meaning that we need to use an API for the PDF program. Adobe Acrobat has an API but other programs do not.
2) Identify special characters, phrases or patterns that indicate a new page (e.g. "Page 1" or "-1-") and manually add the desired space or change Sheet.

The first option adds immense complexity and deviates from the initial release to a great extent. Therefore, I would suggest pursuing the second one.
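
As a rough sketch of the second option, the lines of textArray could be tested against a pattern for markers such as "Page 1" or "-1-". Everything here is an assumption for illustration: IsPageMarker and DemoPageMarkers are hypothetical names, the pattern only covers those two marker styles, and the markers are assumed to sit at the end of each page (footers). The regular expression object is the late-bound VBScript.RegExp.

```vba
Option Explicit

' Returns True when a line looks like a page marker such as "Page 1" or "-1-".
' The pattern is an assumption; adjust it to the footer style of the actual PDF.
Public Function IsPageMarker(ByVal lineText As String) As Boolean
    Static re As Object
    If re Is Nothing Then
        Set re = CreateObject("VBScript.RegExp")
        re.IgnoreCase = True
        re.Pattern = "^\s*(Page\s+\d+|-\s*\d+\s*-)\s*$"
    End If
    IsPageMarker = re.Test(lineText)
End Function

' Walks a made-up textArray and starts a new page after each marker line.
Public Sub DemoPageMarkers()
    Dim textArray() As String
    Dim pageNo As Long, i As Long

    textArray = Split("intro text|Page 1|more text|-2-|last text", "|")
    pageNo = 1
    For i = LBound(textArray) To UBound(textArray)
        If IsPageMarker(textArray(i)) Then
            pageNo = pageNo + 1          ' a marker closes the current page
        Else
            Debug.Print "Page " & pageNo & ": " & textArray(i)
        End If
    Next i
End Sub
```

Instead of printing, the same loop could insert a blank row or switch to another Sheet whenever a marker is found. A PDF whose body text happens to contain a line like "Page 3" would produce a false page break, so the pattern needs to be as specific as the document allows.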

jbylsma1 commented 5 years ago

Hello. First of all, thanks for this code. It is useful for a project I am working on.

I think results-per-page would be useful, and I had a thought that might be viable. If the PDF opens in Adobe Acrobat for viewing, we can use SendKeys to put Acrobat into single-page, non-scrolling mode. Ctrl+A would then select only that single page. We could then use SendKeys to page down through the PDF until the end of the document. The end of the document could be detected by checking that the copied string changes from one iteration to the next, and stopping when it stops changing.
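
A minimal sketch of that loop, under several assumptions that have not been verified against Acrobat: the Acrobat window is already active and already in single-page view, Ctrl+A selects only the current page in that view, and Page Down moves to the next page. ClipboardText and CopyPagesOneByOne are hypothetical names. The clipboard is read through a late-bound MSForms DataObject using a commonly used class-ID trick, and Application.Wait assumes the host is Excel.

```vba
Option Explicit

' Reads the clipboard text via a late-bound MSForms DataObject.
Private Function ClipboardText() As String
    Dim clip As Object
    Set clip = CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    On Error Resume Next            ' GetText fails when the clipboard holds no text
    clip.GetFromClipboard
    ClipboardText = clip.GetText(1)
    On Error GoTo 0
End Function

' Copies page after page until the copied text stops changing.
' maxPages is a safety limit so the loop cannot run forever.
Public Function CopyPagesOneByOne(Optional ByVal maxPages As Long = 500) As Collection
    Dim pages As New Collection
    Dim currentText As String, previousText As String
    Dim i As Long

    For i = 1 To maxPages
        SendKeys "^a", True                              ' select all (assumed: current page only)
        SendKeys "^c", True                              ' copy the selection
        Application.Wait Now + TimeValue("00:00:01")     ' give the clipboard time to update
        currentText = ClipboardText()
        If i > 1 And currentText = previousText Then Exit For   ' unchanged: treat as end of document
        pages.Add currentText
        previousText = currentText
        SendKeys "{PGDN}", True                          ' assumed key for "next page"
    Next i

    Set CopyPagesOneByOne = pages
End Function
```

One weakness of the stop condition is that two consecutive pages with identical text (blank pages, for example) would end the loop early, so a page counter from another source would be a safer check where one is available.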

The biggest challenge I see is detecting the installed version of Acrobat. That might require checking the registry.

avs3c commented 5 years ago

Hi @jbylsma1 As far as I know, there is no keyboard shortcut for toggling single-page or non-scrolling mode in Adobe Acrobat. Did you mean something else?

javaftper commented 5 years ago

> Hi @jbylsma1 As far as I know, there is no keyboard shortcut for toggling single-page or non-scrolling mode in Adobe Acrobat. Did you mean something else?

SendKeys "%vps" 'single-page view (ALT + v,p,s)

avs3c / pdf2excel-vba
